Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/16 14:38:04 UTC

[GitHub] [airflow] stoiandl opened a new issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

stoiandl opened a new issue #14261:
URL: https://github.com/apache/airflow/issues/14261


   **Apache Airflow version 2.0**:
   
   **Kubernetes version 1.18.14**
   
   **Environment**: Azure - AKS
   
   
   **What happened**:
   
   I have just upgraded Airflow from 1.10.13 to 2.0. I am running it in Kubernetes (Azure AKS) with the Kubernetes Executor. Unfortunately, I see my scheduler getting killed every 15-20 mins because the liveness probe fails, so the pod keeps restarting.
   
   **Liveness probe**
   
   ```
   import os
   os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
   os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'
   
   from airflow.jobs.scheduler_job import SchedulerJob
   from airflow.utils.db import create_session
   from airflow.utils.net import get_hostname
   import sys
   
   with create_session() as session:
     job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
         SchedulerJob.latest_heartbeat.desc()).limit(1).first()
   
   sys.exit(0 if job.is_alive() else 1)
   ```
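
   For reference, `job.is_alive()` is essentially a heartbeat-staleness check. A rough sketch of the idea (not the actual Airflow implementation; the 30-second threshold is an assumption, roughly the default scheduler health check threshold):

   ```
   from datetime import datetime, timedelta

   def is_scheduler_alive(state, latest_heartbeat, threshold=timedelta(seconds=30)):
       # The job row must still be marked as running and its latest heartbeat
       # must be recent enough, otherwise the probe exits non-zero.
       return state == "running" and (datetime.utcnow() - latest_heartbeat) < threshold
   ```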
   
   **Scheduler logs**
   ```
   [2021-02-16 12:18:22,422] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-16 12:18:22,426] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-16 12:18:22,426] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-16 12:18:22,426] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-16 12:18:22,427] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-16 12:18:22,427] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-16 12:18:22,427] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-16 12:18:22,439] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
   [2021-02-16 12:18:22,452] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12819)
   [2021-02-16 12:18:22,460] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor490-Process' pid=12819 parent=9286 stopped exitcode=0>
   [2021-02-16 12:18:23,009] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12826)
   [2021-02-16 12:18:23,017] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor491-Process' pid=12826 parent=9286 stopped exitcode=0>
   [2021-02-16 12:18:23,594] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12833)
   
   ... Many of these Disposing DB connection pool entries here
   
   [2021-02-16 12:20:08,212] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor675-Process' pid=14146 parent=9286 stopped exitcode=0>
   [2021-02-16 12:20:08,916] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14153)
   [2021-02-16 12:20:08,924] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor676-Process' pid=14153 parent=9286 stopped exitcode=0>
   [2021-02-16 12:20:09,475] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14160)
   [2021-02-16 12:20:09,484] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor677-Process' pid=14160 parent=9286 stopped exitcode=0>
   [2021-02-16 12:20:10,044] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14167)
   [2021-02-16 12:20:10,053] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor678-Process' pid=14167 parent=9286 stopped exitcode=0>
   [2021-02-16 12:20:10,610] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14180)
   [2021-02-16 12:23:42,287] {scheduler_job.py:746} INFO - Exiting gracefully upon receiving signal 15
   [2021-02-16 12:23:43,290] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 9286
   [2021-02-16 12:23:43,494] {process_utils.py:201} INFO - Waiting up to 5 seconds for processes to exit...
   [2021-02-16 12:23:43,503] {process_utils.py:61} INFO - Process psutil.Process(pid=14180, status='terminated', started='12:20:09') (14180) terminated with exit code None
   [2021-02-16 12:23:43,503] {process_utils.py:61} INFO - Process psutil.Process(pid=9286, status='terminated', exitcode=0, started='12:13:35') (9286) terminated with exit code 0
   [2021-02-16 12:23:43,506] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 9286
   [2021-02-16 12:23:43,506] {scheduler_job.py:1296} INFO - Exited execute loop
   [2021-02-16 12:23:43,523] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
   [2021-02-16 12:23:43,525] {settings.py:290} DEBUG - Disposing DB connection pool (PID 7)
   ```
   
   **Scheduler deployment**
   ```
   ---
   ################################
   ## Airflow Scheduler Deployment/StatefulSet
   #################################
   kind: Deployment
   apiVersion: apps/v1
   metadata:
     name: airflow-scheduler
     namespace: airflow
     labels:
       tier: airflow
       component: scheduler
   spec:
     replicas: 1
     selector:
       matchLabels:
         tier: airflow
         component: scheduler
     template:
       metadata:
         labels:
           tier: airflow
           component: scheduler
         annotations:
           cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
       spec:
         nodeSelector:
           {}
         affinity:
           {}
         tolerations:
           []
         restartPolicy: Always
         terminationGracePeriodSeconds: 10
         serviceAccountName: airflow-scheduler
         securityContext:
           runAsUser: 50000
           fsGroup: 50000
         initContainers:
           - name: run-airflow-migrations
             image: apache/airflow:2.0.0-python3.8
             imagePullPolicy: IfNotPresent
             # Support running against 1.10.x and 2.0.0dev/master
             args: ["bash", "-c", "airflow db upgrade"]
             env:          
               # Dynamically created environment variables
               # Dynamically created secret envs
                         
               # Hard Coded Airflow Envs
               - name: AIRFLOW__CORE__FERNET_KEY
                 valueFrom:
                   secretKeyRef:
                     name: fernet-key
                     key: fernet-key
               - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
                 valueFrom:
                   secretKeyRef:
                     name: airflow-airflow-metadata
                     key: connection
               - name: AIRFLOW_CONN_AIRFLOW_DB
                 valueFrom:
                   secretKeyRef:
                     name: airflow-airflow-metadata
                     key: connection
         containers:
           # Always run the main scheduler container.
           - name: scheduler
             image: apache/airflow:2.0.0-python3.8
             imagePullPolicy: Always
             args: ["bash", "-c", "exec airflow scheduler"]
             env:          
               # Dynamically created environment variables
               # Dynamically created secret envs
                         
               # Hard Coded Airflow Envs
               - name: AIRFLOW__CORE__FERNET_KEY
                 valueFrom:
                   secretKeyRef:
                     name: fernet-key
                     key: fernet-key
               - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
                 valueFrom:
                   secretKeyRef:
                     name: airflow-airflow-metadata
                     key: connection
               - name: AIRFLOW_CONN_AIRFLOW_DB
                 valueFrom:
                   secretKeyRef:
                     name: airflow-airflow-metadata
                     key: connection
               - name: DEPENDENCIES
                 value: "/opt/airflow/dags/repo/dags/dependencies/"
             # If the scheduler stops heartbeating for 5 minutes (10*30s) kill the
             # scheduler and let Kubernetes restart it
             livenessProbe:
               failureThreshold: 10
               periodSeconds: 30
               exec:
                 command:
                   - python
                   - -Wignore
                   - -c
                   - |
                     import os
                     os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                     os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'
   
                     from airflow.jobs.scheduler_job import SchedulerJob
                     from airflow.utils.db import create_session
                     from airflow.utils.net import get_hostname
                     import sys
   
                     with create_session() as session:
                         job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
                             SchedulerJob.latest_heartbeat.desc()).limit(1).first()
   
                     sys.exit(0 if job.is_alive() else 1)
             resources:
               {}
             volumeMounts:
               - name: config
                 mountPath: /opt/airflow/pod_templates/pod_template_file.yaml
                 subPath: pod_template_file.yaml
                 readOnly: true
               - name: logs
                 mountPath: "/opt/airflow/logs"
               - name: config
                 mountPath: "/opt/airflow/airflow.cfg"
                 subPath: airflow.cfg
                 readOnly: true
               - name: dags
                 mountPath: /opt/airflow/dags
               - name: logs-conf
                 mountPath: "/opt/airflow/config/log_config.py"
                 subPath: log_config.py
                 readOnly: true
               - name: logs-conf-ini
                 mountPath: "/opt/airflow/config/__init__.py"
                 subPath: __init__.py
                 readOnly: true
           - name: git-sync
             image: "k8s.gcr.io/git-sync:v3.1.6"
             securityContext:
               runAsUser: 65533
             env:
               - name: GIT_SYNC_REV
                 value: "HEAD"
               - name: GIT_SYNC_BRANCH
                 value: "master"
               - name: GIT_SYNC_REPO
                 value:  HIDDEN
               - name: GIT_SYNC_DEPTH
                 value: "1"
               - name: GIT_SYNC_ROOT
                 value: "/git"
               - name: GIT_SYNC_DEST
                 value: "repo"
               - name: GIT_SYNC_ADD_USER
                 value: "true"
               - name: GIT_SYNC_WAIT
                 value: "60"
               - name: GIT_SYNC_MAX_SYNC_FAILURES
                 value: "0"
               - name: GIT_SYNC_USERNAME
                 valueFrom:
                   secretKeyRef:
                     name: 'codecommit-key'
                     key: username
               - name: GIT_SYNC_PASSWORD
                 valueFrom:
                   secretKeyRef:
                     name: 'codecommit-key'
                     key: password
             volumeMounts:
             - name: dags
               mountPath: /git
           # Always start the garbage collector sidecar.
           - name: scheduler-gc
             image: apache/airflow:2.0.0-python3.8
             imagePullPolicy: Always
             args: ["bash", "/clean-logs"]
             volumeMounts:
               - name: logs
                 mountPath: "/opt/airflow/logs"
               - name: logs-conf
                 mountPath: "/opt/airflow/config/log_config.py"
                 subPath: log_config.py
                 readOnly: true
               - name: logs-conf-ini
                 mountPath: "/opt/airflow/config/__init__.py"
                 subPath: __init__.py
                 readOnly: true
         volumes:
           - name: config
             configMap:
               name: airflow-airflow-config
           - name: dags
             emptyDir: {}
           - name: logs
             emptyDir: {}
           - name: logs-conf
             configMap:
               name: airflow-airflow-config
           - name: logs-conf-ini
             configMap:
               name: airflow-airflow-config
   ```
   
   ![image](https://user-images.githubusercontent.com/35636329/108077458-71031500-7064-11eb-9858-b47790b82404.png)
   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-779877565


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] kaxil commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-813016533


   Interesting. Can one of you create another GH issue with these reproduction details using the Helm Chart and KinD, please? @marclamberti @madden1706





[GitHub] [airflow] marclamberti commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
marclamberti commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-809560041


   Unfortunately not
   ```
   Events:
     Type     Reason     Age                 From               Message
     ----     ------     ----                ----               -------
     Normal   Scheduled  16m                 default-scheduler  Successfully assigned airflow/airflow-scheduler-78997cd45-6j5mw to airflow-cluster-worker
     Normal   Pulled     16m                 kubelet            Container image "apache/airflow:2.0.1" already present on machine
     Normal   Created    16m                 kubelet            Created container wait-for-airflow-migrations
     Normal   Started    16m                 kubelet            Started container wait-for-airflow-migrations
     Normal   Pulled     15m                 kubelet            Container image "apache/airflow:2.0.1" already present on machine
     Normal   Created    15m                 kubelet            Created container scheduler-gc
     Normal   Started    15m                 kubelet            Started container scheduler-gc
     Normal   Killing    11m                 kubelet            Container scheduler failed liveness probe, will be restarted
     Normal   Pulled     11m (x2 over 15m)   kubelet            Container image "apache/airflow:2.0.1" already present on machine
     Normal   Started    11m (x2 over 15m)   kubelet            Started container scheduler
     Normal   Created    11m (x2 over 15m)   kubelet            Created container scheduler
     Warning  Unhealthy  95s (x29 over 15m)  kubelet            Liveness probe failed:
   ```
   
   But, after a lot of trial and error, these 2 lines:
   ```
                   from airflow.jobs.scheduler_job import SchedulerJob
                   from airflow.utils.db import create_session
   ```
   make the liveness probe fail.
   Right now I have only this:
   ```
                   import os
                   os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                   os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'
   
                   from airflow.jobs.scheduler_job import SchedulerJob
                   from airflow.utils.db import create_session
                   from airflow.utils.net import get_hostname
                   import sys
   
                   sys.exit(0)
   ```
   and still, every 5 mins the scheduler is restarted.
   If I remove those two import lines, the probe works, since it is then effectively just sys.exit(0).
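
   One way to narrow that down is to time those imports (plus a database round-trip) from inside the scheduler pod; if they take longer than the probe's timeoutSeconds (1 second by default for exec probes), the probe can fail even though the script eventually exits 0. A minimal sketch to run inside the container:

   ```
   import time

   start = time.monotonic()
   # Imported only to measure how long the imports themselves take.
   from airflow.jobs.scheduler_job import SchedulerJob
   from airflow.utils.db import create_session
   print(f"imports took {time.monotonic() - start:.2f}s")

   from sqlalchemy import text

   start = time.monotonic()
   with create_session() as session:
       session.execute(text("SELECT 1"))  # simple round-trip to the metadata DB
   print(f"DB round-trip took {time.monotonic() - start:.2f}s")
   ```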





[GitHub] [airflow] jedcunningham commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-809523328


   @marclamberti is there anything useful in the pod's events?





[GitHub] [airflow] stoiandl edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
stoiandl edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784619202


   I managed to fix my restarts by setting the following configs:
   ```
   [kubernetes]
   ...
   delete_option_kwargs = {"grace_period_seconds": 10}
   enable_tcp_keepalive = True
   tcp_keep_idle = 30
   tcp_keep_intvl = 30
   tcp_keep_cnt = 30
   ```
   
   I have another Airflow instance running on Kubernetes in AWS. That one runs fine with any version, so I realized the problem is specific to Azure Kubernetes and its REST API calls to the API server.
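
   In case it helps anyone trying this: the same options can also be set as environment variables of the form `AIRFLOW__KUBERNETES__ENABLE_TCP_KEEPALIVE`, and the effective values can be double-checked from inside the scheduler container with a quick sketch like:

   ```
   from airflow.configuration import conf

   # Print the effective values of the [kubernetes] options mentioned above.
   print(conf.getboolean("kubernetes", "enable_tcp_keepalive"))
   print(conf.getint("kubernetes", "tcp_keep_idle"))
   print(conf.getint("kubernetes", "tcp_keep_intvl"))
   print(conf.getint("kubernetes", "tcp_keep_cnt"))
   print(conf.get("kubernetes", "delete_option_kwargs"))
   ```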





[GitHub] [airflow] careduz edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
careduz edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784574696


   We are facing the same issue (scheduler liveness probe always failing and restarting the scheduler). Details:
   
   **Airflow: Version 1.10.14 & 1.10.13**
   **Kubernetes: Version 1.20.2** (DigitalOcean)
   **Helm airflow-stable/airflow: Version 7.16.0**
   
   ```
   Events:
     Type     Reason     Age                From               Message
     ----     ------     ----               ----               -------
     Normal   Scheduled  27m                default-scheduler  Successfully assigned airflow/airflow-scheduler-75c6c96d68-r9j4m to apollo-kaon3thg1-882c2
     Normal   Pulled     27m                kubelet            Container image "alpine/git:latest" already present on machine
     Normal   Created    27m                kubelet            Created container git-clone
     Normal   Started    27m                kubelet            Started container git-clone
     Normal   Pulled     26m                kubelet            Container image "alpine/git:latest" already present on machine
     Normal   Created    26m                kubelet            Created container git-sync
     Normal   Started    26m                kubelet            Started container git-sync
     Normal   Killing    12m (x2 over 19m)  kubelet            Container airflow-scheduler failed liveness probe, will be restarted
     Normal   Pulled     11m (x3 over 26m)  kubelet            Container image "apache/airflow:1.10.14-python3.7" already present on machine
     Normal   Created    11m (x3 over 26m)  kubelet            Created container airflow-scheduler
     Normal   Started    11m (x3 over 26m)  kubelet            Started container airflow-scheduler
     Warning  Unhealthy  6m (x12 over 21m)  kubelet            Liveness probe failed:
   ```
   
   And the logs are basically stuck in a loop:
   ```
   1] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor409-Process, stopped)>
   [2021-02-23 22:58:35,578] {scheduler_job.py:1435} DEBUG - Starting Loop...
   [2021-02-23 22:58:35,578] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
   [2021-02-23 22:58:35,579] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:35,579] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:35,580] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:35,580] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:35,580] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
   [2021-02-23 22:58:35,581] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
   [2021-02-23 22:58:35,581] {base_executor.py:122} DEBUG - 0 running task instances
   [2021-02-23 22:58:35,582] {base_executor.py:123} DEBUG - 0 in queue
   [2021-02-23 22:58:35,582] {base_executor.py:124} DEBUG - 32 open slots
   [2021-02-23 22:58:35,582] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-23 22:58:35,587] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
   [2021-02-23 22:58:35,587] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
   [2021-02-23 22:58:36,589] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
   [2021-02-23 22:58:36,729] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6719)
   [2021-02-23 22:58:36,930] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6717)
   [2021-02-23 22:58:37,258] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor410-Process, stopped)>
   [2021-02-23 22:58:37,259] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor411-Process, stopped)>
   [2021-02-23 22:58:37,582] {scheduler_job.py:1435} DEBUG - Starting Loop...
   [2021-02-23 22:58:37,583] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
   [2021-02-23 22:58:37,584] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:37,586] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:37,588] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:37,589] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:37,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
   [2021-02-23 22:58:37,592] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
   [2021-02-23 22:58:37,593] {base_executor.py:122} DEBUG - 0 running task instances
   [2021-02-23 22:58:37,602] {base_executor.py:123} DEBUG - 0 in queue
   [2021-02-23 22:58:37,604] {base_executor.py:124} DEBUG - 32 open slots
   [2021-02-23 22:58:37,605] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-23 22:58:37,607] {scheduler_job.py:1460} DEBUG - Heartbeating the scheduler
   [2021-02-23 22:58:37,620] {base_job.py:197} DEBUG - [heartbeat]
   [2021-02-23 22:58:37,630] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.05 seconds
   [2021-02-23 22:58:37,631] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
   [2021-02-23 22:58:38,165] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6769)
   [2021-02-23 22:58:38,268] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6765)
   [2021-02-23 22:58:38,276] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor412-Process, started)>
   [2021-02-23 22:58:38,284] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor413-Process, stopped)>
   [2021-02-23 22:58:38,633] {scheduler_job.py:1484} DEBUG - Sleeping for 0.95 seconds to prevent excessive logging
   [2021-02-23 22:58:39,331] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6797)
   [2021-02-23 22:58:39,361] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6801)
   [2021-02-23 22:58:39,589] {scheduler_job.py:1435} DEBUG - Starting Loop...
   [2021-02-23 22:58:39,589] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
   [2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:39,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
   [2021-02-23 22:58:39,591] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
   [2021-02-23 22:58:39,591] {base_executor.py:122} DEBUG - 0 running task instances
   [2021-02-23 22:58:39,592] {base_executor.py:123} DEBUG - 0 in queue
   [2021-02-23 22:58:39,593] {base_executor.py:124} DEBUG - 32 open slots
   [2021-02-23 22:58:39,594] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-23 22:58:39,596] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
   [2021-02-23 22:58:39,597] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
   [2021-02-23 22:58:40,305] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor414-Process, stopped)>
   [2021-02-23 22:58:40,306] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor415-Process, stopped)>
   [2021-02-23 22:58:40,599] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
   [2021-02-23 22:58:41,349] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6829)
   [2021-02-23 22:58:41,386] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6831)
   [2021-02-23 22:58:41,595] {scheduler_job.py:1435} DEBUG - Starting Loop...
   [2021-02-23 22:58:41,595] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
   [2021-02-23 22:58:41,596] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:41,597] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:41,598] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:41,599] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
   [2021-02-23 22:58:41,600] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
   [2021-02-23 22:58:41,601] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
   [2021-02-23 22:58:41,602] {base_executor.py:122} DEBUG - 0 running task instances
   [2021-02-23 22:58:41,602] {base_executor.py:123} DEBUG - 0 in queue
   [2021-02-23 22:58:41,604] {base_executor.py:124} DEBUG - 32 open slots
   [2021-02-23 22:58:41,604] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-23 22:58:41,607] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
   [2021-02-23 22:58:41,608] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
   ```
   
   EDIT: Tried it on Airflow 1.10.13 and got the same thing. Updated the versions above.





[GitHub] [airflow] marclamberti edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
marclamberti edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-808933078


   @kaxil I get the same issue locally with KinD. The scheduler is restarted every 5 mins, as described in scheduler-deployment.yaml:
   ```
             # If the scheduler stops heartbeating for 5 minutes (10*30s) kill the
             # scheduler and let Kubernetes restart it
             livenessProbe:
               failureThreshold: 10
               periodSeconds: 30
               exec:
                 command:
                 - python
                 - -Wignore
                 - -c
                 - |
                   import os
                   os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                   os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'
   
                   from airflow.jobs.scheduler_job import SchedulerJob
                   from airflow.utils.db import create_session
                   from airflow.utils.net import get_hostname
                   import sys
   
                   with create_session() as session:
                       job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
                           SchedulerJob.latest_heartbeat.desc()).limit(1).first()
   
                   sys.exit(0 if job.is_alive() else 1)
   ```
   I ran the script in the scheduler pod and it returns True. However, the probe still continues to fail; I haven't figured out why yet.
   Even with the suggestions from @stoiandl it doesn't work.





[GitHub] [airflow] jedcunningham commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-802040819


   At least some of these seem to be caused by #11982.





[GitHub] [airflow] marclamberti commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
marclamberti commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-809663845


   These are the last logs I get before the scheduler gets killed by Kubernetes:
   ```
   2021-03-29T19:46:05.9176489Z stdout F [2021-03-29 19:46:05,917] {base_executor.py:161} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   2021-03-29T19:46:05.9177509Z stdout F [2021-03-29 19:46:05,917] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   2021-03-29T19:46:05.9180176Z stdout F [2021-03-29 19:46:05,917] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   2021-03-29T19:46:05.9189036Z stdout F [2021-03-29 19:46:05,918] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   2021-03-29T19:46:05.9190065Z stdout F [2021-03-29 19:46:05,918] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   2021-03-29T19:46:05.9370131Z stdout F [2021-03-29 19:46:05,936] {base_job.py:219} DEBUG - [heartbeat]
   2021-03-29T19:46:05.9371407Z stdout F [2021-03-29 19:46:05,936] {scheduler_job.py:1395} DEBUG - Next timed event is in 2.919806
   2021-03-29T19:46:05.9372007Z stdout F [2021-03-29 19:46:05,937] {scheduler_job.py:1397} DEBUG - Ran scheduling loop in 0.05 seconds
   2021-03-29T19:46:06.5618142Z stdout F [2021-03-29 19:46:06,561] {scheduler_job.py:746} INFO - Exiting gracefully upon receiving signal 15
   2021-03-29T19:46:07.5643446Z stdout F [2021-03-29 19:46:07,563] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 46
   2021-03-29T19:46:07.6988128Z stdout F [2021-03-29 19:46:07,698] {process_utils.py:66} INFO - Process psutil.Process(pid=46, status='terminated', exitcode=0, started='19:41:10') (46) terminated with exit code 0
   2021-03-29T19:46:07.6993846Z stdout F [2021-03-29 19:46:07,699] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 46
   2021-03-29T19:46:07.6994528Z stdout F [2021-03-29 19:46:07,699] {scheduler_job.py:1301} INFO - Exited execute loop
   2021-03-29T19:46:07.7051995Z stdout F [2021-03-29 19:46:07,704] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
   2021-03-29T19:46:07.7052466Z stdout F [2021-03-29 19:46:07,705] {settings.py:292} DEBUG - Disposing DB connection pool (PID 7)
   ```











[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780603101


   It is not a liveness probe issue. For some reason the scheduler stops heartbeating. I basically have heartbeats every 5 secs in the logs.
   I started the scheduler pod at 14:26:21:
   ```
   [2021-02-17 14:26:21,095] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f19ad429670>]
     ____________       _____________
    ____    |__( )_________  __/__  /________      __
   ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
   ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
    _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   [2021-02-17 14:26:21,615] {executor_loader.py:82} DEBUG - Loading core executor: KubernetesExecutor
   [2021-02-17 14:26:21,682] {scheduler_job.py:1241} INFO - Starting the scheduler
   [2021-02-17 14:26:21,682] {scheduler_job.py:1246} INFO - Processing each file at most -1 times
   [2021-02-17 14:26:21,683] {kubernetes_executor.py:491} INFO - Start Kubernetes executor
   [2021-02-17 14:26:21,683] {kubernetes_executor.py:495} DEBUG - Start with scheduler_job_id: 236
   [2021-02-17 14:26:21,685] {kubernetes_executor.py:235} DEBUG - Creating Kubernetes executor
   [2021-02-17 14:26:21,685] {kubernetes_executor.py:240} DEBUG - Kubernetes using namespace airflow
   [2021-02-17 14:26:21,699] {kubernetes_executor.py:458} DEBUG - Clearing tasks that have not been launched
   [2021-02-17 14:26:21,704] {kubernetes_executor.py:126} INFO - Event: and now my watch begins starting at resource_version: 0
   [2021-02-17 14:26:21,718] {kubernetes_executor.py:462} INFO - When executor started up, found 0 queued task instances
   [2021-02-17 14:26:21,724] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 34
   [2021-02-17 14:26:21,725] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
   ```
   
   I can see the heartbeat running every 5 secs until 14:31:22. Below are the last 2 heartbeats before the scheduler actually hangs:
   ```
   [2021-02-17 14:31:17,266] {base_job.py:219} DEBUG - [heartbeat]
   [2021-02-17 14:31:17,273] {scheduler_job.py:1390} DEBUG - Next timed event is in 4.999986
   [2021-02-17 14:31:17,274] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.08 seconds
   [2021-02-17 14:31:17,798] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3408)
   [2021-02-17 14:31:17,805] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor481-Process' pid=3408 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:18,305] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:18,309] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:18,309] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:18,309] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:18,310] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:18,310] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:18,310] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:18,311] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:18,311] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:18,321] {scheduler_job.py:1390} DEBUG - Next timed event is in 3.951973
   [2021-02-17 14:31:18,322] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.05 seconds
   [2021-02-17 14:31:18,435] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3415)
   [2021-02-17 14:31:18,444] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor482-Process' pid=3415 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:19,104] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3422)
   [2021-02-17 14:31:19,113] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor483-Process' pid=3422 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:19,351] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:19,354] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:19,354] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:19,354] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:19,355] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:19,355] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:19,355] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:19,356] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:19,356] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:19,367] {scheduler_job.py:1390} DEBUG - Next timed event is in 2.906800
   [2021-02-17 14:31:19,367] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.04 seconds
   [2021-02-17 14:31:19,702] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3429)
   [2021-02-17 14:31:19,710] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor484-Process' pid=3429 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:20,340] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3436)
   [2021-02-17 14:31:20,348] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor485-Process' pid=3436 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:20,408] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:20,412] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:20,412] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:20,413] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:20,413] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:20,413] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:20,413] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:20,414] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:20,415] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:20,415] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:20,425] {scheduler_job.py:1390} DEBUG - Next timed event is in 1.848308
   [2021-02-17 14:31:20,425] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.05 seconds
   [2021-02-17 14:31:20,975] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3443)
   [2021-02-17 14:31:20,983] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor486-Process' pid=3443 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:21,462] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:21,471] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:21,471] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:21,471] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:21,472] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:21,472] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:21,472] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:21,473] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:21,474] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:21,484] {scheduler_job.py:1390} DEBUG - Next timed event is in 0.789546
   [2021-02-17 14:31:21,484] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.06 seconds
   [2021-02-17 14:31:21,569] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3450)
   [2021-02-17 14:31:21,579] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor487-Process' pid=3450 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:21,999] {serialized_dag.py:193} DEBUG - Deleting Serialized DAGs (for which DAG files are deleted) from serialized_dag table 
   [2021-02-17 14:31:22,004] {dag.py:2199} DEBUG - Deactivating DAGs (for which DAG files are deleted) from dag table 
   [2021-02-17 14:31:22,010] {dagcode.py:135} DEBUG - Deleting code from dag_code table 
   [2021-02-17 14:31:22,174] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3457)
   [2021-02-17 14:31:22,183] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor488-Process' pid=3457 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:22,303] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:22,307] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:22,307] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:22,308] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:22,308] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:22,308] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:22,308] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:22,309] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:22,310] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:22,310] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:22,339] {base_job.py:219} DEBUG - [heartbeat]
   [2021-02-17 14:31:22,348] {scheduler_job.py:1390} DEBUG - Next timed event is in 0.502616
   [2021-02-17 14:31:22,348] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.07 seconds
   [2021-02-17 14:31:22,868] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3464)
   [2021-02-17 14:31:22,880] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor489-Process' pid=3464 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:22,886] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:22,894] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:22,894] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:22,894] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:22,895] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:22,895] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:22,896] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:22,897] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:22,907] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
   [2021-02-17 14:31:23,487] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3471)
   [2021-02-17 14:31:23,494] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor490-Process' pid=3471 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:24,111] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3478)
   [2021-02-17 14:31:24,123] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor491-Process' pid=3478 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:24,747] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3485)
   [2021-02-17 14:31:24,754] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor492-Process' pid=3485 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:25,376] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3492)
   [2021-02-17 14:31:25,385] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor493-Process' pid=3492 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:26,022] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3499)
   [2021-02-17 14:31:26,030] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor494-Process' pid=3499 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:26,608] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3506)
   [2021-02-17 14:31:26,617] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor495-Process' pid=3506 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:27,241] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3513)
   [2021-02-17 14:31:27,249] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor496-Process' pid=3513 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:27,852] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3520)
   [2021-02-17 14:31:27,861] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor497-Process' pid=3520 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:28,454] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3527)
   [2021-02-17 14:31:28,472] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor498-Process' pid=3527 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:29,086] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3534)
   ... many of these disposing db connections
   ```
   
   In the frontend I get:
   
   _The scheduler does not appear to be running. Last heartbeat was received 11 minutes ago.
   The DAGs list may not update, and new tasks will not be scheduled._
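
   A quick way to double-check that message, independent of the UI, is to look at the scheduler's row in the `job` table of the metadata database. This is only a rough sketch; the postgres pod name, database, and credentials are assumptions about a default install:
   
   ```
   # Illustrative only: pod name, user and database depend on your deployment.
   kubectl -n airflow exec airflow-postgresql-0 -- \
     psql -U postgres -d airflow -c \
     "SELECT hostname, state, latest_heartbeat FROM job WHERE job_type = 'SchedulerJob' ORDER BY latest_heartbeat DESC LIMIT 1;"
   ```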





[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780665486


   After removing the terminated pods, it works without restarts...





[GitHub] [airflow] kaxil commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-804474746


   I am going to close this since https://github.com/apache/airflow/issues/14261#issuecomment-784619202 fixed it for Azure Kubernetes.





[GitHub] [airflow] kaxil closed issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #14261:
URL: https://github.com/apache/airflow/issues/14261


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] madden1706 commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
madden1706 commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-801177247


   I am getting this issue on minikube:
   
   ```
   [2021-03-17 14:54:07,522] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-03-17 14:54:07,522] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-03-17 14:54:07,523] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-03-17 14:54:07,523] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-03-17 14:54:07,534] {scheduler_job.py:1390} DEBUG - Next timed event is in 3.959892
   [2021-03-17 14:54:07,534] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.04 seconds
   [2021-03-17 14:54:08,525] {scheduler_job.py:746} INFO - Exiting gracefully upon receiving signal 15
   [2021-03-17 14:54:09,528] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 42
   [2021-03-17 14:54:09,742] {process_utils.py:61} INFO - Process psutil.Process(pid=42, status='terminated', exitcode=0, started='14:49:18') (42) terminated with exit code 0
   [2021-03-17 14:54:09,743] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 42
   [2021-03-17 14:54:09,743] {scheduler_job.py:1296} INFO - Exited execute loop
   [2021-03-17 14:54:09,749] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
   [2021-03-17 14:54:09,749] {settings.py:290} DEBUG - Disposing DB connection pool (PID 9)
   ```
   
   minikube version: v1.18.1
   
   Restarts seem to be every 5 mins. 





[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784619202


   I managed to fix my restarts by setting the following configs:
   ```
   [kubernetes]
   ...
   kube_client_request_args = { "_request_timeout": 60 }
   delete_option_kwargs = {"grace_period_seconds": 10}
   enable_tcp_keepalive = True
   tcp_keep_idle = 30
   tcp_keep_intvl = 30
   tcp_keep_cnt = 30
   ```
   
   I have another Airflow instance running on AWS Kubernetes; that one runs fine with any version.
   
   I realized the problem is with Azure Kubernetes and the REST API calls to the API server. Now I can see some Kubernetes connection errors in the scheduler logs, but those are not killers, since Airflow retries the request and succeeds on the second attempt.
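
   If you are running the scheduler in a container and would rather not edit `airflow.cfg`, the same options can be passed through Airflow's `AIRFLOW__<SECTION>__<KEY>` environment-variable convention. A minimal sketch with the exact values from the config block above:
   
   ```
   # Same [kubernetes] settings, expressed as environment variables for the scheduler container
   export AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS='{ "_request_timeout": 60 }'
   export AIRFLOW__KUBERNETES__DELETE_OPTION_KWARGS='{"grace_period_seconds": 10}'
   export AIRFLOW__KUBERNETES__ENABLE_TCP_KEEPALIVE='True'
   export AIRFLOW__KUBERNETES__TCP_KEEP_IDLE='30'
   export AIRFLOW__KUBERNETES__TCP_KEEP_INTVL='30'
   export AIRFLOW__KUBERNETES__TCP_KEEP_CNT='30'
   ```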





[GitHub] [airflow] stoiandl edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
stoiandl edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784619202


   I managed to fix my restarts by setting the following configs:
   ```
   [kubernetes]
   ...
   kube_client_request_args = { "_request_timeout": 60 }
   delete_option_kwargs = {"grace_period_seconds": 10}
   enable_tcp_keepalive = True
   tcp_keep_idle = 30
   tcp_keep_intvl = 30
   tcp_keep_cnt = 30
   ```
   
   I have another Airflow instance running on AWS Kubernetes; that one runs fine with any version.
   
   I realized the problem is with Azure Kubernetes and the REST API calls to the API server. Now I can see some Kubernetes connection errors in the scheduler logs, but those are not killers, since Airflow retries the request and succeeds on the second attempt.
   
   Now my logs are full of connection errors:
   ```
   [2021-02-24 00:44:06,188] {kubernetes_executor.py:106} WARNING - There was a timeout error accessing the Kube API. Retrying request.
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
       return self.connection.recv_into(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
       self._raise_ssl_error(self._ssl, result)
     File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
       raise WantReadError()
   OpenSSL.SSL.WantReadError
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 436, in _error_catcher
       yield
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 763, in read_chunked
       self._update_chunk_length()
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 693, in _update_chunk_length
       line = self._fp.fp.readline()
     File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
       return self._sock.recv_into(b)
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 326, in recv_into
       raise timeout("The read operation timed out")
   socket.timeout: The read operation timed out
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 102, in run
       self.resource_version = self._run(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 145, in _run
       for event in list_worker_pods():
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 144, in stream
       for line in iter_resp_lines(resp):
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
       for seg in resp.read_chunked(decode_content=False):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 792, in read_chunked
       self._original_response.close()
     File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
       self.gen.throw(type, value, traceback)
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 441, in _error_catcher
       raise ReadTimeoutError(self._pool, None, "Read timed out.")
   urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out.
   [2021-02-24 00:44:07,330] {kubernetes_executor.py:126} INFO - Event: and now my watch begins starting at resource_version: 0
   [2021-02-24 00:45:07,394] {kubernetes_executor.py:106} WARNING - There was a timeout error accessing the Kube API. Retrying request.
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
       return self.connection.recv_into(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
       self._raise_ssl_error(self._ssl, result)
     File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
       raise WantReadError()
   OpenSSL.SSL.WantReadError
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 436, in _error_catcher
       yield
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 763, in read_chunked
       self._update_chunk_length()
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 693, in _update_chunk_length
       line = self._fp.fp.readline()
     File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
       return self._sock.recv_into(b)
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 326, in recv_into
       raise timeout("The read operation timed out")
   socket.timeout: The read operation timed out
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 102, in run
       self.resource_version = self._run(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 145, in _run
       for event in list_worker_pods():
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 144, in stream
       for line in iter_resp_lines(resp):
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
       for seg in resp.read_chunked(decode_content=False):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 792, in read_chunked
       self._original_response.close()
     File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
       self.gen.throw(type, value, traceback)
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 441, in _error_catcher
       raise ReadTimeoutError(self._pool, None, "Read timed out.")
   urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out.
   ```
   
   It is not the prettiest solution, but it works... This definitely needs to be addressed somehow.





[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780616603


   15 mins later the scheduler comes back up:
   ```
   [2021-02-17 14:47:02,093] {connectionpool.py:751} WARNING - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', OSError("(110, 'ETIMEDOUT')"))': /api/v1/namespaces/airflow/pods?fieldSelector=status.phase%3DSucceeded&labelSelector=kubernetes_executor%3DTrue
   ```
   
   It returns a bunch of pod config JSONs in debug mode. Then:
   ```
   [2021-02-17 14:47:03,060] {scheduler_job.py:1390} DEBUG - Next timed event is in 4.999955
   [2021-02-17 14:47:03,060] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 940.21 seconds
   [2021-02-17 14:47:04,090] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:47:04,096] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:47:04,096] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:47:04,096] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:47:04,096] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:47:04,096] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:47:04,096] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:47:04,098] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:47:04,099] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:47:04,099] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   ... many more of these DagParsingStat entries here
   [2021-02-17 14:47:04,134] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor676-Process' pid=4773 parent=34 stopped exitcode=0>
   ... many more DagParsingStat entries here
   [2021-02-17 14:47:04,142] {serialized_dag.py:193} DEBUG - Deleting Serialized DAGs (for which DAG files are deleted) from serialized_dag table
   ... many more DagParsingStat entries here
   [2021-02-17 14:47:04,147] {dag.py:2199} DEBUG - Deactivating DAGs (for which DAG files are deleted) from dag table
   ... many more DagParsingStat entries here
   [2021-02-17 14:47:04,152] {dagcode.py:135} DEBUG - Deleting code from dag_code table 
   [2021-02-17 14:47:04,174] {base_job.py:219} DEBUG - [heartbeat]
   ```
   
   This might be because I have many terminated worker pods that were never deleted? They come either from a deleted DAG or from the previous Airflow 1.10.13 deployment.
   
   I cleared my namespace of pods and will now rerun with the liveness probe enabled.
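
   For reference, the field selector and label that the scheduler itself uses (visible in the retried request logged above) can be reused to clear those leftover pods in one go. A rough sketch, assuming the `airflow` namespace from the logs:
   
   ```
   # Delete finished executor pods, mirroring the selector from the request above
   kubectl -n airflow delete pods \
     --field-selector=status.phase==Succeeded \
     -l kubernetes_executor=True
   ```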





[GitHub] [airflow] jedcunningham commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-815210940


   @marclamberti, @madden1706, I believe the liveness probe `timeoutSeconds` and kube v1.20 are the culprit here. I've opened #15259 to address it.
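
   For anyone needing a stopgap before that lands: from Kubernetes 1.20 the `timeoutSeconds` of exec probes is actually enforced (it was effectively ignored before), so a probe script that takes longer than the configured timeout starts failing even when the scheduler is healthy. A hedged sketch of raising the timeout on a running deployment; the deployment name and container index below are assumptions about your install:
   
   ```
   # Illustrative only - verify the deployment name and container order first,
   # e.g. with: kubectl -n airflow get deploy airflow-scheduler -o yaml
   kubectl -n airflow patch deployment airflow-scheduler --type=json -p='[
     {"op": "add",
      "path": "/spec/template/spec/containers/0/livenessProbe/timeoutSeconds",
      "value": 30}
   ]'
   ```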





[GitHub] [airflow] norfe commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
norfe commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-782901575


   I'm seeing a similar issue when trying to run airflow on minikube.
   
   ```
   $ minikube version
   minikube version: v1.17.1
   
   $ kubectl version
   Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
   Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
   ```
   
   I can always reproduce it with:
   
   ```
   $ minikube delete
   $ minikube start
   $ curl -OL https://github.com/apache/airflow/archive/master.zip
   $ unzip master.zip
   $ helm dep update airflow-master/chart/
   $ helm install airflow ./airflow-master/chart
   ```
   After running for a while, the scheduler restarts periodically:
   
   ```
   $ kubectl get pods
   NAME                                 READY   STATUS    RESTARTS   AGE
   airflow-postgresql-0                 1/1     Running   0          17m
   airflow-scheduler-5567f545c8-qv7cg   2/2     Running   3          17m
   airflow-statsd-5556dc96bc-twbz9      1/1     Running   0          17m
   airflow-webserver-65cc966d7c-68wnv   1/1     Running   0          17m
   ```
   
   ```
   $ kubectl describe pod airflow-scheduler-5567f545c8-qv7cg
   Events:
     Type     Reason     Age                   From               Message
     ----     ------     ----                  ----               -------
     Normal   Scheduled  18m                   default-scheduler  Successfully assigned default/airflow-scheduler-5567f545c8-qv7cg to minikube
     Normal   Pulling    18m                   kubelet            Pulling image "apache/airflow:2.0.0"
     Normal   Pulled     17m                   kubelet            Successfully pulled image "apache/airflow:2.0.0" in 42.1015082s
     Normal   Created    17m                   kubelet            Created container wait-for-airflow-migrations
     Normal   Started    17m                   kubelet            Started container wait-for-airflow-migrations
     Normal   Started    16m                   kubelet            Started container scheduler-gc
     Normal   Pulled     16m                   kubelet            Container image "apache/airflow:2.0.0" already present on machine
     Normal   Created    16m                   kubelet            Created container scheduler-gc
     Normal   Killing    11m                   kubelet            Container scheduler failed liveness probe, will be restarted
     Normal   Pulled     11m (x2 over 16m)     kubelet            Container image "apache/airflow:2.0.0" already present on machine
     Normal   Started    11m (x2 over 16m)     kubelet            Started container scheduler
     Normal   Created    11m (x2 over 16m)     kubelet            Created container scheduler
     Warning  Unhealthy  3m20s (x27 over 16m)  kubelet            Liveness probe failed:
   ```
   
   
   
   





[GitHub] [airflow] careduz edited a comment on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
careduz edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-785145637


   Neat, thanks for the info @stoiandl - will give those settings a shot when we upgrade Airflow (TTL seems to be new for 2.0). Will share our experience afterwards.





[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780571043


   What I am not sure about is whether I should remove the liveness probe for now. Is this really an issue with the scheduler, given that I do not see any errors in the log?





[GitHub] [airflow] lewismc commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
lewismc commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-786154745


   We are experiencing this when deploying the chart into a local installation of k3d:
   ```
   % kubectl get pods -n airflow-test-local            
   NAME                                 READY   STATUS             RESTARTS   AGE
   airflow-statsd-5556dc96bc-w28cz      1/1     Running            0          7m29s
   airflow-postgresql-0                 1/1     Running            0          7m29s
   airflow-webserver-7d5fbc5675-x6dc7   1/1     Running            0          7m29s
   airflow-scheduler-7f59d9c69c-5v9pl   2/3     CrashLoopBackOff   7          7m29s
   airflow-cleanup-1614276000-xbcmz     0/1     Completed          0          39s
   airflow-scheduler-7f59d9c69c-cvzvx   2/3     CrashLoopBackOff   7          7m29s
   ```
   We also found some interesting `WARNING`s when looking into the `wait-for-airflow-migrations` container...
   ```
   % kubectl logs airflow-webserver-7d5fbc5675-x6dc7 -c wait-for-airflow-migrations -n airflow-test-local
   BACKEND=postgresql
   DB_HOST=airflow-postgresql.airflow-test-local.svc.cluster.local
   DB_PORT=5432
   ....
   [2021-02-25 17:53:43,435] {migration.py:163} INFO - Context impl PostgresqlImpl.
   [2021-02-25 17:53:43,436] {migration.py:170} INFO - Will assume transactional DDL.
   [2021-02-25 17:53:49,416] {providers_manager.py:299} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
   [2021-02-25 17:53:50,300] {providers_manager.py:299} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
   [2021-02-25 17:53:51,345] {<string>:35} INFO - Waiting for migrations... 1 second(s)
   [2021-02-25 17:53:52,349] {<string>:35} INFO - Waiting for migrations... 2 second(s)
   [2021-02-25 17:53:53,352] {<string>:35} INFO - Waiting for migrations... 3 second(s)
   [2021-02-25 17:53:54,355] {<string>:35} INFO - Waiting for migrations... 4 second(s)
   [2021-02-25 17:53:55,358] {<string>:35} INFO - Waiting for migrations... 5 second(s)
   echiu@MT-308022 chart %
   ```
   I don't think the Azure hooks should be imported by default...





[GitHub] [airflow] stoiandl commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-782423867


   A few days later the issue appeared again...





[GitHub] [airflow] careduz commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
careduz commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-785145637


   Neat, thanks for the info @stoiandl - will give those settings a shot when we upgrade to Airflow (TTL seems to be new for 2.0). Will share our experience afterwards.





[GitHub] [airflow] madden1706 commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)

Posted by GitBox <gi...@apache.org>.
madden1706 commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-810093024


   I can confirm the observations that @marclamberti has about the two import lines causing the probe to fail. 
   
   ```
   from airflow.jobs.scheduler_job import SchedulerJob
   from airflow.utils.db import create_session
   ```
   
   After the issue was closed, I assumed this was local to me. Either of these lines causes the liveness probe to fail.
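
   If you want to confirm that it is the import cost (rather than the DB query) that blows past the probe's `timeoutSeconds`, timing those imports inside the scheduler container is a quick check. A sketch; the namespace and pod name are illustrative:
   
   ```
   # <scheduler-pod> is a placeholder - substitute your scheduler pod name and namespace
   kubectl -n airflow exec <scheduler-pod> -c scheduler -- \
     python -c "import time; t0 = time.time(); import airflow.jobs.scheduler_job, airflow.utils.db; print('imports took %.1fs' % (time.time() - t0))"
   ```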
   
    
   
   

