Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/03 22:56:54 UTC

[GitHub] [airflow] MatrixManAtYrService opened a new issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

MatrixManAtYrService opened a new issue #20644:
URL: https://github.com/apache/airflow/issues/20644


   ### Official Helm Chart version
   
   1.3.0 (latest released)
   
   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### Kubernetes Version
   
   version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.4-3+adc4115d990346", GitCommit:"adc4115d990346b87714cc4f033d225711bf744d", GitTreeState:"clean", BuildDate:"2021-11-17T22:03:17Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
   
   ### Helm Chart configuration
   
   Nothing special going on, deployed like this:
   ```
   helm repo add apache-airflow https://airflow.apache.org
   helm repo update
   cat <<- 'EOF' | helm install airflow \
                       --namespace tmp \
                       apache-airflow/airflow  \
                       -f -
   defaultAirflowRepository: apache/airflow
   defaultAirflowTag: 2.2.3
   EOF
   ```
   
   ### Docker Image customisations
   
   No customizations, just using `apache/airflow:2.2.3`
   
   ### What happened
   
   
   I deploy Airflow and leave it alone, and every 5 minutes I get events like this one:
   ```
   LAST SEEN   TYPE      REASON      OBJECT                                   MESSAGE
   51s         Normal    Killing     pod/airflow-scheduler-84b9f855b8-2b9cj   Container scheduler failed liveness probe, will be restarted
   ```
   
   Same story for the triggerer.  The weird thing is that when I run the liveness probe code manually, it returns 0 every time (see the gist for an example of this), so I'm not sure why k8s thinks the probe is failing.
   
   ### What you expected to happen
   
   Maybe a restart or two as part of the deployment process, but after that I expected there to be no periodic resets.
   
   ### How to reproduce
   
   I made a gist with a script that replicates this, and also its output: https://gist.github.com/MatrixManAtYrService/97179963d5b543218d704a4594ca2720
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatrixManAtYrService edited a comment on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService edited a comment on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004417699


   One thought was that maybe the probe is taking too long to respond, but at least when I run it via `kubectl` it returns within a second:
   ```
   ❯ time kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c "cat - > ./liveness.py ; echo $?"
   0
   kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c   0.55s user 0.14s system 111% cpu 0.623 total
   ```





[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004417699


   One thought was that maybe the probe is taking too long to respond, but at least when I run it via `kubectl` it returns within a second:
   ```
   $ time bash -c '
   kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c "cat - > ./liveness.py ; echo $?"
   '
       0
       bash -c   0.60s user 0.15s system 109% cpu 0.678 total
   ```





[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1013619155


   Maybe I shouldn't have settled for my workaround.  This looks like it fixes it: https://github.com/apache/airflow/pull/20833/
   





[GitHub] [airflow] MatrixManAtYrService edited a comment on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService edited a comment on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004417699


   One thought was that maybe the probe is taking too long to respond, but at least when I run it via `kubectl` it returns within a second:
   ```
   ❯ time kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c 'cat - > ./liveness.py ; echo $?'
   0
   kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c   0.55s user 0.14s system 111% cpu 0.623 total
   ```





[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004437053


   I switched to using this values.yml
   ```
   defaultAirflowRepository: apache/airflow
   defaultAirflowTag: 2.2.3
   scheduler:
     livenessProbe:
       timeoutSeconds: 30
   ```
   It appears to be showing up:
   ```
    ❯ kubectl get pod/airflow-scheduler-7f58748c46-dfftd -n tmp -ojsonpath='{.spec.containers[0].livenessProbe.timeoutSeconds}'
   
       30
   ```
   I still see the unhealthy warning, but no restarts yet.  I'm going to leave it alone for a while and see what happens.  Either way, the timeout point was what I needed to see, thanks @Jed Cunningham





[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004513324


   4 hours later, and no restarts.  Meanwhile, the triggerer (whose timeoutSeconds I didn't touch) restarted 44 times.  This workaround is as good as a fix for me.
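
As a rough sanity check on those numbers, here is a small Python sketch of the arithmetic. The 44 restarts and 4 hours come from the comment above. The `periodSeconds` and `failureThreshold` values are assumptions for illustration only and are not stated anywhere in this thread, so check the chart's `values.yaml` for the real ones.

```python
def minutes_between_restarts(hours_observed, restart_count):
    """Average gap between restarts over an observation window."""
    return hours_observed * 60 / restart_count


def restart_interval_seconds(period_seconds, failure_threshold):
    """Time for a probe that fails every attempt to trigger one restart.

    Ignores container start-up time and any initial delay.
    """
    return period_seconds * failure_threshold


if __name__ == "__main__":
    # Observed in this thread: 44 triggerer restarts in about 4 hours.
    print(round(minutes_between_restarts(4, 44), 2))  # 5.45

    # ASSUMED probe settings (hypothetical, not taken from this thread).
    print(restart_interval_seconds(period_seconds=60, failure_threshold=5))  # 300
```

If the probe really does fail on every attempt, an observed gap of about five and a half minutes would be consistent with a five-minute failure window plus some restart overhead, which matches the "every 5 minutes" in the issue title.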





[GitHub] [airflow] MatrixManAtYrService edited a comment on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService edited a comment on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004437053


   I switched to using this values.yml
   ```
   defaultAirflowRepository: apache/airflow
   defaultAirflowTag: 2.2.3
   scheduler:
     livenessProbe:
       timeoutSeconds: 30
   ```
   It appears to be showing up:
   ```
    ❯ kubectl get pod/airflow-scheduler-7f58748c46-dfftd -n tmp -ojsonpath='{.spec.containers[0].livenessProbe.timeoutSeconds}'
   
       30
   ```
   I still see the unhealthy warning, but no restarts yet.  I'm going to leave it alone for a while and see what happens.  Either way, the timeout point was what I needed to see, thanks @jedcunningham 





[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004421410


   Could it be that 14 seconds is too long?
   
   ```
   ❯ time kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c '/entrypoint python -Wignore ./liveness.py ; echo $?'
   
       0
       kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c   0.78s user 0.15s system 6% cpu 14.140 total
   ```





[GitHub] [airflow] MatrixManAtYrService removed a comment on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService removed a comment on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004417699


   One thought was that maybe the probe is taking too long to respond, but at least when I run it via `kubectl` it returns within a second:
   ```
   ❯ time kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c 'cat - > ./liveness.py ; echo $?'
   0
   kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c   0.55s user 0.14s system 111% cpu 0.623 total
   ```





[GitHub] [airflow] jedcunningham commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004425363


   Yes, the default timeout is 10 seconds:
   https://github.com/apache/airflow/blob/6dfc939833fd3dc477b3971d965ba142d3b8bd77/chart/values.yaml#L512
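
To illustrate why a probe command that exits 0 can still count as a failure, here is a minimal Python sketch. It assumes the kubelet enforces `timeoutSeconds` for exec probes, and the helper `probe_passes` is a hypothetical name used only here. A sleeping subprocess stands in for the real probe, which in this thread took about 14 seconds against a 10-second timeout.

```python
import subprocess
import sys
import time


def probe_passes(command, timeout_seconds):
    """Run command and return (passed, elapsed_seconds).

    Mimics an exec-style probe: it passes only if the command
    exits 0 before the timeout expires.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        passed = result.returncode == 0
    except subprocess.TimeoutExpired:
        # The command might have exited 0 eventually, but it was too slow.
        passed = False
    return passed, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in for the real probe: exits 0, but only after 2 seconds.
    slow_ok = [sys.executable, "-c", "import time; time.sleep(2)"]
    print(probe_passes(slow_ok, timeout_seconds=1)[0])  # timeout shorter than the run time
    print(probe_passes(slow_ok, timeout_seconds=5)[0])  # timeout longer than the run time
```

Running the same command by hand, as in the `kubectl exec` timings in this thread, only shows the exit code. The elapsed time has to be compared against the timeout separately.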





[GitHub] [airflow] MatrixManAtYrService closed issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes

Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService closed issue #20644:
URL: https://github.com/apache/airflow/issues/20644


   

