Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/03 22:56:54 UTC
[GitHub] [airflow] MatrixManAtYrService opened a new issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
MatrixManAtYrService opened a new issue #20644:
URL: https://github.com/apache/airflow/issues/20644
### Official Helm Chart version
1.3.0 (latest released)
### Apache Airflow version
2.2.3 (latest released)
### Kubernetes Version
version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.4-3+adc4115d990346", GitCommit:"adc4115d990346b87714cc4f033d225711bf744d", GitTreeState:"clean", BuildDate:"2021-11-17T22:03:17Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
### Helm Chart configuration
Nothing special going on, deployed like this:
```
helm repo add apache-airflow https://airflow.apache.org
helm repo update
cat <<- 'EOF' | helm install airflow \
    --namespace tmp \
    apache-airflow/airflow \
    -f -
defaultAirflowRepository: apache/airflow
defaultAirflowTag: 2.2.3
EOF
```
### Docker Image customisations
No customizations, just using `apache/airflow:2.2.3`
### What happened
I deploy airflow and just leave it alone, and every 5 minutes, I get events like this one:
```
LAST SEEN   TYPE     REASON    OBJECT                                   MESSAGE
51s         Normal   Killing   pod/airflow-scheduler-84b9f855b8-2b9cj   Container scheduler failed liveness probe, will be restarted
```
Same story for the triggerer. The weird thing is that when I run the liveness probe code manually, it returns 0 every time (see the gist for an example of this), so I'm not sure why k8s thinks the probe is failing.
### What you expected to happen
Maybe a restart or two as part of the deployment process, but after that I expected there to be no periodic resets.
### How to reproduce
I made a gist with a script that replicates this, and also its output: https://gist.github.com/MatrixManAtYrService/97179963d5b543218d704a4594ca2720
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatrixManAtYrService edited a comment on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService edited a comment on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004417699
One thought was that maybe the probe is taking too long to respond, but at least when I run it via `kubectl` it returns within a second:
```
❯ time kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c "cat - > ./liveness.py ; echo $?"
0
kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c 0.55s user 0.14s system 111% cpu 0.623 total
```
[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004417699
One thought was that maybe the probe is taking too long to respond, but at least when I run it via `kubectl` it returns within a second:
```
$ time bash -c '
kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c "cat - > ./liveness.py ; echo $?"
'
0
bash -c 0.60s user 0.15s system 109% cpu 0.678 total
```
[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1013619155
Maybe I shouldn't have settled for my workaround. This looks like it fixes it: https://github.com/apache/airflow/pull/20833/
[GitHub] [airflow] MatrixManAtYrService edited a comment on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService edited a comment on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004417699
One thought was that maybe the probe is taking too long to respond, but at least when I run it via `kubectl` it returns within a second:
```
❯ time kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c 'cat - > ./liveness.py ; echo $?'
0
kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c 0.55s user 0.14s system 111% cpu 0.623 total
```
[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004437053
I switched to using this values.yml
```
defaultAirflowRepository: apache/airflow
defaultAirflowTag: 2.2.3
scheduler:
  livenessProbe:
    timeoutSeconds: 30
```
It appears to be showing up:
```
❯ kubectl get pod/airflow-scheduler-7f58748c46-dfftd -n tmp -ojsonpath='{.spec.containers[0].livenessProbe.timeoutSeconds}'
30
```
I still see the unhealthy warning, but no restarts yet. I'm going to leave it alone for a while and see what happens. Either way, the timeout point was what I needed to see, thanks @Jed Cunningham
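The jsonpath check above picks the first container by index. A minimal sketch of the same check keyed by container name, assuming the pod spec has been dumped as JSON (for example with `kubectl get pod ... -o json`). The `liveness_timeouts` helper and the sample document are made up for illustration and are not part of the chart or of kubectl.

```python
import json


def liveness_timeouts(pod_json: str) -> dict:
    """Map each container name to its livenessProbe.timeoutSeconds (None if unset)."""
    pod = json.loads(pod_json)
    return {
        container["name"]: container.get("livenessProbe", {}).get("timeoutSeconds")
        for container in pod["spec"]["containers"]
    }


# Hypothetical, trimmed-down pod document; a real one has many more fields.
SAMPLE_POD = json.dumps(
    {
        "spec": {
            "containers": [
                {"name": "scheduler", "livenessProbe": {"timeoutSeconds": 30}},
                {"name": "sidecar"},
            ]
        }
    }
)

if __name__ == "__main__":
    print(liveness_timeouts(SAMPLE_POD))
```

Looking the value up by name avoids reading the wrong container if the pod ever lists a sidecar first.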
[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004513324
4 hours later, and no restarts. Meanwhile, the triggerer (whose timeoutSeconds I didn't touch) restarted 44 times. This workaround is as good as a fix for me.
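As a rough sanity check on those numbers, 44 restarts over 4 hours works out to a bit more than five minutes between restarts, in line with the cadence in the issue title. A small sketch of the arithmetic (the function name is just for illustration):

```python
def average_restart_interval_minutes(hours_observed: float, restarts: int) -> float:
    """Average gap between restarts, in minutes, over the observation window."""
    return hours_observed * 60 / restarts


if __name__ == "__main__":
    # 4 hours and 44 triggerer restarts, as reported above.
    print(round(average_restart_interval_minutes(4, 44), 2))  # 5.45
```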
[GitHub] [airflow] MatrixManAtYrService edited a comment on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService edited a comment on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004437053
I switched to using this values.yml
```
defaultAirflowRepository: apache/airflow
defaultAirflowTag: 2.2.3
scheduler:
  livenessProbe:
    timeoutSeconds: 30
```
It appears to be showing up:
```
❯ kubectl get pod/airflow-scheduler-7f58748c46-dfftd -n tmp -ojsonpath='{.spec.containers[0].livenessProbe.timeoutSeconds}'
30
```
I still see the unhealthy warning, but no restarts yet. I'm going to leave it alone for a while and see what happens. Either way, the timeout point was what I needed to see, thanks @jedcunningham
[GitHub] [airflow] MatrixManAtYrService commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004421410
Could it be that 14 seconds is too long?
```
❯ time kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c '/entrypoint python -Wignore ./liveness.py ; echo $?'
0
kubectl -n tmp exec $SCHEDULER_POD -c scheduler -- sh -c 0.78s user 0.15s system 6% cpu 14.140 total
```
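The point of the timing above is that a probe command can exit 0 and still count as failed if it runs longer than the probe timeout. A local sketch of that behaviour, assuming a probe passes only when the command exits 0 within the timeout. `run_probe` is a hypothetical stand-in written for this example; it is not how the kubelet runs exec probes, and the sleeping child process only simulates a slow check.

```python
import subprocess
import sys
import time


def run_probe(command, timeout_seconds):
    """Run command; return (passed, elapsed_seconds).

    Passing requires exit code 0 AND finishing within timeout_seconds.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(command, timeout=timeout_seconds)
        passed = result.returncode == 0
    except subprocess.TimeoutExpired:
        # The child is killed once the timeout expires; treat that as a failure.
        passed = False
    return passed, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in for a slow probe: sleeps for a second, then exits 0.
    slow_but_successful = [sys.executable, "-c", "import time; time.sleep(1)"]
    print(run_probe(slow_but_successful, timeout_seconds=0.2))
    print(run_probe(slow_but_successful, timeout_seconds=30))
```

The first call reports a failure even though the command itself would have exited 0, which matches what the events show.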
[GitHub] [airflow] MatrixManAtYrService removed a comment on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService removed a comment on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004417699
[GitHub] [airflow] jedcunningham commented on issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #20644:
URL: https://github.com/apache/airflow/issues/20644#issuecomment-1004425363
Yes, the default timeout is 10 seconds:
https://github.com/apache/airflow/blob/6dfc939833fd3dc477b3971d965ba142d3b8bd77/chart/values.yaml#L512
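Putting the thread's numbers together: a probe run of about 14 seconds is over a 10-second timeout but comfortably under 30. A sketch of that comparison, plus a rough estimate of how long consecutive failures take to trigger a restart. The period and failure threshold used below are illustrative assumptions, chosen so the product matches the five-minute cadence reported; they are not taken from this thread, so check the chart's values.yaml for the real settings.

```python
def probe_passes(duration_seconds: float, timeout_seconds: float, exit_code: int = 0) -> bool:
    """A probe passes only if it exits 0 within the timeout."""
    return exit_code == 0 and duration_seconds <= timeout_seconds


def seconds_until_restart(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time for back-to-back failures to reach the restart threshold."""
    return period_seconds * failure_threshold


if __name__ == "__main__":
    measured = 14.14  # seconds, from the timed kubectl run earlier in the thread
    print(probe_passes(measured, timeout_seconds=10))  # False
    print(probe_passes(measured, timeout_seconds=30))  # True
    # ASSUMED values, not from the thread: a probe every 30 s, restart after 10 failures.
    print(seconds_until_restart(period_seconds=30, failure_threshold=10))  # 300
```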
[GitHub] [airflow] MatrixManAtYrService closed issue #20644: Liveness probe fails, causing scheduler and triggerer restarts every 5 minutes
Posted by GitBox <gi...@apache.org>.
MatrixManAtYrService closed issue #20644:
URL: https://github.com/apache/airflow/issues/20644