Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/05 23:01:58 UTC

[GitHub] [airflow] jedcunningham opened a new pull request, #25561: Add liveness probe to Celery workers

jedcunningham opened a new pull request, #25561:
URL: https://github.com/apache/airflow/pull/25561

   This adds a liveness probe to our workers, to help guard against the worker being "up" but not communicating with Celery.
   
   Might help with #24731, though it'll be a pretty blunt solution.
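   One way to detect a worker that is running but no longer reachable through the broker is to send it a Celery remote-control ping and treat a missing reply as a failure. The sketch below assumes that approach. This comment does not spell out the probe command. `probe` and `worker_responded` are hypothetical helper names. The `ping` argument is assumed to behave like Celery's `app.control.ping(destination=..., timeout=...)`, which returns a list of per-node replies.

```python
import socket


def worker_responded(replies, node_name):
    """Return True if `node_name` answered a ping.

    `replies` is expected to look like the result of Celery's
    `app.control.ping()`: a list of one-key dicts such as
    [{"celery@host": {"ok": "pong"}}]. To be lenient, a bare "pong"
    value is accepted as well.
    """
    for reply in replies or []:
        answer = reply.get(node_name)
        if answer == "pong" or (isinstance(answer, dict) and answer.get("ok") == "pong"):
            return True
    return False


def probe(ping, node_name=None, timeout=10.0):
    """Return a process exit code for a liveness probe: 0 = healthy, 1 = failed.

    `ping` is any callable with the keyword signature of Celery's
    `app.control.ping`. The default node name ASSUMES Celery's default
    worker hostname pattern "celery@<hostname>".
    """
    node_name = node_name or f"celery@{socket.gethostname()}"
    try:
        # Address only this worker; no reply within `timeout` yields an empty list.
        replies = ping(destination=[node_name], timeout=timeout)
    except Exception:
        # Broker unreachable or similar: report failure instead of crashing.
        return 1
    return 0 if worker_responded(replies, node_name) else 1
```

   A probe script could call `sys.exit(probe(app.control.ping))`, where `app` is Airflow's Celery application. This sketch assumes it can be imported from `airflow.executors.celery_executor`, as in Airflow 2.x at the time. Passing the `ping` callable, rather than importing Airflow inside the helper, keeps the reply-parsing logic testable without a running broker.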


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25561: Add liveness probe to Celery workers

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25561:
URL: https://github.com/apache/airflow/pull/25561#issuecomment-1385169617

   > @jedcunningham, I enabled health checks for the workers because they stop processing messages when communication between Redis and the workers breaks. After I enabled the liveness checks, the worker pods showed high memory utilization. When I disabled the liveness checks, memory utilization was fine again. Could you please help with this issue?
   > 
   > The liveness checks are causing a memory leak.
   
   I believe this is the issue with the K8S livenessprobe described in https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/778. You can update K8S to the latest version and check that your CSI livenessprobe version includes https://github.com/kubernetes-csi/livenessprobe/pull/94.
   
   In general, upgrading whatever K8S you are using to the latest version is highly recommended.
   
   @anu251989, please double-check that. If you still see the same issue with the latest version of K8S, please report it as a new issue.



[GitHub] [airflow] pingzh commented on pull request #25561: Add liveness probe to Celery workers

pingzh commented on PR #25561:
URL: https://github.com/apache/airflow/pull/25561#issuecomment-1211092499

   > @pingzh, very good call. Do you know of a better probe to use when it's disabled?
   > 
   > I'm tempted to just add an `enabled` flag around this feature so it can just be turned off. What do you think about that?
   
   I am not aware of any better probe methods. We turn off `worker_enable_remote_control` because we use SQS as the message broker, and with SQS, `worker_enable_remote_control` creates lots of pidbox queues. It should be OK for other brokers.
   
   I like the idea of adding an `enabled` flag.
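   A ping-based probe depends on Celery remote control. If `worker_enable_remote_control` is turned off, the worker never answers the ping, so the probe fails and the kubelet keeps restarting the pod. Below is a minimal sketch, in Python, of how an `enabled` flag could gate the probe. The function `build_liveness_probe`, its parameter names, and its timing defaults are hypothetical illustrations, not the chart's actual values. The `celery ... inspect ping -d` command and the Kubernetes probe field names are real. The Celery app path is assumed from the Airflow 2.x layout of the time.

```python
def build_liveness_probe(enabled=True, remote_control=True,
                         period_seconds=60, timeout_seconds=20, failure_threshold=5):
    """Return a Kubernetes livenessProbe mapping, or None to omit the probe.

    HYPOTHETICAL: parameter names and default values are illustrative only.
    """
    if not enabled:
        # Operator opted out, e.g. because remote control is off for their broker.
        return None
    if not remote_control:
        # `celery inspect ping` is a remote-control command: a worker with remote
        # control disabled never replies, so the probe would always fail and the
        # pod would be restarted in a loop. Fail loudly at render time instead.
        raise ValueError("ping-based liveness probe needs worker remote control enabled")
    # ASSUMPTION: Airflow 2.x import path for the Celery app.
    command = (
        "celery --app airflow.executors.celery_executor.app "
        "inspect ping -d celery@$(hostname)"
    )
    return {
        "exec": {"command": ["sh", "-c", command]},
        "periodSeconds": period_seconds,
        "timeoutSeconds": timeout_seconds,
        "failureThreshold": failure_threshold,
    }
```

   With these illustrative values, the kubelet restarts a worker that stops answering only after five consecutive failed probes spaced 60 seconds apart. That means roughly five minutes, which is part of why the probe is a blunt instrument. Raising on `remote_control=False` rather than silently dropping the probe is a design choice that surfaces the misconfiguration.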
   
   



[GitHub] [airflow] anu251989 commented on pull request #25561: Add liveness probe to Celery workers

anu251989 commented on PR #25561:
URL: https://github.com/apache/airflow/pull/25561#issuecomment-1372019198

   @jedcunningham, I enabled health checks for the workers because they stop processing messages when communication between Redis and the workers breaks.
   After I enabled the liveness checks, the worker pods showed high memory utilization. When I disabled the liveness checks, memory utilization was fine again. Could you please help with this issue?
   
   The liveness checks are causing a memory leak.
   



[GitHub] [airflow] potiuk merged pull request #25561: Add liveness probe to Celery workers

potiuk merged PR #25561:
URL: https://github.com/apache/airflow/pull/25561



[GitHub] [airflow] potiuk commented on pull request #25561: Add liveness probe to Celery workers

potiuk commented on PR #25561:
URL: https://github.com/apache/airflow/pull/25561#issuecomment-1228965562

   > Is it worth adding a note somewhere about not enabling this with SQS?
   
   SQS is not officially supported by Airflow. We discussed it, but the Amazon team's experience is that SQS has many more quirks, and its level of support in Celery is definitely not on par with Redis/RabbitMQ. So we should refrain from even stating that SQS can be used with Airflow (see https://github.com/apache/airflow/pull/24019).



[GitHub] [airflow] jedcunningham commented on pull request #25561: Add liveness probe to Celery workers

jedcunningham commented on PR #25561:
URL: https://github.com/apache/airflow/pull/25561#issuecomment-1209622682

   @pingzh, very good call. Do you know of a better probe to use when it's disabled?
   
   I'm tempted to just add an `enabled` flag around this feature so it can just be turned off. What do you think about that?

