Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/11/17 11:46:20 UTC

[GitHub] [airflow] potiuk commented on issue #24731: Celery Executor : After killing Redis or Airflow Worker Pod, queued Tasks not getting executed even after pod is up.

potiuk commented on issue #24731:
URL: https://github.com/apache/airflow/issues/24731#issuecomment-1318516581

   Thanks for the diagnosis, but I think you applied one of the "good" solutions, and there is not much we can or will do in Airflow for that.
   
   I think what you did is the right approach (one of them), not a workaround. This is expected. Airflow has no support for an active/active setup for Redis or Postgres and expects to talk to a single database server only. There is no way for Airflow components to recover when there is an established connection and the IP address of the component they talk to changes in such a way that Airflow does not even know the other party has moved. This is really a deployment issue, and I think Airflow should not have to take such changes into account.
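   
   To make this concrete, below is a minimal standalone sketch (plain Python sockets, not Airflow code; the broker hostname is a made-up placeholder) of why a long-lived client cannot see that its peer moved:
   
   ```python
   import socket
   
   BROKER_HOST = "redis.example.internal"  # placeholder name, an assumption
   
   # Name resolution happens once here, when the connection is established;
   # the socket is then tied to whatever IP the name resolved to at that moment.
   sock = socket.create_connection((BROKER_HOST, 6379))
   
   # ... the Redis pod is killed and comes back under a different IP ...
   
   # Nothing on the established socket signals that the peer moved: this
   # blocking read does not fail fast; it stalls until kernel-level TCP
   # retransmission timeouts give up (often many minutes). Restarting the
   # process - so it resolves and connects again - is the reliable recovery.
   data = sock.recv(4096)
   ```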
   
   Airflow is not a "critical/real-time" service that should react to and reconfigure its networking dynamically, and we have no intention of turning it into such a service. Developing such an "auto-healing" service is far more costly, and unless someone comes up with the idea, creates an Airflow Improvement Proposal, and implements such auto-healing, it is not something that is going to happen. There are many consequences and complexities in implementing such services, and there is no need for them in Airflow, because it is perfectly fine to restart and redeploy Airflow components from time to time - far easier and less costly in development and maintenance.
   
   This task falls on the deployment - that's why, for example, our Helm chart has liveness probes and health checks, and auto-healing in K8S is done exactly the way you did it: when a service becomes unhealthy, you restart it. This is a perfectly OK and perfectly viable solution - especially for things like virtual IP changes, which happen infrequently.
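   
   As an illustration, a liveness command can be as small as the one-shot connectivity check sketched below, which a probe can exec so that the orchestrator restarts the pod once it keeps failing. (The actual probes in our Helm chart are built around `airflow jobs check`; the standalone script and hostname here are simplified assumptions, not the chart's real probe.)
   
   ```python
   #!/usr/bin/env python3
   # One-shot health check for a liveness probe to exec (simplified sketch,
   # not the actual probe from the Airflow Helm chart). Exit code 0 means
   # healthy; non-zero makes Kubernetes restart the container.
   import socket
   import sys
   
   BROKER_HOST = "redis.example.internal"  # placeholder, an assumption
   BROKER_PORT = 6379
   
   try:
       # A bounded connect attempt: if the broker endpoint is unreachable
       # (for example because its IP moved), fail fast instead of hanging.
       with socket.create_connection((BROKER_HOST, BROKER_PORT), timeout=5):
           pass
   except OSError as exc:
       print(f"broker unreachable: {exc}", file=sys.stderr)
       sys.exit(1)
   
   sys.exit(0)
   ```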
   
   An even better solution for you would be to react to the IP-change event and restart the services immediately. This is the kind of thing that usually should and can be done at the deployment level - Airflow has no knowledge of such events and cannot react to them, but your deployment can. And should. This will help you recover much faster.
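   
   Such a watchdog can be quite small. A minimal sketch, assuming the broker sits behind a DNS name (the name and interval below are placeholders): re-resolve the name periodically and exit as soon as the address set changes, letting the supervisor (Kubernetes, systemd, ...) restart the Airflow component so it reconnects to the new address:
   
   ```python
   # Hypothetical deployment-level watchdog (not part of Airflow). Run it as
   # a sidecar/companion whose exit triggers a restart of the Airflow
   # component, so recovery happens immediately instead of waiting for TCP
   # timeouts or the next failing liveness probe.
   import socket
   import sys
   import time
   
   BROKER_HOST = "redis.example.internal"  # placeholder, an assumption
   CHECK_INTERVAL = 10  # seconds; tune to how quickly you need to react
   
   def resolve(host):
       """Return the set of IP addresses the host currently resolves to."""
       return {info[4][0] for info in socket.getaddrinfo(host, None)}
   
   baseline = resolve(BROKER_HOST)
   while True:
       time.sleep(CHECK_INTERVAL)
       try:
           current = resolve(BROKER_HOST)
       except socket.gaierror:
           continue  # transient DNS hiccup; keep the last known baseline
       if current != baseline:
           print(f"{BROKER_HOST} moved {sorted(baseline)} -> {sorted(current)}; "
                 "exiting so the supervisor restarts the service", file=sys.stderr)
           sys.exit(1)
   ```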
   
   Another option - if you want to avoid such restarts - would be to avoid changing the virtual IP and allocate static IP addresses to each component. Changing virtual IP addresses is usually not something that happens in an enterprise setup - it is safe to assume you can arrange for static IP addresses. Even if you have some dynamically changing public IP addresses, you can usually have static private ones and configure your deployment to use them.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org