Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/26 22:14:19 UTC

[GitHub] [airflow] natemoseman commented on issue #8907: Airflow web UI is slow

natemoseman commented on issue #8907:
URL: https://github.com/apache/airflow/issues/8907#issuecomment-681151249


   I had a similar issue with Airflow running on a Kubernetes cluster that I was, fortunately, able to solve.
   
   Regular HTTP connections were taking a minimum of 5 seconds to complete. Even using curl to fetch static content, like a CSS file, took 5 seconds. The logs of the Airflow web process didn't show anything unusual, although following them in real time showed each GET appearing with the same 5-second delay that curl was seeing.
   
   Bypassing everything and running curl directly from inside the container, and thus bypassing all the Kubernetes networking, still showed the same delays.
   
   It took me a few days to realize what was going on with my setup.
   
   As it turned out, it was due to the 'type: LoadBalancer' service I was using to expose the Airflow webserver outside the cluster. The load balancer was an external network load balancer that connected to the service via a NodePort on each virtual machine in the cluster. For whatever reason, this meant that a large number of connections were kept open to the webserver at all times.
   
   In a 20-node cluster this meant 20 connections.
   
   So when I killed the LoadBalancer service and started using nginx-ingress instead, the problem resolved itself instantly. No more delays. The admin web UI went back to normal.
   
   I am not exactly sure what was going on here, but I suspect that having a large number of connections always open was causing the gunicorn process to delay routing new connections to the pool of webserver worker processes. I was only using 4 worker processes at the time.
   
   So if you are seeing these strange 5-second delays, use netstat or a similar tool to count the number of "ESTABLISHED" connections to the webserver process. If you have a lot of connections and you are using a 'type: LoadBalancer' service, try switching to an ingress controller. Increasing the number of worker processes to exceed the number of established connections will probably work too.
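
The netstat check above could be scripted roughly like this. It is only a sketch under a few assumptions: the input is Linux `netstat -tn` style output, the webserver listens on port 8080 (adjust to your setup), and the sample addresses are made up for illustration. The helper name `count_established` is hypothetical, not part of Airflow or gunicorn.

```python
def count_established(netstat_output: str, port: int) -> int:
    """Count ESTABLISHED TCP connections whose local port equals `port`.

    Assumes Linux `netstat -tn` row layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        # The port is whatever follows the last ':' (works for IPv4 and IPv6).
        local_port = local_addr.rsplit(":", 1)[-1]
        if state == "ESTABLISHED" and local_port == str(port):
            count += 1
    return count


# Made-up sample output, for illustration only.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:8080           10.0.1.17:41234         ESTABLISHED
tcp        0      0 10.0.0.5:8080           10.0.1.18:51822         ESTABLISHED
tcp        0      0 10.0.0.5:8080           10.0.1.19:40022         TIME_WAIT
tcp        0      0 10.0.0.5:5432           10.0.1.20:40100         ESTABLISHED
tcp6       0      0 :::8080                 :::*                    LISTEN
"""

if __name__ == "__main__":
    workers = 4  # the worker count I was running with; change to match yours
    established = count_established(SAMPLE, 8080)
    print(f"{established} established connections, {workers} workers")
    if established >= workers:
        print("Held-open connections may be outnumbering the workers")
```

To run it against a live container, replace `SAMPLE` with the real output, for example `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`. The comparison against the worker count only reflects my suspicion above; I have not confirmed that this is the actual mechanism.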
   
   Hope that helps.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org