Posted to commits@airflow.apache.org by "lexmiln (via GitHub)" <gi...@apache.org> on 2023/02/07 18:04:51 UTC

[GitHub] [airflow] lexmiln commented on issue #13637: Scheduler takes 100% of CPU without task execution

lexmiln commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-1421227055

   We also saw this 100% CPU issue in our Kubernetes cluster.
   
   We later observed that liveness checks on the scheduler pod were consistently timing out.
   
   Manually running `time airflow jobs check` in the scheduler container (as the liveness probe does) showed that the command takes about a minute to complete with our resource allocation (500m CPU, 2 GB RAM).
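   To repeat this measurement from a script rather than the shell's `time`, the command can be timed from Python. This is a minimal sketch: `time_command` is a hypothetical helper, and the exact arguments the probe passes to `airflow jobs check` depend on the chart version, so any invocation you time should be checked against your probe's actual command.

```python
from __future__ import annotations

import subprocess
import time


def time_command(argv: list[str], timeout_s: float | None = None) -> tuple[int, float]:
    """Run argv to completion and return (exit code, elapsed wall-clock seconds).

    Hypothetical helper, comparable to the "real" figure reported by `time`.
    If timeout_s elapses first, subprocess.run kills the child and raises
    subprocess.TimeoutExpired, much like a probe that overruns its window.
    """
    start = time.monotonic()
    completed = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    elapsed = time.monotonic() - start
    return completed.returncode, elapsed
```

   For example, `time_command(["airflow", "jobs", "check"], timeout_s=180)` times the bare command with a 180-second cap; the probe's real arguments may differ.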
   
   Given this, we increased the scheduler liveness probe's timeout and interval:
   
   ```yaml
     scheduler:
       # The liveness probe takes a while to run on our cluster due to limited
       # resources, so we run it only very occasionally, and we give it lots of
       # time to complete.
       livenessProbe:
         initialDelaySeconds: 120
         timeoutSeconds: 180
         failureThreshold: 5
         periodSeconds: 600
         command: ~
   ```
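   The reasoning in the comments above (give the check plenty of time, and run it only occasionally) can be turned into a quick sanity check against a measured runtime. This is a sketch under our own assumptions: the `LivenessProbe` class, the `check_probe_settings` helper, and the 2x safety margin are hypothetical and not part of Kubernetes or the Airflow Helm chart; only the field meanings follow the Kubernetes probe spec.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LivenessProbe:
    """Mirror of the Helm values above; fields follow the Kubernetes probe spec."""
    initial_delay_seconds: int
    timeout_seconds: int
    failure_threshold: int
    period_seconds: int


def check_probe_settings(probe: LivenessProbe, measured_runtime_s: float,
                         margin: float = 2.0) -> list[str]:
    """Return warnings for settings that leave too little room for a slow check.

    Hypothetical rule of thumb: the timeout should be at least `margin` times
    the measured runtime, and the period should exceed the timeout so the
    container gets idle time between checks.
    """
    warnings = []
    if probe.timeout_seconds < measured_runtime_s * margin:
        warnings.append(f"timeoutSeconds={probe.timeout_seconds} is under "
                        f"{margin}x the measured runtime ({measured_runtime_s:.0f}s)")
    if probe.period_seconds <= probe.timeout_seconds:
        warnings.append(f"periodSeconds={probe.period_seconds} does not exceed "
                        f"timeoutSeconds={probe.timeout_seconds}")
    return warnings
```

   With the values above and a one-minute check, `check_probe_settings(LivenessProbe(120, 180, 5, 600), 60)` returns an empty list.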
   
   Since we made this change, the pod has averaged around 250m CPU utilisation (i.e., 50% of its limit). A possible explanation for the CPU saturation many are seeing is that the container is perpetually tied up trying to complete the liveness probe within too short a window.
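   One way to make this hypothesis concrete is to estimate the fraction of wall-clock time the container spends running probes. This is a rough model under our own assumptions (probes run one at a time, each lasting the shorter of the check's runtime and `timeoutSeconds`, with a new one starting every `periodSeconds`); `probe_duty_cycle` is a hypothetical helper, and real kubelet scheduling and per-probe CPU cost will differ.

```python
def probe_duty_cycle(runtime_s: float, timeout_s: float, period_s: float) -> float:
    """Approximate fraction of wall-clock time spent running the liveness probe.

    Each probe is assumed to last min(runtime, timeout) and to start every
    period_s seconds; the result is capped at 1.0 (a probe always running).
    """
    busy = min(runtime_s, timeout_s)
    return min(busy / period_s, 1.0)
```

   With the settings above and a one-minute check, `probe_duty_cycle(60, 180, 600)` gives 0.1, so in this model a probe is running about 10% of the time; a hypothetical configuration with both timeout and period at 60 seconds gives 1.0, meaning a probe is always in flight.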


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.