Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/14 17:22:30 UTC

[GitHub] [airflow] potiuk opened a new issue #14782: The scale-in settings are slightly too aggressive I think

potiuk opened a new issue #14782:
URL: https://github.com/apache/airflow/issues/14782


   Hey @ashb - following up on your request to observe the self-hosted behavior.
   
   I believe the current scale-in settings are slightly too aggressive, especially when the traffic is low (weekends).
   
   I pushed a number of builds here (and @turbaszek as well):
   
   https://github.com/apache/airflow/actions/workflows/build-images-workflow-run.yml
   
   Quite a lot of them failed without logs, which I think indicates that a scale-in event happened. Some of them failed with 'git', some of them explicitly with `lost communication`.
   
   Some examples here:
   
   * https://github.com/apache/airflow/actions/runs/651685483
   * https://github.com/apache/airflow/actions/runs/651569791
   * https://github.com/apache/airflow/actions/runs/651571502
   * https://github.com/apache/airflow/actions/runs/651550722
   * https://github.com/apache/airflow/actions/runs/651642721
   * https://github.com/apache/airflow/actions/runs/651688017
   
   (maybe some of those were cancelled as duplicates - but at most 1 or 2)
   
   At the same time a number of those jobs succeeded, so I think the scale-in events are the ones to blame.
   
   The previous setting was much more stable (but more costly as well). However, I think I will merge #14531, which should significantly decrease the time needed from the runners, so hopefully we will be able to tune the scale-in settings so that they are more stable.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on issue #14782: The scale-in settings are slightly too aggressive I think

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #14782:
URL: https://github.com/apache/airflow/issues/14782#issuecomment-808269754


   Just noticed this happening a lot today, and (thanks to the extra logs) I am testing a change to when we mark a node as OkayToTerminate.





[GitHub] [airflow] ashb commented on issue #14782: The scale-in settings are slightly too aggressive I think

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #14782:
URL: https://github.com/apache/airflow/issues/14782#issuecomment-808395294


   Let me know if you notice any more instances now.





[GitHub] [airflow] potiuk closed issue #14782: The scale-in settings are slightly too aggressive I think

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #14782:
URL: https://github.com/apache/airflow/issues/14782


   





[GitHub] [airflow] ashb commented on issue #14782: The scale-in settings are slightly too aggressive I think

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #14782:
URL: https://github.com/apache/airflow/issues/14782#issuecomment-798945362


   Failing without logs isn't triggered by scale-in, but the "lost communication" failure is.
   
   I'll dig into the logs on Monday and see if I can work out what went on.





[GitHub] [airflow] potiuk commented on issue #14782: The scale-in settings are slightly too aggressive I think

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14782:
URL: https://github.com/apache/airflow/issues/14782#issuecomment-808764799


   Closing, as it's been updated several times since and is better, I believe.

