You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "hakuno (via GitHub)" <gi...@apache.org> on 2023/10/23 14:27:54 UTC

[PR] fix(helm): set worker safeToEvict properly [airflow]

hakuno opened a new pull request, #35130:
URL: https://github.com/apache/airflow/pull/35130

   It improves (or fixes) the `workers.safeToEvict` usage.
   
   Kubernetes administrators/users agree that `cluster-autoscaler.kubernetes.io/safe-to-evict` can be managed to avoid the Autoscaler to kill the worker pod when scaling nodes down.
   
   > When this annotation is set to "true", the cluster autoscaler is allowed to evict a Pod even if other rules would normally prevent that. The cluster autoscaler never evicts Pods that have this annotation explicitly set to "false"; you could set that on an important Pod that you want to keep running. If this annotation is not set then the cluster autoscaler follows its Pod-level behavior.
   
   So, if I set `workers.safeToEvict` to `false`, it gets nothing. The Autoscaler will kill that still. Because the Helm chart has no effect on handling that. See:
   
   ```
           {{- if .Values.workers.safeToEvict }}
           cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
           {{- end }}
   ```
   
   It's like a useless piece of statement.
   
   What about if I need to set that to false? I couldn't. Anybody is unable to.
   
   A probaly **workaround** would set up `workers.podAnnotations` with map of _annotations_ and `workers.safeToEvict` falsely together.
   
   Please, check it out carefully. Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "hakuno (via GitHub)" <gi...@apache.org>.
hakuno commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1778325712

   Ps.¹ Other Airflow components also has `safeToEvict` values.
   
   Ps.² At another moment, I see I can contribute with other PRs. Because `workers.podAnnotations` and `airflowPodAnnotations` can fight each other whenever it has a set of annotations in common.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1777228011

   > @jedcunningham Thanks for that. Unfortunately, I've had some difficult to setup this test environment.
   > 
   > Thoughts?
   
   If you do not write what kind of difficulties you had, It's hard to have more thoughts. But I recommend to follow the steps in https://github.com/apache/airflow/blob/main/CONTRIBUTORS_QUICK_START.rst to setup breeze and then the docs pointed at by @jedcunningham provide a step-by-step (almost wizard-like) approach to make it works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal merged PR #35130:
URL: https://github.com/apache/airflow/pull/35130


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "hakuno (via GitHub)" <gi...@apache.org>.
hakuno commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1777184228

   @jedcunningham Thanks for that. Unfortunately, I've had some difficult to setup this test environment.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "jedcunningham (via GitHub)" <gi...@apache.org>.
jedcunningham commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1776502995

   @hakuno, Hussein meant a test in our test suit to make sure this doesn't regress. [Here is an example](https://github.com/apache/airflow/blob/f457228bd21b13a7fdb29e154202d5357d024050/helm_tests/airflow_core/test_worker.py#L50-L65), and [some details on our helm chart tests](https://github.com/apache/airflow/blob/main/TESTING.rst#helm-unit-tests).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "hakuno (via GitHub)" <gi...@apache.org>.
hakuno commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1776255255

   Thank you so, @hussein-awala 
   
   I tested the Helm Chart itself with `--validate` on a Kubernetes cluster to verify the generated manifests.
   
   ```
   $ helm template --dry-run --validate --set workers.safeToEvict=false .
   ```
   
   We got
   
   ```
   # Source: airflow/templates/workers/worker-deployment.yaml
   ################################
   ## Airflow Worker Deployment
   #################################
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     name: release-name-worker
     labels:
       tier: airflow
       component: worker
       release: release-name
       chart: "airflow-1.12.0-dev"
       heritage: Helm
   spec:
     serviceName: release-name-worker
     replicas: 1
     selector:
       matchLabels:
         tier: airflow
         component: worker
         release: release-name
     template:
       metadata:
         labels:
           tier: airflow
           component: worker
           release: release-name
         annotations:
           checksum/metadata-secret: 8b8ce685079b3075a4b91c47e267db7b50cd8bfda2269dd36fef1e258a3a38eb
           checksum/result-backend-secret: 98a68f230007cfa8f5d3792e1aff843a76b0686409e4a46ab2f092f6865a1b71
           checksum/pgbouncer-config-secret: 1dae2adc757473469686d37449d076b0c82404f61413b58ae68b3c5e99527688
           checksum/webserver-secret-key: 1b48a02846657f40ac54dc50b230572e67e9f4a88e9a486e63bd25209173bfc4
           checksum/kerberos-keytab: 80979996aa3c1f48c95dfbe9bb27191e71f12442a08c0ed834413da9d430fd0e
           checksum/airflow-config: 520107ea8f2b203f84baed2ba60545a7afc8fc9a9f6978aa7cd45c4dba7c8dbe
           checksum/extra-configmaps: e862ea47e13e634cf17d476323784fa27dac20015550c230953b526182f5cac8
           checksum/extra-secrets: e9582fdd622296c976cbc10a5ba7d6702c28a24fe80795ea5b84ba443a56c827
           cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
   ...
   ```
   
   And
   
   ```
   $ helm template --dry-run --validate --set workers.safeToEvict=true .
   ```
   
   We got
   
   ```
   # Source: airflow/templates/workers/worker-deployment.yaml
   ################################
   ## Airflow Worker Deployment
   #################################
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     name: release-name-worker
     labels:
       tier: airflow
       component: worker
       release: release-name
       chart: "airflow-1.12.0-dev"
       heritage: Helm
   spec:
     serviceName: release-name-worker
     replicas: 1
     selector:
       matchLabels:
         tier: airflow
         component: worker
         release: release-name
     template:
       metadata:
         labels:
           tier: airflow
           component: worker
           release: release-name
         annotations:
           checksum/metadata-secret: 8b8ce685079b3075a4b91c47e267db7b50cd8bfda2269dd36fef1e258a3a38eb
           checksum/result-backend-secret: 98a68f230007cfa8f5d3792e1aff843a76b0686409e4a46ab2f092f6865a1b71
           checksum/pgbouncer-config-secret: 1dae2adc757473469686d37449d076b0c82404f61413b58ae68b3c5e99527688
           checksum/webserver-secret-key: c20aeee646c14a8ab31991c17de268a577d34f05d4a1ae045725bf974f999c69
           checksum/kerberos-keytab: 80979996aa3c1f48c95dfbe9bb27191e71f12442a08c0ed834413da9d430fd0e
           checksum/airflow-config: 520107ea8f2b203f84baed2ba60545a7afc8fc9a9f6978aa7cd45c4dba7c8dbe
           checksum/extra-configmaps: e862ea47e13e634cf17d476323784fa27dac20015550c230953b526182f5cac8
           checksum/extra-secrets: e9582fdd622296c976cbc10a5ba7d6702c28a24fe80795ea5b84ba443a56c827
           cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
   ...
   ```
   
   And default values is same than `workers.safeToEvict=true` as expected.
   
   Please, tell me if you know more ways to check it out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1775330488

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (ruff, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   - Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "hakuno (via GitHub)" <gi...@apache.org>.
hakuno commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1777258718

   Thanks, @potiuk - I meant the environment setup itself. My Pytest couldn't discovery tests etc.
   
   I'll read that docs as soon as possible... thanks again :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "hakuno (via GitHub)" <gi...@apache.org>.
hakuno commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1778309120

   @jedcunningham @hussein-awala @potiuk I got it.
   
   ```
   pytest helm_tests/airflow_core/test_worker.py::TestWorker::test_safetoevict_annotations
   ============================================================================================ test session starts ============================================================================================
   platform linux -- Python 3.8.18, pytest-7.4.2, pluggy-1.3.0 -- /usr/local/bin/python
   cachedir: .pytest_cache
   rootdir: /opt/airflow
   configfile: pyproject.toml
   plugins: timeouts-1.2.1, xdist-3.3.1, capture-warnings-0.0.4, instafail-0.5.0, mock-3.12.0, asyncio-0.21.1, requests-mock-1.11.0, time-machine-2.13.0, cov-4.1.0, rerunfailures-12.0, anyio-4.0.0, httpx-0.21.3
   asyncio: mode=strict
   setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
   collected 2 items                                                                                                                                                                                           
   
   helm_tests/airflow_core/test_worker.py::TestWorker::test_safetoevict_annotations[true-True] PASSED                                                                                                    [ 50%]
   helm_tests/airflow_core/test_worker.py::TestWorker::test_safetoevict_annotations[false-False] PASSED                                                                                                  [100%]
   
   ============================================================================================ 2 passed in 16.65s =============================================================================================
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(helm): set worker safeToEvict properly [airflow]

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on PR #35130:
URL: https://github.com/apache/airflow/pull/35130#issuecomment-1783266148

   Awesome work, congrats on your first merged pull request! You are invited to check our [Issue Tracker](https://github.com/apache/airflow/issues) for additional contributions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org