Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/06 22:48:02 UTC

[GitHub] [airflow] thomasrockhu opened a new pull request #20087: Upload coverage for PRs to main

thomasrockhu opened a new pull request #20087:
URL: https://github.com/apache/airflow/pull/20087


   
   ---
   Right now, coverage is uploaded to Codecov only for commits on the `main` branch. This change also uploads coverage for PRs that target the `main` branch of `apache/airflow`, both from the main repository and from forks.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #20087:
URL: https://github.com/apache/airflow/pull/20087


   





[GitHub] [airflow] thomasrockhu-codecov edited a comment on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
thomasrockhu-codecov edited a comment on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002392632


   Hey @potiuk, sorry for the delay here (holidays). I'm having trouble reproducing the issue you mentioned. I ran both cases:
   
   `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py`
   `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py --cov`
   
   At most, I'm seeing about 20 MB of extra memory usage in the `coverage` case.
   
   If it helps, this is my environment:
   ```
   docker: 20.10.7
   python: 3.7
   ```
   
   I uploaded a video of what I'm seeing on a clean run: https://youtu.be/VjX7892gL6Y





[GitHub] [airflow] potiuk edited a comment on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002543301


   Try with these args (and use `./breeze`, since it provides things like `.coveragerc`):
   
   ```
   if [[ ${ENABLE_TEST_COVERAGE:="false"} == "true" ]]; then
       EXTRA_PYTEST_ARGS+=(
           "--cov=airflow/"
           "--cov-config=.coveragerc"
           "--cov-report=xml:/files/coverage-${TEST_TYPE}-${BACKEND}.xml"
       )
   fi
   ```
   
   I just repeated that, and results are the same. Steps to reproduce:
   
   Setup:
   
   1) `./breeze stop` (stop all containers)
   2) `./breeze --db-reset --backend sqlite` (starts breeze in a basic sqlite configuration and resets the db)
   3)  Run `docker stats` in separate terminal
   4) Run `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py` to "warm the caches" 
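Watching `docker stats` by eye makes a short spike easy to miss. As a minimal sketch, assuming the memory column has been captured as text in the usual `used / limit` shape with binary unit suffixes such as `MiB`, the peak could be extracted as below. The sample values and the helper names `mem_to_mib` and `peak_mib` are invented for illustration and are not measurements from this thread.

```python
import re

# Binary unit suffixes assumed to appear in the captured memory column.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def mem_to_mib(text: str) -> float:
    """Convert a size such as '156.2MiB' to a number of MiB."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*(B|KiB|MiB|GiB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit] / _UNITS["MiB"]


def peak_mib(samples):
    """Return the largest 'used' value from a list of 'used / limit' strings."""
    return max(mem_to_mib(sample.split("/")[0]) for sample in samples)


# Invented samples, one per polling interval (hypothetical values).
samples = [
    "148.3MiB / 7.7GiB",
    "156MiB / 7.7GiB",
    "231.5MiB / 7.7GiB",
    "90MiB / 7.7GiB",
]
print(f"peak: {peak_mib(samples)} MiB")
```

Comparing the peak from a run without coverage against a run with coverage gives a single number per run instead of a video to scrub through.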
   
   Once you do that, it is very reproducible:
   
   Test a) - without coverage:
   
   `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py`
   
   https://user-images.githubusercontent.com/595491/147656637-6ad880e1-9f2b-4364-915a-cd5c92aa58c0.mp4
   
   Memory tops out at ~156 MB and drops right after the tests finish.
   
   Test b) - with coverage:
   
   `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py --cov=airflow/ --cov-config=.coveragerc --cov-report=xml:/files/coverage-test.xml`
   
   https://user-images.githubusercontent.com/595491/147656618-c550ae90-521c-40cd-b842-32172a2ab94a.mp4
   
   During the tests, memory usage is very similar, but when the tests are finishing (I guess while the output report is being prepared) the memory used quickly goes up to more than 230 MB just before dropping (at about 0:19 in the video). This effect does not occur when coverage is disabled.
   
   This is fully reproducible.





[GitHub] [airflow] potiuk commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-987713100


   This is not a good idea for two reasons:
   
   * Gathering coverage during our tests takes a lot of memory - see #19523. Most of our PRs run on public GitHub runners, and gathering coverage will cause OOM errors there.
   
   * We use selective checks in our PRs. We detect what kind of changes a PR makes and, based on that, run only a subset of the tests. That makes the coverage information wrong, because those PRs show far less coverage.
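The second point is simple arithmetic: the denominator (all measured lines) stays the same, while a selective run executes far fewer of them. A minimal sketch with made-up numbers follows; none of the figures come from Airflow's CI.

```python
def line_coverage(covered: int, total: int) -> float:
    """Return line coverage as a percentage of all measured lines."""
    return 100.0 * covered / total


# Hypothetical figures, for illustration only.
TOTAL_LINES = 1000            # lines measured under --cov=airflow/
FULL_RUN_COVERED = 850        # lines executed when every test type runs
SELECTIVE_RUN_COVERED = 300   # lines executed when only a subset of tests runs

full = line_coverage(FULL_RUN_COVERED, TOTAL_LINES)
selective = line_coverage(SELECTIVE_RUN_COVERED, TOTAL_LINES)
print(f"full run: {full:.1f}%, selective run: {selective:.1f}%")
```

With these assumed numbers the selective run reports 30.0% against 85.0% for the full run, even though the PR removed no tests and changed no covered code.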





[GitHub] [airflow] potiuk edited a comment on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002546829


   (BTW, this is just a very small subset of the tests. When we run the full test suite, the excess memory used will be much bigger, but this one shows the "spiky" character of the problem.)
   
   The problem is that when the tests finish, we have a sudden spike in memory used (and even the +80 MB we see in this case is a lot). This all happens before pytest has a chance to free the memory it used for all the "legitimate" reasons.
   
   We have rather limited memory on public runners, so every 100 MB counts. The effect compounds when we also have database docker containers and integration dockers running. We heavily optimized the memory used by our database dockers to make them fit the memory we have on public runners, but even with that, we have very little room left.





[GitHub] [airflow] github-actions[bot] commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-987613486


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] thomasrockhu-codecov commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
thomasrockhu-codecov commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002392632


   hey @potiuk sorry for the delay here (holidays). I'm having trouble reproducing the issue you mentioned. For both cases
   
   `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py`
   `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py --cov`
   
   I'm hitting about 200-230 MB of memory usage in both cases.





[GitHub] [airflow] boring-cyborg[bot] commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-987328197


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   





[GitHub] [airflow] potiuk commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-988000747


   > I reproducibly run the test_kubernetes.py followed by test_backfill_job.py with/without coverage yielding 2-3 GB vs 700 MB memory used while test_backfill_job was running.





[GitHub] [airflow] potiuk edited a comment on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002546829


   (BTW, this is just a very small subset of the tests. When we run the full test suite, the excess memory used will be much bigger, but this one shows the "spiky" character of the problem.)
   
   The problem is that when the tests finish, we have a sudden spike in memory used (and even the +80 MB we see in this case is a lot). This all happens before pytest has a chance to free the memory it used for all the "legitimate" reasons.
   
   We have rather limited memory on public runners, so every 100 MB counts. The effect compounds when we also have database docker containers and integration dockers running. We heavily optimized the memory used by our database dockers to make them fit the memory we have on public runners, but even with that, we have very little room left.





[GitHub] [airflow] potiuk commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-988005879


   It could be "generated" by some internals of the pytest/coverage interaction - but for us stability is much more important than PR coverage, and I've already lost quite a lot of time chasing other memory issues, so unless you can help us solve this problem, it is super-low priority for us.
   
   Just in case you'd like to take a look @thomasrockhu,  running tests in reproducible conditions is easy:
   1) check out airflow
   2) run the `./breeze` command and wait for the environment to bootstrap
   3) once you are in the `docker-compose` shell of the Airflow test environment,
   4) run the pytest commands
   
   In this case, running `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py` with and without the coverage flag should show you the difference (just monitor the memory used while the tests are running).
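One way to turn "monitor the memory used" into a number is to read the peak resident set size of the child process after it exits. The sketch below is an assumption-laden alternative to watching the container from outside: it uses Python's standard `resource` module (Unix only), measures only the process tree it launches (not database or integration containers), and the helper name `peak_child_rss` is made up for this example.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd):
    """Run cmd to completion and return the peak RSS among waited-for children.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS. It is a
    high-water mark over every child this process has waited for so far, so
    use a fresh interpreter per measurement to keep readings independent.
    """
    subprocess.run(cmd, check=False)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Stand-in workload; inside Breeze one would pass the pytest command line
    # (as a list of arguments) with or without the coverage flags instead.
    stand_in = [sys.executable, "-c", "data = bytes(50 * 1024 * 1024)"]
    print("peak child RSS:", peak_child_rss(stand_in))
```

Because the value is a high-water mark, it should also capture a brief spike at the end of the run, such as one occurring while a report is written, which is easy to miss when sampling at intervals.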
   





[GitHub] [airflow] potiuk commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002543301


   Try with these args (and use `./breeze`, since it provides things like `.coveragerc`):
   
   ```
   if [[ ${ENABLE_TEST_COVERAGE:="false"} == "true" ]]; then
       EXTRA_PYTEST_ARGS+=(
           "--cov=airflow/"
           "--cov-config=.coveragerc"
           "--cov-report=xml:/files/coverage-${TEST_TYPE}-${BACKEND}.xml"
       )
   fi
   ```
   
   I just repeated that, and results are the same. Steps to reproduce:
   
   Setup:
   
   1) `./breeze stop` (stop all containers)
   2) `./breeze --db-reset --backend sqlite` (starts breeze in a basic sqlite configuration and resets the db)
   3)  Run `docker stats` in separate terminal
   4) Run `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py` to "warm the caches" 
   
   Once you do that, it is very reproducible:
   
   Test a) - without coverage:
   
   `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py`
   
   https://user-images.githubusercontent.com/595491/147656637-6ad880e1-9f2b-4364-915a-cd5c92aa58c0.mp4
   
   Memory tops out at ~156 MB and drops right after the tests finish.
   
   Test b) - with coverage:
   
   `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py --cov=airflow/ --cov-config=.coveragerc --cov-report=xml:/files/coverage-test.xml`
   
   https://user-images.githubusercontent.com/595491/147656618-c550ae90-521c-40cd-b842-32172a2ab94a.mp4
   
   During the tests, memory usage is very similar, but when the tests are finishing (I guess while the output report is being prepared) the memory used quickly goes up to more than 230 MB just before dropping (at about 0:19 in the video). This effect does not occur when coverage is disabled.
   
   This is fully reproducible.





[GitHub] [airflow] thomasrockhu commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
thomasrockhu commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-987995122


   Ahhh that makes sense @potiuk. Would something like [carryforward flags](https://docs.codecov.com/docs/carryforward-flags) make sense? We built that for teams that don't run their entire test suite on every commit.
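The idea behind carrying coverage forward can be modelled in a few lines: flags uploaded for the current commit replace the previous values, and flags that were not uploaded keep the value from an earlier commit. This is only a conceptual sketch of that idea, not Codecov's implementation or configuration; the flag names, percentages, and the function `effective_coverage` are hypothetical.

```python
from typing import Dict


def effective_coverage(
    previous: Dict[str, float],
    uploaded: Dict[str, float],
) -> Dict[str, float]:
    """Merge per-flag coverage.

    Flags uploaded on this commit win; flags that were not uploaded are
    carried forward unchanged from the previous commit.
    """
    merged = dict(previous)
    merged.update(uploaded)
    return merged


# Hypothetical flags and percentages.
main_commit = {"core": 88.0, "providers": 81.0, "www": 75.0}
pr_upload = {"providers": 82.5}  # a selective run that executed only provider tests

print(effective_coverage(main_commit, pr_upload))
```

Under this model a PR that runs only part of the suite no longer appears to drop coverage for the parts it did not exercise.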





[GitHub] [airflow] potiuk commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-988000138


   That might solve the 2nd problem indeed (good one!). But I am afraid the first problem is more serious.





[GitHub] [airflow] potiuk edited a comment on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-988005879


   It could be "generated" by some internals of the pytest/coverage interaction - but for us stability is much more important than PR coverage, and I've already lost quite a lot of time chasing other memory issues, so unless you can help us solve this problem, it is super-low priority for us.
   
   Just in case you'd like to take a look @thomasrockhu,  running tests in reproducible conditions is easy:
   1) check out airflow
   2) run the `./breeze` command and wait for the environment to bootstrap
   3) once you are in the `docker-compose` shell of the Airflow test environment, you can run pytest commands
   
   In this case, running `pytest ./tests/providers/cncf/kubernetes/hooks/test_kubernetes.py ./tests/jobs/test_backfill_job.py` with and without the coverage flag should show you the difference (just monitor the memory used while the tests are running).
   





[GitHub] [airflow] potiuk commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002546829


   (BTW, this is just a very small subset of the tests. When we run the full test suite, the excess memory used will be much bigger, but this one shows the "spiky" character of the problem.) The problem is that when the tests finish, we have a sudden spike in memory used (and even the 80 MB we see in this case is a lot). This all happens before pytest has a chance to free the memory it used for all the "legitimate" reasons.
   
   We have rather limited memory on public runners, so every 100 MB counts. The effect compounds when we also have database docker containers and integration dockers running. We heavily optimized the memory used by our database dockers to make them fit the memory we have on public runners, and we have little room left.





[GitHub] [airflow] potiuk edited a comment on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002546829


   (BTW, this is just a very small subset of the tests. When we run the full test suite, the excess memory used will be much bigger, but this one shows the "spiky" character of the problem.) The problem is that when the tests finish, we have a sudden spike in memory used (and even the 80 MB we see in this case is a lot). This all happens before pytest has a chance to free the memory it used for all the "legitimate" reasons.
   
   We have rather limited memory on public runners, so every 100 MB counts. The effect compounds when we also have database docker containers and integration dockers running. We heavily optimized the memory used by our database dockers to make them fit the memory we have on public runners, but even with that, we have very little room left.





[GitHub] [airflow] potiuk edited a comment on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-1002546829


   (BTW, this is just a very small subset of the tests. When we run the full test suite, the excess memory used will be much bigger, but this one shows the "spiky" character of the problem.) The problem is that when the tests finish, we have a sudden spike in memory used (and even the +80 MB we see in this case is a lot). This all happens before pytest has a chance to free the memory it used for all the "legitimate" reasons.
   
   We have rather limited memory on public runners, so every 100 MB counts. The effect compounds when we also have database docker containers and integration dockers running. We heavily optimized the memory used by our database dockers to make them fit the memory we have on public runners, but even with that, we have very little room left.





[GitHub] [airflow] mik-laj commented on pull request #20087: Upload coverage for PRs to main

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #20087:
URL: https://github.com/apache/airflow/pull/20087#issuecomment-987453871


   Related: https://github.com/apache/airflow/pull/19523 CC: @potiuk 

