Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/28 23:09:30 UTC

[GitHub] [airflow] potiuk opened a new pull request #15062: Parallelize build of documentation.

potiuk opened a new pull request #15062:
URL: https://github.com/apache/airflow/pull/15062


   This is far more complex than it should be because autoapi has
   problems with parallel execution. Unfortunately, autoapi does not
   cope well when several autoapi builds run in parallel on the same
   code, even if they run in separate processes and for different
   packages. Autoapi uses common _doctree and _api directories
   generated in the source tree, and parallel runs overwrite each
   other's files there.
   
   The solution in this PR is mostly applicable to the CI environment.
   There we have Docker images that have already been built from the
   current sources, so we can safely run separate Docker containers
   without mounting the sources and generate the documentation
   separately and independently in each container.
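
A minimal sketch of the isolation idea described above, not the code from this PR. The image tag, the script path and the `--package-filter` argument are assumptions used only for illustration. The point is that no `-v`/`--mount` option is passed, so each container works on the sources baked into the image and keeps its own `_api` and `_doctree` output.

```python
import shlex
from typing import List

# Hypothetical names: the image tag and the in-container script path are
# placeholders, not values taken from the PR.
CI_IMAGE = "example/airflow-ci:latest"
BUILD_SCRIPT = "/opt/airflow/docs/build_docs.py"


def docker_docs_command(package: str) -> List[str]:
    """Build a `docker run` command that generates docs for one package.

    No `-v`/`--mount` option is passed, so the container uses the sources
    baked into the image and writes its generated files to its own
    filesystem, isolated from every other container.
    """
    return [
        "docker", "run", "--rm",
        "--name", f"docs-{package}",
        CI_IMAGE,
        # The argument name below is an assumption for illustration.
        "python", BUILD_SCRIPT, "--package-filter", package,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so no Docker is needed.
    for name in ["apache-airflow", "apache-airflow-providers-google"]:
        print(shlex.join(docker_docs_command(name)))
```

Because every command is independent, the commands can be started concurrently without the builds touching each other's generated directories.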
   
   This seems to work really well, speeding up docs generation
   roughly 2x on public GitHub runners and roughly 8x on self-hosted runners.
   
   Public runners:
   
   * 27m -> 15m
   
   Self-hosted runners:
   
   * 27m -> < 4m
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15062: Parallelize build of documentation.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15062:
URL: https://github.com/apache/airflow/pull/15062#issuecomment-808986689


   [The Workflow run](https://github.com/apache/airflow/actions/runs/696188174) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] potiuk commented on pull request #15062: Parallelize build of documentation.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15062:
URL: https://github.com/apache/airflow/pull/15062#issuecomment-809171652


   @ashb @kaxil @mik-laj. Good news. It seems that I managed to get doc build parallelization under control. It required dockerizing the build: autoapi relies on _api generated in the sources, and the only way I could achieve parallelization was to isolate the Airflow sources between the parallel doc builds via Docker containers.
   
   I still have to work a little on output formatting and stability, but it seems that with build parallelization we can get vast speed improvements in doc building time as well:
   * we have about 8 minutes now for self-hosted runners (excluding the time needed for setting up the venv, which will go away thanks to venv caching).
   * we have about 12 minutes for the GitHub runners. I still have to make sure the GitHub runners are not outliers (and possibly I will figure out why the build on self-hosted runners is not even faster).
   
   But anyhow, this speedup is significant and it will further optimise our CI infrastructure/cost.
   
   
   
   





[GitHub] [airflow] github-actions[bot] commented on pull request #15062: Parallelize build of documentation.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15062:
URL: https://github.com/apache/airflow/pull/15062#issuecomment-808980471


   [The Workflow run](https://github.com/apache/airflow/actions/runs/696132291) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] potiuk edited a comment on pull request #15062: Parallelize build of documentation.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #15062:
URL: https://github.com/apache/airflow/pull/15062#issuecomment-810029974


   Hey @mik-laj @kaxil @ashb and others:
   
   I got really good results from the docs-build parallelisation, and the output is really nice and easy to use. It automatically adjusts to the number of processors available (I am using multiprocessing.Pool for that).
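
A minimal sketch of sizing a `multiprocessing.Pool` to the available processors, under stated assumptions. `build_package`, `resolve_jobs` and `build_all` are hypothetical names, and `build_package` is a stand-in that does no real work; the actual script in the PR may be structured differently.

```python
import os
from multiprocessing import Pool
from typing import Dict, List, Optional, Tuple


def build_package(package: str) -> Tuple[str, int]:
    """Stand-in for one docs build; returns (package, exit code).

    A real worker would start the isolated build for `package` here.
    """
    return package, 0


def resolve_jobs(jobs: Optional[int] = None) -> int:
    """Use the requested job count, else the processor count reported by the OS."""
    return jobs if jobs else (os.cpu_count() or 1)


def build_all(packages: List[str], jobs: Optional[int] = None) -> Dict[str, int]:
    """Build every package using a pool of `jobs` worker processes."""
    with Pool(processes=resolve_jobs(jobs)) as pool:
        return dict(pool.map(build_package, packages))


if __name__ == "__main__":
    results = build_all(["apache-airflow", "apache-airflow-providers-google"])
    failed = [name for name, code in results.items() if code != 0]
    print(f"built {len(results)} package(s), {len(failed)} failed")
```

With this shape, the same script uses 2 workers on a 2-processor runner and 8 workers on an 8-processor runner without any extra configuration.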
   
   1) 2-processor parallel build of documentation on GitHub runners: **27m -> 13m 23s** (about 50% less time!)
   https://github.com/potiuk/airflow/runs/2223465529?check_suite_focus=true#step:7:1
   
   2) 8-processor self-hosted runners: **27m -> 9m 43s** (about 65% less time!)
   https://github.com/apache/airflow/pull/15062/checks?check_run_id=2223546296#step:7:1
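
For reference, the percentages can be recomputed from the timings quoted in the two items above. This is only an arithmetic check; the helper names are made up for the example. The timings work out to about 50.4% and 64.0% less wall-clock time, or speedups of roughly 2.0x and 2.8x.

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes/seconds duration to seconds."""
    return minutes * 60 + seconds


def time_saved_percent(before_s: int, after_s: int) -> float:
    """Share of the original wall-clock time that the faster build saves."""
    return 100.0 * (before_s - after_s) / before_s


def speedup_factor(before_s: int, after_s: int) -> float:
    """How many times faster the new build is than the old one."""
    return before_s / after_s


if __name__ == "__main__":
    baseline = to_seconds(27)
    runs = {
        "GitHub runners (2 processors)": to_seconds(13, 23),
        "self-hosted runners (8 processors)": to_seconds(9, 43),
    }
    for label, after in runs.items():
        saved = time_saved_percent(baseline, after)
        factor = speedup_factor(baseline, after)
        print(f"{label}: {saved:.1f}% less time, {factor:.2f}x faster")
```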
   
   This is what a successful build looks like:
   
   ![Screenshot from 2021-03-30 10-22-46](https://user-images.githubusercontent.com/595491/112958518-d9084900-9142-11eb-9bfd-70ab84086532.png)
   
   This is what a failed build looks like (you can see the detailed log by unfolding the "failed" groups):
   
   https://github.com/apache/airflow/pull/15079/checks?check_run_id=2223547086#step:7:1
   
   ![Screenshot from 2021-03-30 10-31-25](https://user-images.githubusercontent.com/595491/112959152-6b105180-9143-11eb-91da-ee49e95f041d.png)
   ![Screenshot from 2021-03-30 10-33-29](https://user-images.githubusercontent.com/595491/112959157-6e0b4200-9143-11eb-96ee-24284fa3331f.png)
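
Foldable sections like these can be produced in GitHub Actions logs with the `::group::` and `::endgroup::` workflow commands. The sketch below assumes that mechanism; whether this PR emits its groups exactly this way is an assumption, and `format_group` and `report` are hypothetical helpers.

```python
from typing import List


def format_group(title: str, log_lines: List[str]) -> str:
    """Wrap log lines in GitHub Actions `::group::`/`::endgroup::` markers."""
    return "\n".join([f"::group::{title}", *log_lines, "::endgroup::"])


def report(package: str, exit_code: int, log_lines: List[str]) -> str:
    """Label the group with the build status so failures are easy to spot."""
    status = "failed" if exit_code else "ok"
    return format_group(f"{package}: {status}", log_lines)


if __name__ == "__main__":
    print(report("apache-airflow", 0, ["build succeeded"]))
    print(report("apache-airflow-providers-google", 1, ["WARNING: undefined label"]))
```

Printing each package's captured log only after that package finishes keeps the output of parallel builds from interleaving.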
   
   I just fixed one last problem with missing "colors" in the spell-check output, but it should be good to go.
   
   
   
   





[GitHub] [airflow] potiuk merged pull request #15062: Parallelize build of documentation.

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #15062:
URL: https://github.com/apache/airflow/pull/15062


   




