Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/17 10:37:12 UTC

[GitHub] [airflow] potiuk commented on issue #12983: Add support for the new selective docs building in CI build-docs step

potiuk commented on issue #12983:
URL: https://github.com/apache/airflow/issues/12983#issuecomment-761768795


   All such cross references are already automatically checked (via pre-commit) and stored in https://github.com/apache/airflow/blob/master/airflow/providers/dependencies.json . 
   
   I think we should then be very consistent in what we propose as optimizations and base them on actual gains achieved for everyone (not only for committers who push their changes). I want to think 'globally', in terms of regular contributors adding their code, not only in terms of "let's optimise the speed of changes that I usually do". 
   
   I came back to the subject after @mik-laj raised this: #13706
   
   Now I want us to consider what makes more sense:
   
   1) Optimizing #13706 - doc/*.rst documentation-only changes
   
   2) Optimizing the docs build per provider - changes to the code and documentation of only one provider
   
   We are optimizing not only for "now" (i.e. the 2.0.1 release and 2.0.0 'cleanups') but also for the next (say) 5 months of Airflow builds. 
   
   Whatever we invest in now will return that investment in the coming months. 
   
   Let's compare how often those builds happen: (1) vs. (2).
   
   For (1), those are builds of the "cleanup and refactor" kind - we have them a bit more often now because we are cleaning up stuff: restructuring the docs a little, fixing the last spell-checking issues. However, in "regular" work, I'd argue that changing only documentation is a bad smell. Documentation should usually be changed together with the code when that code changes. So I argue that those builds will be extremely rare in the future. And they will mostly not impact the people who are making 'substantial' changes - those who need fast feedback on their builds.
   
   On the other hand, I think we will have a lot (I think the vast majority) of "one-provider-only" changes coming from contributors. Those are already quite optimized during the tests, and we can (I plan to) also easily add 'run tests only for this provider and the dependent ones' (using the `dependencies.json`). 
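   
   A minimal sketch of how such a selection could work. This is not the actual CI implementation: the function names are hypothetical, and it assumes `dependencies.json` has been loaded into a dict that maps each provider id to the list of provider ids it cross-references. "Dependent ones" is read here as the providers that reference a changed provider.

```python
from typing import Dict, Iterable, List, Set

# Assumption: provider sources live under this path prefix.
PROVIDERS_PREFIX = "airflow/providers/"


def changed_providers(changed_files: Iterable[str], known_providers: Set[str]) -> Set[str]:
    """Map changed file paths to provider ids such as 'google' or 'apache.hive'."""
    found: Set[str] = set()
    for path in changed_files:
        if not path.startswith(PROVIDERS_PREFIX):
            continue
        # Directory components below the prefix, without the file name.
        parts = path[len(PROVIDERS_PREFIX):].split("/")[:-1]
        # Try the longest dotted prefix first so 'apache.hive' wins over 'apache'.
        for end in range(len(parts), 0, -1):
            candidate = ".".join(parts[:end])
            if candidate in known_providers:
                found.add(candidate)
                break
    return found


def providers_to_check(changed_files: Iterable[str], dependencies: Dict[str, List[str]]) -> List[str]:
    """Return the changed providers plus the providers that directly reference them."""
    known = set(dependencies) | {d for deps in dependencies.values() for d in deps}
    changed = changed_providers(changed_files, known)
    dependents = {name for name, deps in dependencies.items() if changed & set(deps)}
    return sorted(changed | dependents)


# Hypothetical sample data, not the real content of dependencies.json.
sample_dependencies = {
    "amazon": ["google"],
    "apache.hive": ["amazon"],
    "google": [],
    "slack": [],
}
sample_changes = ["airflow/providers/google/cloud/hooks/gcs.py", "docs/index.rst"]
print(providers_to_check(sample_changes, sample_dependencies))
```

   The sketch only follows direct references. Whether transitive dependents should be included as well is a design choice it leaves open.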
   
   This will cut down test time to ~5 minutes for typical providers. Right now, the full docs build takes 25 minutes. If we also limit it to only those providers that changed, the docs build will also take 5 minutes for most of the providers (we do not have to build the "airflow" docs in this case - there should be "0" references from airflow to particular providers' docs).
   
   This basically means that a contributor will get feedback about a 'one-provider' change 20 minutes faster. Multiply that by the number of "one-provider" changes. 
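   
   As a back-of-the-envelope sketch of that multiplication: the 25 and 5 minute figures are the ones quoted above, while the number of changes is a purely hypothetical input, not measured data.

```python
# Build times quoted above, in minutes.
FULL_DOCS_BUILD_MIN = 25
PROVIDER_DOCS_BUILD_MIN = 5


def feedback_minutes_saved(single_provider_changes: int) -> int:
    """Total minutes of faster feedback for a given number of one-provider changes."""
    saved_per_change = FULL_DOCS_BUILD_MIN - PROVIDER_DOCS_BUILD_MIN
    return saved_per_change * single_provider_changes


# Hypothetical count of one-provider changes over some period.
hypothetical_changes = 100
print(feedback_minutes_saved(hypothetical_changes), "minutes saved")
```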
   
   I have no time to pull the "hard" data, i.e. how many changes we have that are (1) "doc/*.rst" only vs. (2) "single-provider" only. But my intuition tells me that over the next months the vast majority of the changes will be of the (2) kind. 
   
   I firmly believe we should optimize the things that have higher impact and bring more benefits. I hate doing micro-optimisations; I always think "long-term"/"high impact" when I optimize.
   
   WDYT @kaxil @mik-laj, which of those is worth it: (1) or (2)? I argue (above) that (2) has much higher impact and brings more benefits to the community as a whole. If you think we should do (1) rather than (2), I'd love to hear your line of thought and the reasoning why you think it is better.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org