Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/16 11:31:59 UTC

[GitHub] [airflow] potiuk edited a comment on issue #15933: Consider splitting Google Provider

potiuk edited a comment on issue #15933:
URL: https://github.com/apache/airflow/issues/15933#issuecomment-970181315


   I think *should* is the key problem here. The initial split is quite easy, but what would happen after a few releases is scary as hell. "Keep backwards compatibility of commons" is, in this case, quite a bit of wishful thinking: it's far easier said than done, and it requires constant vigilance and fixes for accidental incompatibilities that will eventually creep in, leading to uncontrollable growth in complexity.
   
   The problem comes when people start refactoring and add code that accidentally breaks compatibility, or when you want to do a refactor that improves the common code but introduces compatibility issues.
   
   We already saw examples of that with db_api: see the comment here: https://github.com/apache/airflow/blob/main/airflow/hooks/dbapi.py#L45
   This is just one class with a few methods, and I personally recall at least 3 cases where changes were almost merged (or even merged) that would accidentally break the compatibility of existing, released providers (in all compatible versions).
   Splitting the Google provider is the same problem, but potentially an order of magnitude worse, as there is much more common code than that one class.
   
   People making changes to such common code (and often even reviewers/maintainers) might not realise the consequences of their changes for already released packages. Some changes will accidentally break compatibility even if you are careful. It basically requires that all the released google providers are FUTURE compatible with all the released versions of the "common" package.
   
   Unless you have a full test suite that can handle the various cases and make sure that the "common package" works with ALL already released and compatible providers that people have, there is no way to "make sure" that this is the case. And even if we can add such a test suite (which is possible to some extent, just very costly to maintain and run), it prevents you from doing more "bold" refactorings - which is generally a very bad side effect of such an approach. I think the ability to refactor code is crucial to maintainability.
   
   For example, we now have such a test suite for all providers - in our CI we make sure that they import without warnings on Airflow 2.1. But this is not a full guarantee that they will work with Airflow 2.1 - this test is just a "smoke" test - but it has already caught at least 2 cases of a seemingly "innocent" change that would have made all such providers stop working on 2.1.
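   
   To illustrate what such an import "smoke" test boils down to, here is a minimal sketch. It is not the actual Airflow CI script: the function names are made up for this example, and the module names you pass in would be the provider modules to check. It only shows the idea of importing each module while recording warnings and treating both warnings and import errors as failures.

```python
from __future__ import annotations

import importlib
import warnings


def import_without_warnings(module_name: str) -> list[str]:
    """Import a module and return any warnings raised while importing it."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of letting filters hide them.
        warnings.simplefilter("always")
        importlib.import_module(module_name)
    return [f"{w.category.__name__}: {w.message}" for w in caught]


def smoke_test(module_names: list[str]) -> dict[str, list[str]]:
    """Map each module that failed the check to the reasons it failed."""
    failures: dict[str, list[str]] = {}
    for name in module_names:
        try:
            problems = import_without_warnings(name)
        except Exception as exc:  # a failed import is also a failure
            problems = [f"{type(exc).__name__}: {exc}"]
        if problems:
            failures[name] = problems
    return failures
```

   Note that a clean result only tells you that the modules import cleanly, not that they behave correctly, which is exactly why this is a smoke test and not a compatibility guarantee.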
   
   Of course you can also introduce "breaking" changes in the common code (and release a 2.0 package), but then this inevitably leads to one of two things: 
   
   1) you also release all dependent packages with a "breaking" release, which is effectively equivalent to releasing a new "google provider" release today.
   
   2) you have to maintain compatibility in all the dependent packages (they should work with both 1.0 and 2.0 of the commons), which leads to messy code and will eventually break as you add more changes. It can only be maintained for a short time, and eventually it leads to 1) - at some point you have to say that "provider google-x.N" only works with "commons-2.0" and above. Just maintaining those dependencies is a pain, and you need dedicated people to keep an eye on them.
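   
   A rough sketch of what the "messy code" in 2) tends to look like. Everything here is hypothetical: `CommonsV1`, `CommonsV2` and the method names are stand-ins invented for this example, not real Airflow or Google provider classes. The point is only that the provider has to probe which version of the commons it got and branch on it.

```python
class CommonsV1:
    """Stand-in for a hypothetical commons 1.x base hook."""

    def get_credentials(self):
        return "credentials"


class CommonsV2:
    """Stand-in for a hypothetical commons 2.x hook with a renamed method."""

    def get_credentials_and_project(self):
        return "credentials", "project"


def provider_get_credentials(hook):
    """Provider code that has to run against both commons 1.x and 2.x."""
    if hasattr(hook, "get_credentials_and_project"):
        # commons 2.x: new method name, and the return value changed shape
        credentials, _project = hook.get_credentials_and_project()
        return credentials
    # commons 1.x: old method name
    return hook.get_credentials()
```

   One branch like this is fine. The trouble is that every further change in the commons adds another one, in every dependent package.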
   
   The question here is what is more costly (and for whom):
   
   a) the complexity of maintaining compatibility between different versions of the common packages and the released "google" packages, with the potential ability for the user to upgrade only some parts 
   
   or 
   
   b) the complexity for the users, who have to handle breaking changes potentially more frequently with each new "full google provider" release
   
   I think a) is something that will grow more and more complex for maintainers over time, whereas b) is kind of "stable" - it requires some regular effort from the users, but in the long run it is easier for them to handle. Also, a) has one property that is very uncomfortable, for an open-source project at least. I can immediately imagine many issues opened by users: "I want to install google-ads-6 and google-gcs-3 because of this and that, and they do not work together because they require different "commons"". Just having conversations about that and explaining what can and cannot be done will take a good chunk of time from the maintainers who know how it works. Explaining that in docs will be next to impossible, I am afraid. Right now we avoid all those conversations by releasing a single google provider. The conversation is very simple: "if you want to use google ads, which was added in provider 5.0, you also need to adapt all other google properties to the version that is there". End of story. With multiple versions, the number of user stories here grows exponentially.
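   
   The "google-ads-6 and google-gcs-3" situation can be sketched like this. The package names are the ones from the example above, but the commons versions each one accepts are made up for illustration, and `compatible_commons` is a hypothetical helper, not anything pip or Airflow provides.

```python
from __future__ import annotations


def compatible_commons(requirements: dict[str, range]) -> set[int]:
    """Return the commons major versions accepted by every listed provider."""
    accepted = None
    for majors in requirements.values():
        # Intersect the accepted versions of each provider in turn.
        accepted = set(majors) if accepted is None else accepted & set(majors)
    return accepted or set()


# Assumed for illustration: google-ads-6 needs commons 2.x,
# while google-gcs-3 was released against commons 1.x only.
wanted = {
    "google-ads-6": range(2, 3),
    "google-gcs-3": range(1, 2),
}
```

   With these assumed ranges `compatible_commons(wanted)` is empty, so there is no commons version the user can install, and somebody has to explain why.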
   
   Also, I think that in the long term you will not avoid the "breaking all" releases anyway. Users will have to do them anyway - only it will cost them more because, counter-intuitively, they will have to do them much less frequently. There is an old saying that if a migration is painful, just do it more often. The "split providers" approach leads to potentially less frequent but more painful upgrades for our users, and adds maintenance effort for committers. The "single provider" approach means that you potentially have to do more frequent but less painful upgrades (say, every month when you upgrade the google provider). But there is no solution with "no pain".
   
   I am not saying all this is impossible - technically, splitting the google provider is possible. I just think that the person (or rather group) who commits to doing it needs to realise the consequences and take on the burden of organising it in such a way that we do not land in "dependency hell". To be honest, I personally do not currently have enough courage to commit to it.
   

