You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2024/02/21 16:42:20 UTC

[DISCUSS] Common.util provider?

Hello everyone,

How do we feel about introducing a common.util provider?

I know it's not been the original idea behind providers, but - after
testing common.sql and now also having common.io, seems like more and more
we would like to extract some common code that we would like providers to
use, but we refrain from it, because it will only be actually usable 6
months after we introduce some common code.

However, if we introduce common.util, this problem is generally gone - at
the expense of more complex maintenance and cross-provider dependencies.
We should be able to add a common util method and use it in a provider at
the same time.

Think Amazon provider using a new feature released in common.util >=1.2.0
and google provider >= 1.1.0. All manageable and we do it already for
common.sql. We know how to do it, we know what to avoid, we know we cannot
introduce backwards-incompatible changes, so we have to be very clear what
is and what is not a public API there, We could rather easily add tests to
prevent such backwards-incompatible changes. We even have a solution for
chicken-egg providers where we need to release two providers at the same
time if they depend on each other. Generally speaking it's quite workable
but adds a bit of overhead.

Examples that we could implement as "common.util":

- common versioning check with cache - where multiple providers could reuse
"do we have pendulum 2"
- more complex - some date management features (we have a few like
date_ranges/round_time). But there are many more.

I generally do not love the common "util" approach. It has a tendency to
become a bag of everything over time. but if we limit it to a set of small,
fully decoupled modules where each module is independent - it's OK. And we
already have it in "airflow.util" and we seem to be doing well.

WDYT? Is it worth it ?

J.

Re: [DISCUSS] Common.util provider?

Posted by Jarek Potiuk <ja...@potiuk.com>.
Any other  ideas or suggestions here? Can someone explain how the
"polypill" approach would look like, maybe? How do we imagine this working?

Just to continue this discussion - another example.

Small thing that David wanted to add for changes in some sql providers:

@contextmanager
def suppress_and_warn(*exceptions: type[BaseException]):
    """Context manager that suppresses the given exceptions and logs a
warning message."""
    try:
        yield
    except exceptions as e:
        warnings.warn(f"Exception suppressed:
{e}\n{traceback.format_exc()}", category=UserWarning)


https://github.com/apache/airflow/pull/38707/files#diff-6e1b2f961cb951d05d66d2d814ef5f6d8f8bf8f43c40fb5d40e27a031fed8dd7R115

This is a small thing - but adding it in `airflow` is problematic - because
it will only be released in 1.10, so we cannot use it in providers if we
do.
Currently - since it is used in sql providers, I suggested using
`common.sql` for that code (and add >= 1.12 for common-sql-providers for
those providers that use it). And I will write a separate email about a
proposed versioning approach there.

Do we have a good proposal on how we can solve similar things in the
future?
Do we want it at all? It has some challenges - yes it DRY's the code but it
also introduces coupling.

J.


On Sun, Mar 10, 2024 at 6:21 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Coming back to it - what about the "polypill" :)? What's different vs the
> "common.sql" way of doing it ? How do we think it can work ?
>
> On Thu, Feb 22, 2024 at 1:58 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> > The symbolic link approach seems to disregard all the external
>> providers, unless I misunderstand it.
>>
>> Not really. It just does not make it easy for the external providers to
>> use it "fast".  They can still - if they want to just manually copy those
>> utils from the latest version of Airflow if they want to use it. Almost by
>> definition, those will be small, independent modules that can be simply
>> copied as needed by whoever releases external providers - and they are also
>> free to copy any older version if they want. That is a nice feature that
>> makes them fully decoupled from the version of Airflow they are installed
>> in (same as community providers). Or - if they want they can just import
>> them from "airflow.provider_utils" - but then they have to add >= Airflow
>> 2.9 if that util appeared in Airflow 2.9 (which is the main reason we want
>> to use symbolic links - because due to our policies and promises, we do not
>> want community providers to depend on latest version of Airflow in vast
>> majority of cases.
>>
>> So this approach is also fully usable by external providers, but it
>> requires some manual effort to copy the modules to their providers.
>>
>> > I like the polypill idea. A backport provider that brings new
>> interfaces to providers without the actual functionalities.
>>
>> I would love to hear more about this, I think the "common.util" was
>> exactly the kind of polyfill approach (with its own versioning
>> complexities) but maybe I do not understand how such a polyfill provider
>> would work. Say we want to add a new "urlparse" method usable for all
>> providers. Could you explain how it would work - say:
>>
>> * we add "urlparse" in Airflow 2.9
>> * some provider wants to use it in Airflow 2.7
>>
>> What providers, with what code/interfaces we would have to release in
>> this case and what dependencies such providers that want to use it (both
>> community and Airflow should have)? I **think** that would mean exactly the
>> "common.<something>" approach we already have with "io" and "sql", but
>> maybe I do not understand it :)
>>
>> On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <tp...@astronomer.io.invalid>
>> wrote:
>>
>>> I like the polypill idea. A backport provider that brings new interfaces
>>> to providers without the actual functionalities.
>>>
>>>
>>> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mo...@apache.org>
>>> wrote:
>>> >
>>> >> That's why I generally do
>>> > not like the "util" approach because common packaging introduces
>>> > unnecessary coupling (you have to upgrade independent utils together).
>>> >
>>> > From my experience with releasing OpenLineage where we do things
>>> similarly:
>>> > I think that's the advantage of it, but only _if_ you can release those
>>> > together.
>>> > With "build-in" providers it makes sense, but could be burdensome if
>>> > "external"
>>> > ones would depend on that functionality.
>>> >
>>> >> I know it's not been the original idea behind providers, but - after
>>> > testing common.sql and now also having common.io, seems like more and
>>> more
>>> > we would like to extract some common code that we would like providers
>>> to
>>> > use, but we refrain from it, because it will only be actually usable 6
>>> > months after we introduce some common code.
>>> >
>>> > So, maybe better approach would be to introduce the functionality into
>>> > core,
>>> > and use common.X provider as "polyfill" (to borrow JS nomenclature)
>>> > to make sure providers could use that functionality now, where external
>>> > ones could depend on the Airflow ones?
>>> >
>>> > The symbolic link approach seems to disregard all the external
>>> providers,
>>> > unless
>>> > I misunderstand it.
>>> >
>>> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a):
>>> >
>>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>>> Storage,
>>> >> reading SQL, etc.) should each have its own utility package, so they
>>> can be
>>> >> released independently without dependency problems popping up if we
>>> need to
>>> >> break compatibility to one purpose. But more providers are
>>> exponentially
>>> >> more difficult to maintain, so I’d settle for one utility provider
>>> for now
>>> >> and split further if needed in the future.
>>> >>
>>> >> Very much agree with this general statement. That's why I generally do
>>> >> not like the "util" approach because common packaging introduces
>>> >> unnecessary coupling (you have to upgrade independent utils
>>> together). And
>>> >> when we have a common set of things that seem to make sense to be
>>> released
>>> >> together when upgraded we should package them together in
>>> >> "common.<something concrete" (like we have with common.io and
>>> common.sql).
>>> >>
>>> >> However - in this case, I think what Jens proposed (and I am happy to
>>> try
>>> >> as well) is to attempt to use symbolic links - i.e. add the code in
>>> >> `airflow.util` but then create a symbolic link in the provider.  I
>>> tested
>>> >> it yesterday and it works as expected - i.e. such symbolic link is
>>> >> dereferenced and the provider package contains the python file, not
>>> >> symbolic link. That seems like a much more lightweight approach that
>>> will
>>> >> serve the purpose of "common.util" much better. The only thing we
>>> will have
>>> >> to take care of (and we can add it once the POC is successful) is to
>>> add
>>> >> some pre-commit protection that those kind of symbolically linked util
>>> >> modules are imported in providers, from inside of those provider, not
>>> from
>>> >> airlfow, and make sure they are "standalone" (i.e. - as you mentioned
>>> - not
>>> >> depend on anything in airflow code). We could create a new package
>>> for that
>>> >> in airlfow
>>> >> "airlfow.provider_utils" for example - and make sure (as next step)
>>> that
>>> >> anything from that package is never directly imported by any
>>> provider, and
>>> >> whenever provider uses it, it should be symbolic link inside that
>>> provider.
>>> >> That's all automatable and we can prevent mistakes via pre-commit.
>>> >>
>>> >> I think that might lead to a very lightweight approach where we
>>> introduce
>>> >> new common functionality which is immediately reusable in providers
>>> without
>>> >> the hassle of taking care about backwards compatibility, and managing
>>> the
>>> >> "common.util" provider. At the expense of a bit complex pre-commit
>>> that
>>> >> will guard the usage of it.
>>> >>
>>> >> Seems that it might be the "Eat cake and have it too" way that we've
>>> been
>>> >> looking for.
>>> >>
>>> >> J.
>>> >>
>>> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung
>>> <tp...@astronomer.io.invalid>
>>> >> wrote:
>>> >>
>>> >>> It would help in the sense mentioned in previous posts, yes. But one
>>> >> thing
>>> >>> I want to point out is, for the provider to actually be helpful, it
>>> must
>>> >> be
>>> >>> treated a bit differently from normal providers, but more like a
>>> separate
>>> >>> third-party dependency. Specifically, the provider should not have a
>>> >>> dependency to Core Airflow, so it can be released and depended on
>>> more
>>> >>> flexibly.
>>> >>>
>>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>>> Storage,
>>> >>> reading SQL, etc.) should each have its own utility package, so they
>>> can
>>> >> be
>>> >>> released independently without dependency problems popping up if we
>>> need
>>> >> to
>>> >>> break compatibility to one purpose. But more providers are
>>> exponentially
>>> >>> more difficult to maintain, so I’d settle for one utility provider
>>> for
>>> >> now
>>> >>> and split further if needed in the future.
>>> >>>
>>> >>> TP
>>> >>>
>>> >>>
>>> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
>>> >>> Jens.Scheffler@de.bosch.com.INVALID> wrote:
>>> >>>>
>>> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse
>>> and
>>> >>> validate URIs for datasets?
>>> >>>>
>>> >>>> Mit freundlichen Grüßen / Best regards
>>> >>>>
>>> >>>> Jens Scheffler
>>> >>>>
>>> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
>>> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen
>>> |
>>> >>> GERMANY | www.bosch.com
>>> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>>> >>> Jens.Scheffler@de.bosch.com
>>> >>>>
>>> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>>> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>>> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>>> Markus
>>> >>> Forschner,
>>> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
>>> >>>>
>>> >>>> -----Original Message-----
>>> >>>> From: Jarek Potiuk <ja...@potiuk.com>
>>> >>>> Sent: Donnerstag, 22. Februar 2024 00:53
>>> >>>> To: dev@airflow.apache.org
>>> >>>> Subject: Re: [DISCUSS] Common.util provider?
>>> >>>>
>>> >>>> Yep. It could work with symbolic links. Tested it and with flit -
>>> both
>>> >>> wheel and sdist packaged code such symbolically linked file is
>>> >> dereferenced
>>> >>> and copy of the file is added there. It could be a nice way of doing
>>> it.
>>> >>>>
>>> >>>> Maybe then worth trying next time if someone has a need?
>>> >>>>
>>> >>>> J
>>> >>>>
>>> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>> >>> Jens.Scheffler@de.bosch.com.invalid> wrote:
>>> >>>>
>>> >>>>>>>> As of additional dependency complexity between providers
>>> actually
>>> >>>>>>>> the
>>> >>>>> additional dependency I think creates more problems than the
>>> benefit…
>>> >>>>> would be cool if there would be an option to „inline“ common code
>>> from
>>> >>>>> a single place but keep individual providers fully independent…
>>> >>>>>
>>> >>>>>> Well, we already  do a lot of inlining, so if we think we should
>>> do
>>> >>>>>> more,
>>> >>>>> we have mechanisms for that. We have  pre-commits and release
>>> commands
>>> >>>>> that do a lot of that. Pre commits are inlining scripts in
>>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>>> files
>>> >>>>> and changelogs and index documentation .rst (partially) are
>>> generated
>>> >>>>> at release documentation preparation time, pyproject.toml for
>>> >>>>> providers are generated from common templates at package building
>>> time
>>> >>>>> and so on and so on :). So we can do more of that and generate
>>> common
>>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>>> But
>>> >>>>> again "can't have and eat cake" - this has the drawback that there
>>> are
>>> >>>>> extra steps involved and even if it's automated it does add
>>> friction
>>> >>>>> when you have to regenerate the code every time you change it and
>>> when
>>> >>>>> you change it in another place than where you use it.
>>> >>>>>
>>> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we
>>> really
>>> >>>>> in-line and have a pre-commit aligning the redundancy of python in
>>> >>> folders.
>>> >>>>> Might need to be an opt-in if only 10 of 85 providers are using
>>> common
>>> >>>>> stuff and if we change a common line we probably do not need to
>>> affect
>>> >>>>> all providers.
>>> >>>>>
>>> >>>>> As long as no Windows users trying to code for airflow (do we need
>>> to
>>> >>>>> consider?) would it also work to use symlinks? Git can cope with
>>> this,
>>> >>>>> I don't know if the python toolchain can de-reference a copy and
>>> are
>>> >>>>> not packaging a symlink? Would be worth a test... would save the
>>> >>>>> pre-commit and we even could selectively include common bla into
>>> >>>>> providers as needed :-D
>>> >>>>>
>>> >>>>> Mit freundlichen Grüßen / Best regards
>>> >>>>>
>>> >>>>> Jens Scheffler
>>> >>>>>
>>> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH |
>>> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
>>> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>>> >>>>> Jens.Scheffler@de.bosch.com
>>> >>>>>
>>> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>>> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>>> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>>> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer,
>>> Dr.
>>> >>>>> Tanja Rückert
>>> >>>>>
>>> >>>>> -----Original Message-----
>>> >>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18
>>> >>>>> To: dev@airflow.apache.org
>>> >>>>> Subject: Re: [DISCUSS] Common.util provider?
>>> >>>>>
>>> >>>>>> if we have a common piece then we are locking all depending
>>> >>>>>> providers
>>> >>>>> (potentially) together if common code changes
>>> >>>>>
>>> >>>>> Yes, coupling in this case is the drawback of this solution. You
>>> can't
>>> >>>>> have cake and eat it too and in this case you trade DRY with
>>> coupling.
>>> >>>>>
>>> >>>>>> As of additional dependency complexity between providers actually
>>> >>>>>> the
>>> >>>>> additional dependency I think creates more problems than the
>>> benefit…
>>> >>>>> would be cool if there would be an option to „inline“ common code
>>> from
>>> >>>>> a single place but keep individual providers fully independent…
>>> >>>>>
>>> >>>>> Well, we already  do a lot of inlining, so if we think we should do
>>> >>>>> more, we have mechanisms for that. We have  pre-commits and release
>>> >>>>> commands that do a lot of that. Pre commits are inlining scripts in
>>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>>> files
>>> >>>>> and changelogs and index documentation .rst (partially) are
>>> generated
>>> >>>>> at release documentation preparation time, pyproject.toml for
>>> >>>>> providers are generated from common templates at package building
>>> time
>>> >>>>> and so on and so on :). So we can do more of that and generate
>>> common
>>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>>> But
>>> >>>>> again "can't have and eat cake" - this has the drawback that there
>>> are
>>> >>>>> extra steps involved and even if it's automated it does add
>>> friction
>>> >>>>> when you have to regenerate the code every time you change it and
>>> when
>>> >>>>> you change it in another place than where you use it.
>>> >>>>>
>>> >>>>> J.
>>> >>>>>
>>> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>> >>>>> Jens.Scheffler@de.bosch.com.invalid> wrote:
>>> >>>>>
>>> >>>>>> Hi Jarek,
>>> >>>>>>
>>> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the
>>> feeling
>>> >>>>>> that a lot of very similar code is repeated in all the providers.
>>> >>>>>> But during review yesterday I dropped the ides because if we have
>>> a
>>> >>>>>> common piece then we are locking all depending providers
>>> >>>>>> (potentially) together if common code changes.
>>> >>>>>> As of additional dependency complexity between providers actually
>>> >>>>>> the additional dependency I think creates more prblems than the
>>> >>>>>> benefit… would be cool if tehere would be an option to „inline“
>>> >>>>>> common code from a single place but keep individual providers
>>> fully
>>> >>>>>> independent…
>>> >>>>>>
>>> >>>>>> Jens
>>> >>>>>>
>>> >>>>>> Sent from Outlook for
>>> >>>>>> iOS<
>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
>>> >>>>>> 2F
>>> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com
>>> %7C98c88
>>> >>>>>> 97
>>> >>>>>>
>>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
>>> >>>>>> %7
>>> >>>>>>
>>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
>>> >>>>>> Ij
>>> >>>>>>
>>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
>>> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
>>> >>>>>> ________________________________
>>> >>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>>> >>>>>> To: dev@airflow.apache.org <de...@airflow.apache.org>
>>> >>>>>> Subject: [DISCUSS] Common.util provider?
>>> >>>>>>
>>> >>>>>> Hello everyone,
>>> >>>>>>
>>> >>>>>> How do we feel about introducing a common.util provider?
>>> >>>>>>
>>> >>>>>> I know it's not been the original idea behind providers, but -
>>> after
>>> >>>>>> testing common.sql and now also having common.io, seems like more
>>> >>>>>> and more we would like to extract some common code that we would
>>> >>>>>> like providers to use, but we refrain from it, because it will
>>> only
>>> >>>>>> be actually usable 6 months after we introduce some common code.
>>> >>>>>>
>>> >>>>>> However, if we introduce common.util, this problem is generally
>>> gone
>>> >>>>>> - at the expense of more complex maintenance and cross-provider
>>> >>>>> dependencies.
>>> >>>>>> We should be able to add a common util method and use it in a
>>> >>>>>> provider at the same time.
>>> >>>>>>
>>> >>>>>> Think Amazon provider using a new feature released in common.util
>>> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
>>> >>>>>> already for common.sql. We know how to do it, we know what to
>>> avoid,
>>> >>>>>> we know we cannot introduce backwards-incompatible changes, so we
>>> >>>>>> have to be very clear what is and what is not a public API there,
>>> We
>>> >>>>>> could rather easily add tests to prevent such
>>> backwards-incompatible
>>> >>>>>> changes. We even have a solution for chicken-egg providers where
>>> we
>>> >>>>>> need to release two providers at the same time if they depend on
>>> >>>>>> each other. Generally speaking it's quite workable but adds a bit
>>> of
>>> >>> overhead.
>>> >>>>>>
>>> >>>>>> Examples that we could implement as "common.util":
>>> >>>>>>
>>> >>>>>> - common versioning check with cache - where multiple providers
>>> >>>>>> could reuse "do we have pendulum 2"
>>> >>>>>> - more complex - some date management features (we have a few like
>>> >>>>>> date_ranges/round_time). But there are many more.
>>> >>>>>>
>>> >>>>>> I generally do not love the common "util" approach. It has a
>>> >>>>>> tendency to become a bag of everything over time. but if we limit
>>> it
>>> >>>>>> to a set of small, fully decoupled modules where each module is
>>> >>>>>> independent - it's OK. And we already have it in "airflow.util"
>>> and
>>> >>>>>> we seem to be
>>> >>>>> doing well.
>>> >>>>>>
>>> >>>>>> WDYT? Is it worth it ?
>>> >>>>>>
>>> >>>>>> J.
>>> >>>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> ---------------------------------------------------------------------
>>> >>>> To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
>>> >>>> For additional commands, e-mail: dev-help@airflow.apache.org
>>> >>>
>>> >>>
>>> >>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
>>> For additional commands, e-mail: dev-help@airflow.apache.org
>>>
>>>

Re: [DISCUSS] Common.util provider?

Posted by Jarek Potiuk <ja...@potiuk.com>.
Coming back to it - what about the "polypill" :)? What's different vs the
"common.sql" way of doing it ? How do we think it can work ?

On Thu, Feb 22, 2024 at 1:58 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> > The symbolic link approach seems to disregard all the external
> providers, unless I misunderstand it.
>
> Not really. It just does not make it easy for the external providers to
> use it "fast".  They can still - if they want to just manually copy those
> utils from the latest version of Airflow if they want to use it. Almost by
> definition, those will be small, independent modules that can be simply
> copied as needed by whoever releases external providers - and they are also
> free to copy any older version if they want. That is a nice feature that
> makes them fully decoupled from the version of Airflow they are installed
> in (same as community providers). Or - if they want they can just import
> them from "airflow.provider_utils" - but then they have to add >= Airflow
> 2.9 if that util appeared in Airflow 2.9 (which is the main reason we want
> to use symbolic links - because due to our policies and promises, we do not
> want community providers to depend on latest version of Airflow in vast
> majority of cases.
>
> So this approach is also fully usable by external providers, but it
> requires some manual effort to copy the modules to their providers.
>
> > I like the polypill idea. A backport provider that brings new interfaces
> to providers without the actual functionalities.
>
> I would love to hear more about this, I think the "common.util" was
> exactly the kind of polyfill approach (with its own versioning
> complexities) but maybe I do not understand how such a polyfill provider
> would work. Say we want to add a new "urlparse" method usable for all
> providers. Could you explain how it would work - say:
>
> * we add "urlparse" in Airflow 2.9
> * some provider wants to use it in Airflow 2.7
>
> What providers, with what code/interfaces we would have to release in this
> case and what dependencies such providers that want to use it (both
> community and Airflow should have)? I **think** that would mean exactly the
> "common.<something>" approach we already have with "io" and "sql", but
> maybe I do not understand it :)
>
> On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <tp...@astronomer.io.invalid>
> wrote:
>
>> I like the polypill idea. A backport provider that brings new interfaces
>> to providers without the actual functionalities.
>>
>>
>> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mo...@apache.org>
>> wrote:
>> >
>> >> That's why I generally do
>> > not like the "util" approach because common packaging introduces
>> > unnecessary coupling (you have to upgrade independent utils together).
>> >
>> > From my experience with releasing OpenLineage where we do things
>> similarly:
>> > I think that's the advantage of it, but only _if_ you can release those
>> > together.
>> > With "build-in" providers it makes sense, but could be burdensome if
>> > "external"
>> > ones would depend on that functionality.
>> >
>> >> I know it's not been the original idea behind providers, but - after
>> > testing common.sql and now also having common.io, seems like more and
>> more
>> > we would like to extract some common code that we would like providers
>> to
>> > use, but we refrain from it, because it will only be actually usable 6
>> > months after we introduce some common code.
>> >
>> > So, maybe better approach would be to introduce the functionality into
>> > core,
>> > and use common.X provider as "polyfill" (to borrow JS nomenclature)
>> > to make sure providers could use that functionality now, where external
>> > ones could depend on the Airflow ones?
>> >
>> > The symbolic link approach seems to disregard all the external
>> providers,
>> > unless
>> > I misunderstand it.
>> >
>> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a):
>> >
>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>> Storage,
>> >> reading SQL, etc.) should each have its own utility package, so they
>> can be
>> >> released independently without dependency problems popping up if we
>> need to
>> >> break compatibility to one purpose. But more providers are
>> exponentially
>> >> more difficult to maintain, so I’d settle for one utility provider for
>> now
>> >> and split further if needed in the future.
>> >>
>> >> Very much agree with this general statement. That's why I generally do
>> >> not like the "util" approach because common packaging introduces
>> >> unnecessary coupling (you have to upgrade independent utils together).
>> And
>> >> when we have a common set of things that seem to make sense to be
>> released
>> >> together when upgraded we should package them together in
>> >> "common.<something concrete" (like we have with common.io and
>> common.sql).
>> >>
>> >> However - in this case, I think what Jens proposed (and I am happy to
>> try
>> >> as well) is to attempt to use symbolic links - i.e. add the code in
>> >> `airflow.util` but then create a symbolic link in the provider.  I
>> tested
>> >> it yesterday and it works as expected - i.e. such symbolic link is
>> >> dereferenced and the provider package contains the python file, not
>> >> symbolic link. That seems like a much more lightweight approach that
>> will
>> >> serve the purpose of "common.util" much better. The only thing we will
>> have
>> >> to take care of (and we can add it once the POC is successful) is to
>> add
>> >> some pre-commit protection that those kind of symbolically linked util
>> >> modules are imported in providers, from inside of those provider, not
>> from
>> >> airlfow, and make sure they are "standalone" (i.e. - as you mentioned
>> - not
>> >> depend on anything in airflow code). We could create a new package for
>> that
>> >> in airlfow
>> >> "airlfow.provider_utils" for example - and make sure (as next step)
>> that
>> >> anything from that package is never directly imported by any provider,
>> and
>> >> whenever provider uses it, it should be symbolic link inside that
>> provider.
>> >> That's all automatable and we can prevent mistakes via pre-commit.
>> >>
>> >> I think that might lead to a very lightweight approach where we
>> introduce
>> >> new common functionality which is immediately reusable in providers
>> without
>> >> the hassle of taking care about backwards compatibility, and managing
>> the
>> >> "common.util" provider. At the expense of a bit complex pre-commit that
>> >> will guard the usage of it.
>> >>
>> >> Seems that it might be the "Eat cake and have it too" way that we've
>> been
>> >> looking for.
>> >>
>> >> J.
>> >>
>> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung
>> <tp...@astronomer.io.invalid>
>> >> wrote:
>> >>
>> >>> It would help in the sense mentioned in previous posts, yes. But one
>> >> thing
>> >>> I want to point out is, for the provider to actually be helpful, it
>> must
>> >> be
>> >>> treated a bit differently from normal providers, but more like a
>> separate
>> >>> third-party dependency. Specifically, the provider should not have a
>> >>> dependency to Core Airflow, so it can be released and depended on more
>> >>> flexibly.
>> >>>
>> >>> Ideally utilities for each purpose (parsing URI, reading Object
>> Storage,
>> >>> reading SQL, etc.) should each have its own utility package, so they
>> can
>> >> be
>> >>> released independently without dependency problems popping up if we
>> need
>> >> to
>> >>> break compatibility to one purpose. But more providers are
>> exponentially
>> >>> more difficult to maintain, so I’d settle for one utility provider for
>> >> now
>> >>> and split further if needed in the future.
>> >>>
>> >>> TP
>> >>>
>> >>>
>> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
>> >>> Jens.Scheffler@de.bosch.com.INVALID> wrote:
>> >>>>
>> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse and
>> >>> validate URIs for datasets?
>> >>>>
>> >>>> Mit freundlichen Grüßen / Best regards
>> >>>>
>> >>>> Jens Scheffler
>> >>>>
>> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
>> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
>> >>> GERMANY | www.bosch.com
>> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>> >>> Jens.Scheffler@de.bosch.com
>> >>>>
>> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>> Markus
>> >>> Forschner,
>> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: Jarek Potiuk <ja...@potiuk.com>
>> >>>> Sent: Donnerstag, 22. Februar 2024 00:53
>> >>>> To: dev@airflow.apache.org
>> >>>> Subject: Re: [DISCUSS] Common.util provider?
>> >>>>
>> >>>> Yep. It could work with symbolic links. Tested it and with flit -
>> both
>> >>> wheel and sdist packaged code such symbolically linked file is
>> >> dereferenced
>> >>> and copy of the file is added there. It could be a nice way of doing
>> it.
>> >>>>
>> >>>> Maybe then worth trying next time if someone has a need?
>> >>>>
>> >>>> J
>> >>>>
>> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
>> >>> Jens.Scheffler@de.bosch.com.invalid> wrote:
>> >>>>
>> >>>>>>>> As of additional dependency complexity between providers actually
>> >>>>>>>> the
>> >>>>> additional dependency I think creates more problems than the
>> benefit…
>> >>>>> would be cool if there would be an option to „inline“ common code
>> from
>> >>>>> a single place but keep individual providers fully independent…
>> >>>>>
>> >>>>>> Well, we already  do a lot of inlining, so if we think we should do
>> >>>>>> more,
>> >>>>> we have mechanisms for that. We have  pre-commits and release
>> commands
>> >>>>> that do a lot of that. Pre commits are inlining scripts in
>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>> files
>> >>>>> and changelogs and index documentation .rst (partially) are
>> generated
>> >>>>> at release documentation preparation time, pyproject.toml for
>> >>>>> providers are generated from common templates at package building
>> time
>> >>>>> and so on and so on :). So we can do more of that and generate
>> common
>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>> But
>> >>>>> again "can't have and eat cake" - this has the drawback that there
>> are
>> >>>>> extra steps involved and even if it's automated it does add friction
>> >>>>> when you have to regenerate the code every time you change it and
>> when
>> >>>>> you change it in another place than where you use it.
>> >>>>>
>> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we
>> really
>> >>>>> in-line and have a pre-commit aligning the redundancy of python in
>> >>> folders.
>> >>>>> Might need to be an opt-in if only 10 of 85 providers are using
>> common
>> >>>>> stuff and if we change a common line we probably do not need to
>> affect
>> >>>>> all providers.
>> >>>>>
>> >>>>> As long as no Windows users trying to code for airflow (do we need
>> to
>> >>>>> consider?) would it also work to use symlinks? Git can cope with
>> this,
>> >>>>> I don't know if the python toolchain can de-reference a copy and are
>> >>>>> not packaging a symlink? Would be worth a test... would save the
>> >>>>> pre-commit and we even could selectively include common bla into
>> >>>>> providers as needed :-D
>> >>>>>
>> >>>>> Mit freundlichen Grüßen / Best regards
>> >>>>>
>> >>>>> Jens Scheffler
>> >>>>>
>> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH |
>> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
>> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>> >>>>> Jens.Scheffler@de.bosch.com
>> >>>>>
>> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer,
>> Dr.
>> >>>>> Tanja Rückert
>> >>>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Jarek Potiuk <ja...@potiuk.com>
>> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18
>> >>>>> To: dev@airflow.apache.org
>> >>>>> Subject: Re: [DISCUSS] Common.util provider?
>> >>>>>
>> >>>>>> if we have a common piece then we are locking all depending
>> >>>>>> providers
>> >>>>> (potentially) together if common code changes
>> >>>>>
>> >>>>> Yes, coupling in this case is the drawback of this solution. You
>> can't
>> >>>>> have cake and eat it too and in this case you trade DRY with
>> coupling.
>> >>>>>
>> >>>>>> As of additional dependency complexity between providers actually
>> >>>>>> the
>> >>>>> additional dependency I think creates more problems than the
>> benefit…
>> >>>>> would be cool if there would be an option to „inline“ common code
>> from
>> >>>>> a single place but keep individual providers fully independent…
>> >>>>>
>> >>>>> Well, we already  do a lot of inlining, so if we think we should do
>> >>>>> more, we have mechanisms for that. We have  pre-commits and release
>> >>>>> commands that do a lot of that. Pre commits are inlining scripts in
>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py
>> files
>> >>>>> and changelogs and index documentation .rst (partially) are
>> generated
>> >>>>> at release documentation preparation time, pyproject.toml for
>> >>>>> providers are generated from common templates at package building
>> time
>> >>>>> and so on and so on :). So we can do more of that and generate
>> common
>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts.
>> But
>> >>>>> again "can't have and eat cake" - this has the drawback that there
>> are
>> >>>>> extra steps involved and even if it's automated it does add friction
>> >>>>> when you have to regenerate the code every time you change it and
>> when
>> >>>>> you change it in another place than where you use it.
>> >>>>>
>> >>>>> J.
>> >>>>>
>> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
>> >>>>> Jens.Scheffler@de.bosch.com.invalid> wrote:
>> >>>>>
>> >>>>>> Hi Jarek,
>> >>>>>>
>> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the feeling
>> >>>>>> that a lot of very similar code is repeated in all the providers.
>> >>>>>> But during review yesterday I dropped the ides because if we have a
>> >>>>>> common piece then we are locking all depending providers
>> >>>>>> (potentially) together if common code changes.
>> >>>>>> As of additional dependency complexity between providers actually
>> >>>>>> the additional dependency I think creates more prblems than the
>> >>>>>> benefit… would be cool if tehere would be an option to „inline“
>> >>>>>> common code from a single place but keep individual providers fully
>> >>>>>> independent…
>> >>>>>>
>> >>>>>> Jens
>> >>>>>>
>> >>>>>> Sent from Outlook for
>> >>>>>> iOS<
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
>> >>>>>> 2F
>> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com
>> %7C98c88
>> >>>>>> 97
>> >>>>>>
>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
>> >>>>>> %7
>> >>>>>>
>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
>> >>>>>> Ij
>> >>>>>>
>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
>> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
>> >>>>>> ________________________________
>> >>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>> >>>>>> To: dev@airflow.apache.org <de...@airflow.apache.org>
>> >>>>>> Subject: [DISCUSS] Common.util provider?
>> >>>>>>
>> >>>>>> Hello everyone,
>> >>>>>>
>> >>>>>> How do we feel about introducing a common.util provider?
>> >>>>>>
>> >>>>>> I know it's not been the original idea behind providers, but -
>> after
>> >>>>>> testing common.sql and now also having common.io, seems like more
>> >>>>>> and more we would like to extract some common code that we would
>> >>>>>> like providers to use, but we refrain from it, because it will only
>> >>>>>> be actually usable 6 months after we introduce some common code.
>> >>>>>>
>> >>>>>> However, if we introduce common.util, this problem is generally
>> gone
>> >>>>>> - at the expense of more complex maintenance and cross-provider
>> >>>>> dependencies.
>> >>>>>> We should be able to add a common util method and use it in a
>> >>>>>> provider at the same time.
>> >>>>>>
>> >>>>>> Think Amazon provider using a new feature released in common.util
>> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
>> >>>>>> already for common.sql. We know how to do it, we know what to
>> avoid,
>> >>>>>> we know we cannot introduce backwards-incompatible changes, so we
>> >>>>>> have to be very clear what is and what is not a public API there,
>> We
>> >>>>>> could rather easily add tests to prevent such
>> backwards-incompatible
>> >>>>>> changes. We even have a solution for chicken-egg providers where we
>> >>>>>> need to release two providers at the same time if they depend on
>> >>>>>> each other. Generally speaking it's quite workable but adds a bit
>> of
>> >>> overhead.
>> >>>>>>
>> >>>>>> Examples that we could implement as "common.util":
>> >>>>>>
>> >>>>>> - common versioning check with cache - where multiple providers
>> >>>>>> could reuse "do we have pendulum 2"
>> >>>>>> - more complex - some date management features (we have a few like
>> >>>>>> date_ranges/round_time). But there are many more.
>> >>>>>>
>> >>>>>> I generally do not love the common "util" approach. It has a
>> >>>>>> tendency to become a bag of everything over time. but if we limit
>> it
>> >>>>>> to a set of small, fully decoupled modules where each module is
>> >>>>>> independent - it's OK. And we already have it in "airflow.util" and
>> >>>>>> we seem to be
>> >>>>> doing well.
>> >>>>>>
>> >>>>>> WDYT? Is it worth it ?
>> >>>>>>
>> >>>>>> J.
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
>> >>>> For additional commands, e-mail: dev-help@airflow.apache.org
>> >>>
>> >>>
>> >>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
>> For additional commands, e-mail: dev-help@airflow.apache.org
>>
>>

Re: [DISCUSS] Common.util provider?

Posted by Jarek Potiuk <ja...@potiuk.com>.
> The symbolic link approach seems to disregard all the external providers,
unless I misunderstand it.

Not really. It just does not make it easy for the external providers to use
it "fast".  They can still - if they want to just manually copy those utils
from the latest version of Airflow if they want to use it. Almost by
definition, those will be small, independent modules that can be simply
copied as needed by whoever releases external providers - and they are also
free to copy any older version if they want. That is a nice feature that
makes them fully decoupled from the version of Airflow they are installed
in (same as community providers). Or - if they want they can just import
them from "airflow.provider_utils" - but then they have to add >= Airflow
2.9 if that util appeared in Airflow 2.9 (which is the main reason we want
to use symbolic links - because due to our policies and promises, we do not
want community providers to depend on latest version of Airflow in vast
majority of cases.

So this approach is also fully usable by external providers, but it
requires some manual effort to copy the modules to their providers.

> I like the polypill idea. A backport provider that brings new interfaces
to providers without the actual functionalities.

I would love to hear more about this, I think the "common.util" was exactly
the kind of polyfill approach (with its own versioning complexities) but
maybe I do not understand how such a polyfill provider would work. Say we
want to add a new "urlparse" method usable for all providers. Could you
explain how it would work - say:

* we add "urlparse" in Airflow 2.9
* some provider wants to use it in Airflow 2.7

What providers, with what code/interfaces we would have to release in this
case and what dependencies such providers that want to use it (both
community and Airflow should have)? I **think** that would mean exactly the
"common.<something>" approach we already have with "io" and "sql", but
maybe I do not understand it :)

On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <tp...@astronomer.io.invalid>
wrote:

> I like the polypill idea. A backport provider that brings new interfaces
> to providers without the actual functionalities.
>
>
> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mo...@apache.org>
> wrote:
> >
> >> That's why I generally do
> > not like the "util" approach because common packaging introduces
> > unnecessary coupling (you have to upgrade independent utils together).
> >
> > From my experience with releasing OpenLineage where we do things
> similarly:
> > I think that's the advantage of it, but only _if_ you can release those
> > together.
> > With "build-in" providers it makes sense, but could be burdensome if
> > "external"
> > ones would depend on that functionality.
> >
> >> I know it's not been the original idea behind providers, but - after
> > testing common.sql and now also having common.io, seems like more and
> more
> > we would like to extract some common code that we would like providers to
> > use, but we refrain from it, because it will only be actually usable 6
> > months after we introduce some common code.
> >
> > So, maybe better approach would be to introduce the functionality into
> > core,
> > and use common.X provider as "polyfill" (to borrow JS nomenclature)
> > to make sure providers could use that functionality now, where external
> > ones could depend on the Airflow ones?
> >
> > The symbolic link approach seems to disregard all the external providers,
> > unless
> > I misunderstand it.
> >
> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a):
> >
> >>> Ideally utilities for each purpose (parsing URI, reading Object
> Storage,
> >> reading SQL, etc.) should each have its own utility package, so they
> can be
> >> released independently without dependency problems popping up if we
> need to
> >> break compatibility to one purpose. But more providers are exponentially
> >> more difficult to maintain, so I’d settle for one utility provider for
> now
> >> and split further if needed in the future.
> >>
> >> Very much agree with this general statement. That's why I generally do
> >> not like the "util" approach because common packaging introduces
> >> unnecessary coupling (you have to upgrade independent utils together).
> And
> >> when we have a common set of things that seem to make sense to be
> released
> >> together when upgraded we should package them together in
> >> "common.<something concrete" (like we have with common.io and
> common.sql).
> >>
> >> However - in this case, I think what Jens proposed (and I am happy to
> try
> >> as well) is to attempt to use symbolic links - i.e. add the code in
> >> `airflow.util` but then create a symbolic link in the provider.  I
> tested
> >> it yesterday and it works as expected - i.e. such symbolic link is
> >> dereferenced and the provider package contains the python file, not
> >> symbolic link. That seems like a much more lightweight approach that
> will
> >> serve the purpose of "common.util" much better. The only thing we will
> have
> >> to take care of (and we can add it once the POC is successful) is to add
> >> some pre-commit protection that those kind of symbolically linked util
> >> modules are imported in providers, from inside of those provider, not
> from
> >> airlfow, and make sure they are "standalone" (i.e. - as you mentioned -
> not
> >> depend on anything in airflow code). We could create a new package for
> that
> >> in airlfow
> >> "airlfow.provider_utils" for example - and make sure (as next step) that
> >> anything from that package is never directly imported by any provider,
> and
> >> whenever provider uses it, it should be symbolic link inside that
> provider.
> >> That's all automatable and we can prevent mistakes via pre-commit.
> >>
> >> I think that might lead to a very lightweight approach where we
> introduce
> >> new common functionality which is immediately reusable in providers
> without
> >> the hassle of taking care about backwards compatibility, and managing
> the
> >> "common.util" provider. At the expense of a bit complex pre-commit that
> >> will guard the usage of it.
> >>
> >> Seems that it might be the "Eat cake and have it too" way that we've
> been
> >> looking for.
> >>
> >> J.
> >>
> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung <tp@astronomer.io.invalid
> >
> >> wrote:
> >>
> >>> It would help in the sense mentioned in previous posts, yes. But one
> >> thing
> >>> I want to point out is, for the provider to actually be helpful, it
> must
> >> be
> >>> treated a bit differently from normal providers, but more like a
> separate
> >>> third-party dependency. Specifically, the provider should not have a
> >>> dependency to Core Airflow, so it can be released and depended on more
> >>> flexibly.
> >>>
> >>> Ideally utilities for each purpose (parsing URI, reading Object
> Storage,
> >>> reading SQL, etc.) should each have its own utility package, so they
> can
> >> be
> >>> released independently without dependency problems popping up if we
> need
> >> to
> >>> break compatibility to one purpose. But more providers are
> exponentially
> >>> more difficult to maintain, so I’d settle for one utility provider for
> >> now
> >>> and split further if needed in the future.
> >>>
> >>> TP
> >>>
> >>>
> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
> >>> Jens.Scheffler@de.bosch.com.INVALID> wrote:
> >>>>
> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse and
> >>> validate URIs for datasets?
> >>>>
> >>>> Mit freundlichen Grüßen / Best regards
> >>>>
> >>>> Jens Scheffler
> >>>>
> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
> >>> GERMANY | www.bosch.com
> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> >>> Jens.Scheffler@de.bosch.com
> >>>>
> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
> Markus
> >>> Forschner,
> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
> >>>>
> >>>> -----Original Message-----
> >>>> From: Jarek Potiuk <ja...@potiuk.com>
> >>>> Sent: Donnerstag, 22. Februar 2024 00:53
> >>>> To: dev@airflow.apache.org
> >>>> Subject: Re: [DISCUSS] Common.util provider?
> >>>>
> >>>> Yep. It could work with symbolic links. Tested it and with flit - both
> >>> wheel and sdist packaged code such symbolically linked file is
> >> dereferenced
> >>> and copy of the file is added there. It could be a nice way of doing
> it.
> >>>>
> >>>> Maybe then worth trying next time if someone has a need?
> >>>>
> >>>> J
> >>>>
> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
> >>> Jens.Scheffler@de.bosch.com.invalid> wrote:
> >>>>
> >>>>>>>> As of additional dependency complexity between providers actually
> >>>>>>>> the
> >>>>> additional dependency I think creates more problems than the benefit…
> >>>>> would be cool if there would be an option to „inline“ common code
> from
> >>>>> a single place but keep individual providers fully independent…
> >>>>>
> >>>>>> Well, we already  do a lot of inlining, so if we think we should do
> >>>>>> more,
> >>>>> we have mechanisms for that. We have  pre-commits and release
> commands
> >>>>> that do a lot of that. Pre commits are inlining scripts in
> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py files
> >>>>> and changelogs and index documentation .rst (partially) are generated
> >>>>> at release documentation preparation time, pyproject.toml for
> >>>>> providers are generated from common templates at package building
> time
> >>>>> and so on and so on :). So we can do more of that and generate common
> >>>>> code, it's just a matter of adding pre-commits or breeze scripts. But
> >>>>> again "can't have and eat cake" - this has the drawback that there
> are
> >>>>> extra steps involved and even if it's automated it does add friction
> >>>>> when you have to regenerate the code every time you change it and
> when
> >>>>> you change it in another place than where you use it.
> >>>>>
> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we really
> >>>>> in-line and have a pre-commit aligning the redundancy of python in
> >>> folders.
> >>>>> Might need to be an opt-in if only 10 of 85 providers are using
> common
> >>>>> stuff and if we change a common line we probably do not need to
> affect
> >>>>> all providers.
> >>>>>
> >>>>> As long as no Windows users trying to code for airflow (do we need to
> >>>>> consider?) would it also work to use symlinks? Git can cope with
> this,
> >>>>> I don't know if the python toolchain can de-reference a copy and are
> >>>>> not packaging a symlink? Would be worth a test... would save the
> >>>>> pre-commit and we even could selectively include common bla into
> >>>>> providers as needed :-D
> >>>>>
> >>>>> Mit freundlichen Grüßen / Best regards
> >>>>>
> >>>>> Jens Scheffler
> >>>>>
> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH |
> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> >>>>> Jens.Scheffler@de.bosch.com
> >>>>>
> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer,
> Dr.
> >>>>> Tanja Rückert
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Jarek Potiuk <ja...@potiuk.com>
> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18
> >>>>> To: dev@airflow.apache.org
> >>>>> Subject: Re: [DISCUSS] Common.util provider?
> >>>>>
> >>>>>> if we have a common piece then we are locking all depending
> >>>>>> providers
> >>>>> (potentially) together if common code changes
> >>>>>
> >>>>> Yes, coupling in this case is the drawback of this solution. You
> can't
> >>>>> have cake and eat it too and in this case you trade DRY with
> coupling.
> >>>>>
> >>>>>> As of additional dependency complexity between providers actually
> >>>>>> the
> >>>>> additional dependency I think creates more problems than the benefit…
> >>>>> would be cool if there would be an option to „inline“ common code
> from
> >>>>> a single place but keep individual providers fully independent…
> >>>>>
> >>>>> Well, we already  do a lot of inlining, so if we think we should do
> >>>>> more, we have mechanisms for that. We have  pre-commits and release
> >>>>> commands that do a lot of that. Pre commits are inlining scripts in
> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py files
> >>>>> and changelogs and index documentation .rst (partially) are generated
> >>>>> at release documentation preparation time, pyproject.toml for
> >>>>> providers are generated from common templates at package building
> time
> >>>>> and so on and so on :). So we can do more of that and generate common
> >>>>> code, it's just a matter of adding pre-commits or breeze scripts. But
> >>>>> again "can't have and eat cake" - this has the drawback that there
> are
> >>>>> extra steps involved and even if it's automated it does add friction
> >>>>> when you have to regenerate the code every time you change it and
> when
> >>>>> you change it in another place than where you use it.
> >>>>>
> >>>>> J.
> >>>>>
> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
> >>>>> Jens.Scheffler@de.bosch.com.invalid> wrote:
> >>>>>
> >>>>>> Hi Jarek,
> >>>>>>
> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the feeling
> >>>>>> that a lot of very similar code is repeated in all the providers.
> >>>>>> But during review yesterday I dropped the ides because if we have a
> >>>>>> common piece then we are locking all depending providers
> >>>>>> (potentially) together if common code changes.
> >>>>>> As of additional dependency complexity between providers actually
> >>>>>> the additional dependency I think creates more prblems than the
> >>>>>> benefit… would be cool if tehere would be an option to „inline“
> >>>>>> common code from a single place but keep individual providers fully
> >>>>>> independent…
> >>>>>>
> >>>>>> Jens
> >>>>>>
> >>>>>> Sent from Outlook for
> >>>>>> iOS<
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
> >>>>>> 2F
> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com
> %7C98c88
> >>>>>> 97
> >>>>>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
> >>>>>> %7
> >>>>>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
> >>>>>> Ij
> >>>>>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
> >>>>>> ________________________________
> >>>>>> From: Jarek Potiuk <ja...@potiuk.com>
> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
> >>>>>> To: dev@airflow.apache.org <de...@airflow.apache.org>
> >>>>>> Subject: [DISCUSS] Common.util provider?
> >>>>>>
> >>>>>> Hello everyone,
> >>>>>>
> >>>>>> How do we feel about introducing a common.util provider?
> >>>>>>
> >>>>>> I know it's not been the original idea behind providers, but - after
> >>>>>> testing common.sql and now also having common.io, seems like more
> >>>>>> and more we would like to extract some common code that we would
> >>>>>> like providers to use, but we refrain from it, because it will only
> >>>>>> be actually usable 6 months after we introduce some common code.
> >>>>>>
> >>>>>> However, if we introduce common.util, this problem is generally gone
> >>>>>> - at the expense of more complex maintenance and cross-provider
> >>>>> dependencies.
> >>>>>> We should be able to add a common util method and use it in a
> >>>>>> provider at the same time.
> >>>>>>
> >>>>>> Think Amazon provider using a new feature released in common.util
> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
> >>>>>> already for common.sql. We know how to do it, we know what to avoid,
> >>>>>> we know we cannot introduce backwards-incompatible changes, so we
> >>>>>> have to be very clear what is and what is not a public API there, We
> >>>>>> could rather easily add tests to prevent such backwards-incompatible
> >>>>>> changes. We even have a solution for chicken-egg providers where we
> >>>>>> need to release two providers at the same time if they depend on
> >>>>>> each other. Generally speaking it's quite workable but adds a bit of
> >>> overhead.
> >>>>>>
> >>>>>> Examples that we could implement as "common.util":
> >>>>>>
> >>>>>> - common versioning check with cache - where multiple providers
> >>>>>> could reuse "do we have pendulum 2"
> >>>>>> - more complex - some date management features (we have a few like
> >>>>>> date_ranges/round_time). But there are many more.
> >>>>>>
> >>>>>> I generally do not love the common "util" approach. It has a
> >>>>>> tendency to become a bag of everything over time. but if we limit it
> >>>>>> to a set of small, fully decoupled modules where each module is
> >>>>>> independent - it's OK. And we already have it in "airflow.util" and
> >>>>>> we seem to be
> >>>>> doing well.
> >>>>>>
> >>>>>> WDYT? Is it worth it ?
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> >>>> For additional commands, e-mail: dev-help@airflow.apache.org
> >>>
> >>>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> For additional commands, e-mail: dev-help@airflow.apache.org
>
>

Re: [DISCUSS] Common.util provider?

Posted by Tzu-ping Chung <tp...@astronomer.io.INVALID>.
I like the polypill idea. A backport provider that brings new interfaces to providers without the actual functionalities.


> On 22 Feb 2024, at 20:41, Maciej Obuchowski <mo...@apache.org> wrote:
> 
>> That's why I generally do
> not like the "util" approach because common packaging introduces
> unnecessary coupling (you have to upgrade independent utils together).
> 
> From my experience with releasing OpenLineage where we do things similarly:
> I think that's the advantage of it, but only _if_ you can release those
> together.
> With "build-in" providers it makes sense, but could be burdensome if
> "external"
> ones would depend on that functionality.
> 
>> I know it's not been the original idea behind providers, but - after
> testing common.sql and now also having common.io, seems like more and more
> we would like to extract some common code that we would like providers to
> use, but we refrain from it, because it will only be actually usable 6
> months after we introduce some common code.
> 
> So, maybe better approach would be to introduce the functionality into
> core,
> and use common.X provider as "polyfill" (to borrow JS nomenclature)
> to make sure providers could use that functionality now, where external
> ones could depend on the Airflow ones?
> 
> The symbolic link approach seems to disregard all the external providers,
> unless
> I misunderstand it.
> 
> czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a):
> 
>>> Ideally utilities for each purpose (parsing URI, reading Object Storage,
>> reading SQL, etc.) should each have its own utility package, so they can be
>> released independently without dependency problems popping up if we need to
>> break compatibility to one purpose. But more providers are exponentially
>> more difficult to maintain, so I’d settle for one utility provider for now
>> and split further if needed in the future.
>> 
>> Very much agree with this general statement. That's why I generally do
>> not like the "util" approach because common packaging introduces
>> unnecessary coupling (you have to upgrade independent utils together). And
>> when we have a common set of things that seem to make sense to be released
>> together when upgraded we should package them together in
>> "common.<something concrete" (like we have with common.io and common.sql).
>> 
>> However - in this case, I think what Jens proposed (and I am happy to try
>> as well) is to attempt to use symbolic links - i.e. add the code in
>> `airflow.util` but then create a symbolic link in the provider.  I tested
>> it yesterday and it works as expected - i.e. such symbolic link is
>> dereferenced and the provider package contains the python file, not
>> symbolic link. That seems like a much more lightweight approach that will
>> serve the purpose of "common.util" much better. The only thing we will have
>> to take care of (and we can add it once the POC is successful) is to add
>> some pre-commit protection that those kind of symbolically linked util
>> modules are imported in providers, from inside of those provider, not from
>> airlfow, and make sure they are "standalone" (i.e. - as you mentioned - not
>> depend on anything in airflow code). We could create a new package for that
>> in airlfow
>> "airlfow.provider_utils" for example - and make sure (as next step) that
>> anything from that package is never directly imported by any provider, and
>> whenever provider uses it, it should be symbolic link inside that provider.
>> That's all automatable and we can prevent mistakes via pre-commit.
>> 
>> I think that might lead to a very lightweight approach where we introduce
>> new common functionality which is immediately reusable in providers without
>> the hassle of taking care about backwards compatibility, and managing the
>> "common.util" provider. At the expense of a bit complex pre-commit that
>> will guard the usage of it.
>> 
>> Seems that it might be the "Eat cake and have it too" way that we've been
>> looking for.
>> 
>> J.
>> 
>> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung <tp...@astronomer.io.invalid>
>> wrote:
>> 
>>> It would help in the sense mentioned in previous posts, yes. But one
>> thing
>>> I want to point out is, for the provider to actually be helpful, it must
>> be
>>> treated a bit differently from normal providers, but more like a separate
>>> third-party dependency. Specifically, the provider should not have a
>>> dependency to Core Airflow, so it can be released and depended on more
>>> flexibly.
>>> 
>>> Ideally utilities for each purpose (parsing URI, reading Object Storage,
>>> reading SQL, etc.) should each have its own utility package, so they can
>> be
>>> released independently without dependency problems popping up if we need
>> to
>>> break compatibility to one purpose. But more providers are exponentially
>>> more difficult to maintain, so I’d settle for one utility provider for
>> now
>>> and split further if needed in the future.
>>> 
>>> TP
>>> 
>>> 
>>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
>>> Jens.Scheffler@de.bosch.com.INVALID> wrote:
>>>> 
>>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse and
>>> validate URIs for datasets?
>>>> 
>>>> Mit freundlichen Grüßen / Best regards
>>>> 
>>>> Jens Scheffler
>>>> 
>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
>>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
>>> GERMANY | www.bosch.com
>>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>>> Jens.Scheffler@de.bosch.com
>>>> 
>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus
>>> Forschner,
>>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
>>>> 
>>>> -----Original Message-----
>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>> Sent: Donnerstag, 22. Februar 2024 00:53
>>>> To: dev@airflow.apache.org
>>>> Subject: Re: [DISCUSS] Common.util provider?
>>>> 
>>>> Yep. It could work with symbolic links. Tested it and with flit - both
>>> wheel and sdist packaged code such symbolically linked file is
>> dereferenced
>>> and copy of the file is added there. It could be a nice way of doing it.
>>>> 
>>>> Maybe then worth trying next time if someone has a need?
>>>> 
>>>> J
>>>> 
>>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>> Jens.Scheffler@de.bosch.com.invalid> wrote:
>>>> 
>>>>>>>> As of additional dependency complexity between providers actually
>>>>>>>> the
>>>>> additional dependency I think creates more problems than the benefit…
>>>>> would be cool if there would be an option to „inline“ common code from
>>>>> a single place but keep individual providers fully independent…
>>>>> 
>>>>>> Well, we already  do a lot of inlining, so if we think we should do
>>>>>> more,
>>>>> we have mechanisms for that. We have  pre-commits and release commands
>>>>> that do a lot of that. Pre commits are inlining scripts in
>>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py files
>>>>> and changelogs and index documentation .rst (partially) are generated
>>>>> at release documentation preparation time, pyproject.toml for
>>>>> providers are generated from common templates at package building time
>>>>> and so on and so on :). So we can do more of that and generate common
>>>>> code, it's just a matter of adding pre-commits or breeze scripts. But
>>>>> again "can't have and eat cake" - this has the drawback that there are
>>>>> extra steps involved and even if it's automated it does add friction
>>>>> when you have to regenerate the code every time you change it and when
>>>>> you change it in another place than where you use it.
>>>>> 
>>>>> Yes, also thought a moment about pre-commit. I#d be okay if we really
>>>>> in-line and have a pre-commit aligning the redundancy of python in
>>> folders.
>>>>> Might need to be an opt-in if only 10 of 85 providers are using common
>>>>> stuff and if we change a common line we probably do not need to affect
>>>>> all providers.
>>>>> 
>>>>> As long as no Windows users trying to code for airflow (do we need to
>>>>> consider?) would it also work to use symlinks? Git can cope with this,
>>>>> I don't know if the python toolchain can de-reference a copy and are
>>>>> not packaging a symlink? Would be worth a test... would save the
>>>>> pre-commit and we even could selectively include common bla into
>>>>> providers as needed :-D
>>>>> 
>>>>> Mit freundlichen Grüßen / Best regards
>>>>> 
>>>>> Jens Scheffler
>>>>> 
>>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH |
>>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
>>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
>>>>> Jens.Scheffler@de.bosch.com
>>>>> 
>>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
>>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr.
>>>>> Tanja Rückert
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>>> Sent: Mittwoch, 21. Februar 2024 21:18
>>>>> To: dev@airflow.apache.org
>>>>> Subject: Re: [DISCUSS] Common.util provider?
>>>>> 
>>>>>> if we have a common piece then we are locking all depending
>>>>>> providers
>>>>> (potentially) together if common code changes
>>>>> 
>>>>> Yes, coupling in this case is the drawback of this solution. You can't
>>>>> have cake and eat it too and in this case you trade DRY with coupling.
>>>>> 
>>>>>> As of additional dependency complexity between providers actually
>>>>>> the
>>>>> additional dependency I think creates more problems than the benefit…
>>>>> would be cool if there would be an option to „inline“ common code from
>>>>> a single place but keep individual providers fully independent…
>>>>> 
>>>>> Well, we already  do a lot of inlining, so if we think we should do
>>>>> more, we have mechanisms for that. We have  pre-commits and release
>>>>> commands that do a lot of that. Pre commits are inlining scripts in
>>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py files
>>>>> and changelogs and index documentation .rst (partially) are generated
>>>>> at release documentation preparation time, pyproject.toml for
>>>>> providers are generated from common templates at package building time
>>>>> and so on and so on :). So we can do more of that and generate common
>>>>> code, it's just a matter of adding pre-commits or breeze scripts. But
>>>>> again "can't have and eat cake" - this has the drawback that there are
>>>>> extra steps involved and even if it's automated it does add friction
>>>>> when you have to regenerate the code every time you change it and when
>>>>> you change it in another place than where you use it.
>>>>> 
>>>>> J.
>>>>> 
>>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>>>> Jens.Scheffler@de.bosch.com.invalid> wrote:
>>>>> 
>>>>>> Hi Jarek,
>>>>>> 
>>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the feeling
>>>>>> that a lot of very similar code is repeated in all the providers.
>>>>>> But during review yesterday I dropped the ides because if we have a
>>>>>> common piece then we are locking all depending providers
>>>>>> (potentially) together if common code changes.
>>>>>> As of additional dependency complexity between providers actually
>>>>>> the additional dependency I think creates more prblems than the
>>>>>> benefit… would be cool if tehere would be an option to „inline“
>>>>>> common code from a single place but keep individual providers fully
>>>>>> independent…
>>>>>> 
>>>>>> Jens
>>>>>> 
>>>>>> Sent from Outlook for
>>>>>> iOS<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
>>>>>> 2F
>>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C98c88
>>>>>> 97
>>>>>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
>>>>>> %7
>>>>>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
>>>>>> Ij
>>>>>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
>>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
>>>>>> ________________________________
>>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>>>>>> To: dev@airflow.apache.org <de...@airflow.apache.org>
>>>>>> Subject: [DISCUSS] Common.util provider?
>>>>>> 
>>>>>> Hello everyone,
>>>>>> 
>>>>>> How do we feel about introducing a common.util provider?
>>>>>> 
>>>>>> I know it's not been the original idea behind providers, but - after
>>>>>> testing common.sql and now also having common.io, seems like more
>>>>>> and more we would like to extract some common code that we would
>>>>>> like providers to use, but we refrain from it, because it will only
>>>>>> be actually usable 6 months after we introduce some common code.
>>>>>> 
>>>>>> However, if we introduce common.util, this problem is generally gone
>>>>>> - at the expense of more complex maintenance and cross-provider
>>>>> dependencies.
>>>>>> We should be able to add a common util method and use it in a
>>>>>> provider at the same time.
>>>>>> 
>>>>>> Think Amazon provider using a new feature released in common.util
>>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
>>>>>> already for common.sql. We know how to do it, we know what to avoid,
>>>>>> we know we cannot introduce backwards-incompatible changes, so we
>>>>>> have to be very clear what is and what is not a public API there, We
>>>>>> could rather easily add tests to prevent such backwards-incompatible
>>>>>> changes. We even have a solution for chicken-egg providers where we
>>>>>> need to release two providers at the same time if they depend on
>>>>>> each other. Generally speaking it's quite workable but adds a bit of
>>> overhead.
>>>>>> 
>>>>>> Examples that we could implement as "common.util":
>>>>>> 
>>>>>> - common versioning check with cache - where multiple providers
>>>>>> could reuse "do we have pendulum 2"
>>>>>> - more complex - some date management features (we have a few like
>>>>>> date_ranges/round_time). But there are many more.
>>>>>> 
>>>>>> I generally do not love the common "util" approach. It has a
>>>>>> tendency to become a bag of everything over time. but if we limit it
>>>>>> to a set of small, fully decoupled modules where each module is
>>>>>> independent - it's OK. And we already have it in "airflow.util" and
>>>>>> we seem to be
>>>>> doing well.
>>>>>> 
>>>>>> WDYT? Is it worth it ?
>>>>>> 
>>>>>> J.
>>>>>> 
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
>>>> For additional commands, e-mail: dev-help@airflow.apache.org
>>> 
>>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
For additional commands, e-mail: dev-help@airflow.apache.org


Re: [DISCUSS] Common.util provider?

Posted by Maciej Obuchowski <mo...@apache.org>.
>That's why I generally do
not like the "util" approach because common packaging introduces
unnecessary coupling (you have to upgrade independent utils together).

From my experience with releasing OpenLineage where we do things similarly:
I think that's the advantage of it, but only _if_ you can release those
together.
With "build-in" providers it makes sense, but could be burdensome if
"external"
ones would depend on that functionality.

>I know it's not been the original idea behind providers, but - after
testing common.sql and now also having common.io, seems like more and more
we would like to extract some common code that we would like providers to
use, but we refrain from it, because it will only be actually usable 6
months after we introduce some common code.

So, maybe better approach would be to introduce the functionality into
core,
and use common.X provider as "polyfill" (to borrow JS nomenclature)
to make sure providers could use that functionality now, where external
ones could depend on the Airflow ones?

The symbolic link approach seems to disregard all the external providers,
unless
I misunderstand it.

czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a):

> > Ideally utilities for each purpose (parsing URI, reading Object Storage,
> reading SQL, etc.) should each have its own utility package, so they can be
> released independently without dependency problems popping up if we need to
> break compatibility to one purpose. But more providers are exponentially
> more difficult to maintain, so I’d settle for one utility provider for now
> and split further if needed in the future.
>
> Very much agree with this general statement. That's why I generally do
> not like the "util" approach because common packaging introduces
> unnecessary coupling (you have to upgrade independent utils together). And
> when we have a common set of things that seem to make sense to be released
> together when upgraded we should package them together in
> "common.<something concrete" (like we have with common.io and common.sql).
>
> However - in this case, I think what Jens proposed (and I am happy to try
> as well) is to attempt to use symbolic links - i.e. add the code in
> `airflow.util` but then create a symbolic link in the provider.  I tested
> it yesterday and it works as expected - i.e. such symbolic link is
> dereferenced and the provider package contains the python file, not
> symbolic link. That seems like a much more lightweight approach that will
> serve the purpose of "common.util" much better. The only thing we will have
> to take care of (and we can add it once the POC is successful) is to add
> some pre-commit protection that those kind of symbolically linked util
> modules are imported in providers, from inside of those provider, not from
> airlfow, and make sure they are "standalone" (i.e. - as you mentioned - not
> depend on anything in airflow code). We could create a new package for that
> in airlfow
> "airlfow.provider_utils" for example - and make sure (as next step) that
> anything from that package is never directly imported by any provider, and
> whenever provider uses it, it should be symbolic link inside that provider.
> That's all automatable and we can prevent mistakes via pre-commit.
>
> I think that might lead to a very lightweight approach where we introduce
> new common functionality which is immediately reusable in providers without
> the hassle of taking care about backwards compatibility, and managing the
> "common.util" provider. At the expense of a bit complex pre-commit that
> will guard the usage of it.
>
> Seems that it might be the "Eat cake and have it too" way that we've been
> looking for.
>
> J.
>
> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung <tp...@astronomer.io.invalid>
> wrote:
>
> > It would help in the sense mentioned in previous posts, yes. But one
> thing
> > I want to point out is, for the provider to actually be helpful, it must
> be
> > treated a bit differently from normal providers, but more like a separate
> > third-party dependency. Specifically, the provider should not have a
> > dependency to Core Airflow, so it can be released and depended on more
> > flexibly.
> >
> > Ideally utilities for each purpose (parsing URI, reading Object Storage,
> > reading SQL, etc.) should each have its own utility package, so they can
> be
> > released independently without dependency problems popping up if we need
> to
> > break compatibility to one purpose. But more providers are exponentially
> > more difficult to maintain, so I’d settle for one utility provider for
> now
> > and split further if needed in the future.
> >
> > TP
> >
> >
> > > On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
> > Jens.Scheffler@de.bosch.com.INVALID> wrote:
> > >
> > > @Uranusjr would this help as a pilot in your AIP-60 code to parse and
> > validate URIs for datasets?
> > >
> > > Mit freundlichen Grüßen / Best regards
> > >
> > > Jens Scheffler
> > >
> > > Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
> > > Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
> > GERMANY | www.bosch.com
> > > Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> > Jens.Scheffler@de.bosch.com
> > >
> > > Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> > > Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> > > Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus
> > Forschner,
> > > Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
> > >
> > > -----Original Message-----
> > > From: Jarek Potiuk <ja...@potiuk.com>
> > > Sent: Donnerstag, 22. Februar 2024 00:53
> > > To: dev@airflow.apache.org
> > > Subject: Re: [DISCUSS] Common.util provider?
> > >
> > > Yep. It could work with symbolic links. Tested it and with flit - both
> > wheel and sdist packaged code such symbolically linked file is
> dereferenced
> > and copy of the file is added there. It could be a nice way of doing it.
> > >
> > > Maybe then worth trying next time if someone has a need?
> > >
> > > J
> > >
> > > On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
> > Jens.Scheffler@de.bosch.com.invalid> wrote:
> > >
> > >>>>> As of additional dependency complexity between providers actually
> > >>>>> the
> > >> additional dependency I think creates more problems than the benefit…
> > >> would be cool if there would be an option to „inline“ common code from
> > >> a single place but keep individual providers fully independent…
> > >>
> > >>> Well, we already  do a lot of inlining, so if we think we should do
> > >>> more,
> > >> we have mechanisms for that. We have  pre-commits and release commands
> > >> that do a lot of that. Pre commits are inlining scripts in
> > >> Dockerfiles, shortening PyPI readme . The providers __init__.py files
> > >> and changelogs and index documentation .rst (partially) are generated
> > >> at release documentation preparation time, pyproject.toml for
> > >> providers are generated from common templates at package building time
> > >> and so on and so on :). So we can do more of that and generate common
> > >> code, it's just a matter of adding pre-commits or breeze scripts. But
> > >> again "can't have and eat cake" - this has the drawback that there are
> > >> extra steps involved and even if it's automated it does add friction
> > >> when you have to regenerate the code every time you change it and when
> > >> you change it in another place than where you use it.
> > >>
> > >> Yes, also thought a moment about pre-commit. I#d be okay if we really
> > >> in-line and have a pre-commit aligning the redundancy of python in
> > folders.
> > >> Might need to be an opt-in if only 10 of 85 providers are using common
> > >> stuff and if we change a common line we probably do not need to affect
> > >> all providers.
> > >>
> > >> As long as no Windows users trying to code for airflow (do we need to
> > >> consider?) would it also work to use symlinks? Git can cope with this,
> > >> I don't know if the python toolchain can de-reference a copy and are
> > >> not packaging a symlink? Would be worth a test... would save the
> > >> pre-commit and we even could selectively include common bla into
> > >> providers as needed :-D
> > >>
> > >> Mit freundlichen Grüßen / Best regards
> > >>
> > >> Jens Scheffler
> > >>
> > >> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH |
> > >> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
> > >> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> > >> Jens.Scheffler@de.bosch.com
> > >>
> > >> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> > >> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> > >> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
> > >> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr.
> > >> Tanja Rückert
> > >>
> > >> -----Original Message-----
> > >> From: Jarek Potiuk <ja...@potiuk.com>
> > >> Sent: Mittwoch, 21. Februar 2024 21:18
> > >> To: dev@airflow.apache.org
> > >> Subject: Re: [DISCUSS] Common.util provider?
> > >>
> > >>> if we have a common piece then we are locking all depending
> > >>> providers
> > >> (potentially) together if common code changes
> > >>
> > >> Yes, coupling in this case is the drawback of this solution. You can't
> > >> have cake and eat it too and in this case you trade DRY with coupling.
> > >>
> > >>> As of additional dependency complexity between providers actually
> > >>> the
> > >> additional dependency I think creates more problems than the benefit…
> > >> would be cool if there would be an option to „inline“ common code from
> > >> a single place but keep individual providers fully independent…
> > >>
> > >> Well, we already  do a lot of inlining, so if we think we should do
> > >> more, we have mechanisms for that. We have  pre-commits and release
> > >> commands that do a lot of that. Pre commits are inlining scripts in
> > >> Dockerfiles, shortening PyPI readme . The providers __init__.py files
> > >> and changelogs and index documentation .rst (partially) are generated
> > >> at release documentation preparation time, pyproject.toml for
> > >> providers are generated from common templates at package building time
> > >> and so on and so on :). So we can do more of that and generate common
> > >> code, it's just a matter of adding pre-commits or breeze scripts. But
> > >> again "can't have and eat cake" - this has the drawback that there are
> > >> extra steps involved and even if it's automated it does add friction
> > >> when you have to regenerate the code every time you change it and when
> > >> you change it in another place than where you use it.
> > >>
> > >> J.
> > >>
> > >> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
> > >> Jens.Scheffler@de.bosch.com.invalid> wrote:
> > >>
> > >>> Hi Jarek,
> > >>>
> > >>> At reviewing the PR from uranusjr for AIP-60 I also had the feeling
> > >>> that a lot of very similar code is repeated in all the providers.
> > >>> But during review yesterday I dropped the ides because if we have a
> > >>> common piece then we are locking all depending providers
> > >>> (potentially) together if common code changes.
> > >>> As of additional dependency complexity between providers actually
> > >>> the additional dependency I think creates more prblems than the
> > >>> benefit… would be cool if tehere would be an option to „inline“
> > >>> common code from a single place but keep individual providers fully
> > >>> independent…
> > >>>
> > >>> Jens
> > >>>
> > >>> Sent from Outlook for
> > >>> iOS<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
> > >>> 2F
> > >>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C98c88
> > >>> 97
> > >>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
> > >>> %7
> > >>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
> > >>> Ij
> > >>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
> > >>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
> > >>> ________________________________
> > >>> From: Jarek Potiuk <ja...@potiuk.com>
> > >>> Sent: Wednesday, February 21, 2024 5:42:20 PM
> > >>> To: dev@airflow.apache.org <de...@airflow.apache.org>
> > >>> Subject: [DISCUSS] Common.util provider?
> > >>>
> > >>> Hello everyone,
> > >>>
> > >>> How do we feel about introducing a common.util provider?
> > >>>
> > >>> I know it's not been the original idea behind providers, but - after
> > >>> testing common.sql and now also having common.io, seems like more
> > >>> and more we would like to extract some common code that we would
> > >>> like providers to use, but we refrain from it, because it will only
> > >>> be actually usable 6 months after we introduce some common code.
> > >>>
> > >>> However, if we introduce common.util, this problem is generally gone
> > >>> - at the expense of more complex maintenance and cross-provider
> > >> dependencies.
> > >>> We should be able to add a common util method and use it in a
> > >>> provider at the same time.
> > >>>
> > >>> Think Amazon provider using a new feature released in common.util
> > >>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
> > >>> already for common.sql. We know how to do it, we know what to avoid,
> > >>> we know we cannot introduce backwards-incompatible changes, so we
> > >>> have to be very clear what is and what is not a public API there, We
> > >>> could rather easily add tests to prevent such backwards-incompatible
> > >>> changes. We even have a solution for chicken-egg providers where we
> > >>> need to release two providers at the same time if they depend on
> > >>> each other. Generally speaking it's quite workable but adds a bit of
> > overhead.
> > >>>
> > >>> Examples that we could implement as "common.util":
> > >>>
> > >>> - common versioning check with cache - where multiple providers
> > >>> could reuse "do we have pendulum 2"
> > >>> - more complex - some date management features (we have a few like
> > >>> date_ranges/round_time). But there are many more.
> > >>>
> > >>> I generally do not love the common "util" approach. It has a
> > >>> tendency to become a bag of everything over time. but if we limit it
> > >>> to a set of small, fully decoupled modules where each module is
> > >>> independent - it's OK. And we already have it in "airflow.util" and
> > >>> we seem to be
> > >> doing well.
> > >>>
> > >>> WDYT? Is it worth it ?
> > >>>
> > >>> J.
> > >>>
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > For additional commands, e-mail: dev-help@airflow.apache.org
> >
> >
>

Re: [DISCUSS] Common.util provider?

Posted by Jarek Potiuk <ja...@potiuk.com>.
> Ideally utilities for each purpose (parsing URI, reading Object Storage,
reading SQL, etc.) should each have its own utility package, so they can be
released independently without dependency problems popping up if we need to
break compatibility to one purpose. But more providers are exponentially
more difficult to maintain, so I’d settle for one utility provider for now
and split further if needed in the future.

Very much agree with this general statement. That's why I generally do
not like the "util" approach because common packaging introduces
unnecessary coupling (you have to upgrade independent utils together). And
when we have a common set of things that seem to make sense to be released
together when upgraded we should package them together in
"common.<something concrete" (like we have with common.io and common.sql).

However - in this case, I think what Jens proposed (and I am happy to try
as well) is to attempt to use symbolic links - i.e. add the code in
`airflow.util` but then create a symbolic link in the provider.  I tested
it yesterday and it works as expected - i.e. such symbolic link is
dereferenced and the provider package contains the python file, not
symbolic link. That seems like a much more lightweight approach that will
serve the purpose of "common.util" much better. The only thing we will have
to take care of (and we can add it once the POC is successful) is to add
some pre-commit protection that those kind of symbolically linked util
modules are imported in providers, from inside of those provider, not from
airlfow, and make sure they are "standalone" (i.e. - as you mentioned - not
depend on anything in airflow code). We could create a new package for that
in airlfow
"airlfow.provider_utils" for example - and make sure (as next step) that
anything from that package is never directly imported by any provider, and
whenever provider uses it, it should be symbolic link inside that provider.
That's all automatable and we can prevent mistakes via pre-commit.

I think that might lead to a very lightweight approach where we introduce
new common functionality which is immediately reusable in providers without
the hassle of taking care about backwards compatibility, and managing the
"common.util" provider. At the expense of a bit complex pre-commit that
will guard the usage of it.

Seems that it might be the "Eat cake and have it too" way that we've been
looking for.

J.

On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung <tp...@astronomer.io.invalid>
wrote:

> It would help in the sense mentioned in previous posts, yes. But one thing
> I want to point out is, for the provider to actually be helpful, it must be
> treated a bit differently from normal providers, but more like a separate
> third-party dependency. Specifically, the provider should not have a
> dependency to Core Airflow, so it can be released and depended on more
> flexibly.
>
> Ideally utilities for each purpose (parsing URI, reading Object Storage,
> reading SQL, etc.) should each have its own utility package, so they can be
> released independently without dependency problems popping up if we need to
> break compatibility to one purpose. But more providers are exponentially
> more difficult to maintain, so I’d settle for one utility provider for now
> and split further if needed in the future.
>
> TP
>
>
> > On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <
> Jens.Scheffler@de.bosch.com.INVALID> wrote:
> >
> > @Uranusjr would this help as a pilot in your AIP-60 code to parse and
> validate URIs for datasets?
> >
> > Mit freundlichen Grüßen / Best regards
> >
> > Jens Scheffler
> >
> > Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
> > Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
> GERMANY | www.bosch.com
> > Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> Jens.Scheffler@de.bosch.com
> >
> > Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> > Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> > Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus
> Forschner,
> > Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
> >
> > -----Original Message-----
> > From: Jarek Potiuk <ja...@potiuk.com>
> > Sent: Donnerstag, 22. Februar 2024 00:53
> > To: dev@airflow.apache.org
> > Subject: Re: [DISCUSS] Common.util provider?
> >
> > Yep. It could work with symbolic links. Tested it and with flit - both
> wheel and sdist packaged code such symbolically linked file is dereferenced
> and copy of the file is added there. It could be a nice way of doing it.
> >
> > Maybe then worth trying next time if someone has a need?
> >
> > J
> >
> > On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
> Jens.Scheffler@de.bosch.com.invalid> wrote:
> >
> >>>>> As of additional dependency complexity between providers actually
> >>>>> the
> >> additional dependency I think creates more problems than the benefit…
> >> would be cool if there would be an option to „inline“ common code from
> >> a single place but keep individual providers fully independent…
> >>
> >>> Well, we already  do a lot of inlining, so if we think we should do
> >>> more,
> >> we have mechanisms for that. We have  pre-commits and release commands
> >> that do a lot of that. Pre commits are inlining scripts in
> >> Dockerfiles, shortening PyPI readme . The providers __init__.py files
> >> and changelogs and index documentation .rst (partially) are generated
> >> at release documentation preparation time, pyproject.toml for
> >> providers are generated from common templates at package building time
> >> and so on and so on :). So we can do more of that and generate common
> >> code, it's just a matter of adding pre-commits or breeze scripts. But
> >> again "can't have and eat cake" - this has the drawback that there are
> >> extra steps involved and even if it's automated it does add friction
> >> when you have to regenerate the code every time you change it and when
> >> you change it in another place than where you use it.
> >>
> >> Yes, also thought a moment about pre-commit. I#d be okay if we really
> >> in-line and have a pre-commit aligning the redundancy of python in
> folders.
> >> Might need to be an opt-in if only 10 of 85 providers are using common
> >> stuff and if we change a common line we probably do not need to affect
> >> all providers.
> >>
> >> As long as no Windows users trying to code for airflow (do we need to
> >> consider?) would it also work to use symlinks? Git can cope with this,
> >> I don't know if the python toolchain can de-reference a copy and are
> >> not packaging a symlink? Would be worth a test... would save the
> >> pre-commit and we even could selectively include common bla into
> >> providers as needed :-D
> >>
> >> Mit freundlichen Grüßen / Best regards
> >>
> >> Jens Scheffler
> >>
> >> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH |
> >> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY |
> >> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> >> Jens.Scheffler@de.bosch.com
> >>
> >> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> >> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> >> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr.
> >> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr.
> >> Tanja Rückert
> >>
> >> -----Original Message-----
> >> From: Jarek Potiuk <ja...@potiuk.com>
> >> Sent: Mittwoch, 21. Februar 2024 21:18
> >> To: dev@airflow.apache.org
> >> Subject: Re: [DISCUSS] Common.util provider?
> >>
> >>> if we have a common piece then we are locking all depending
> >>> providers
> >> (potentially) together if common code changes
> >>
> >> Yes, coupling in this case is the drawback of this solution. You can't
> >> have cake and eat it too and in this case you trade DRY with coupling.
> >>
> >>> As of additional dependency complexity between providers actually
> >>> the
> >> additional dependency I think creates more problems than the benefit…
> >> would be cool if there would be an option to „inline“ common code from
> >> a single place but keep individual providers fully independent…
> >>
> >> Well, we already  do a lot of inlining, so if we think we should do
> >> more, we have mechanisms for that. We have  pre-commits and release
> >> commands that do a lot of that. Pre commits are inlining scripts in
> >> Dockerfiles, shortening PyPI readme . The providers __init__.py files
> >> and changelogs and index documentation .rst (partially) are generated
> >> at release documentation preparation time, pyproject.toml for
> >> providers are generated from common templates at package building time
> >> and so on and so on :). So we can do more of that and generate common
> >> code, it's just a matter of adding pre-commits or breeze scripts. But
> >> again "can't have and eat cake" - this has the drawback that there are
> >> extra steps involved and even if it's automated it does add friction
> >> when you have to regenerate the code every time you change it and when
> >> you change it in another place than where you use it.
> >>
> >> J.
> >>
> >> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
> >> Jens.Scheffler@de.bosch.com.invalid> wrote:
> >>
> >>> Hi Jarek,
> >>>
> >>> At reviewing the PR from uranusjr for AIP-60 I also had the feeling
> >>> that a lot of very similar code is repeated in all the providers.
> >>> But during review yesterday I dropped the ides because if we have a
> >>> common piece then we are locking all depending providers
> >>> (potentially) together if common code changes.
> >>> As of additional dependency complexity between providers actually
> >>> the additional dependency I think creates more prblems than the
> >>> benefit… would be cool if tehere would be an option to „inline“
> >>> common code from a single place but keep individual providers fully
> >>> independent…
> >>>
> >>> Jens
> >>>
> >>> Sent from Outlook for
> >>> iOS<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
> >>> 2F
> >>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C98c88
> >>> 97
> >>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
> >>> %7
> >>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
> >>> Ij
> >>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
> >>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
> >>> ________________________________
> >>> From: Jarek Potiuk <ja...@potiuk.com>
> >>> Sent: Wednesday, February 21, 2024 5:42:20 PM
> >>> To: dev@airflow.apache.org <de...@airflow.apache.org>
> >>> Subject: [DISCUSS] Common.util provider?
> >>>
> >>> Hello everyone,
> >>>
> >>> How do we feel about introducing a common.util provider?
> >>>
> >>> I know it's not been the original idea behind providers, but - after
> >>> testing common.sql and now also having common.io, seems like more
> >>> and more we would like to extract some common code that we would
> >>> like providers to use, but we refrain from it, because it will only
> >>> be actually usable 6 months after we introduce some common code.
> >>>
> >>> However, if we introduce common.util, this problem is generally gone
> >>> - at the expense of more complex maintenance and cross-provider
> >> dependencies.
> >>> We should be able to add a common util method and use it in a
> >>> provider at the same time.
> >>>
> >>> Think Amazon provider using a new feature released in common.util
> >>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
> >>> already for common.sql. We know how to do it, we know what to avoid,
> >>> we know we cannot introduce backwards-incompatible changes, so we
> >>> have to be very clear what is and what is not a public API there, We
> >>> could rather easily add tests to prevent such backwards-incompatible
> >>> changes. We even have a solution for chicken-egg providers where we
> >>> need to release two providers at the same time if they depend on
> >>> each other. Generally speaking it's quite workable but adds a bit of
> overhead.
> >>>
> >>> Examples that we could implement as "common.util":
> >>>
> >>> - common versioning check with cache - where multiple providers
> >>> could reuse "do we have pendulum 2"
> >>> - more complex - some date management features (we have a few like
> >>> date_ranges/round_time). But there are many more.
> >>>
> >>> I generally do not love the common "util" approach. It has a
> >>> tendency to become a bag of everything over time. but if we limit it
> >>> to a set of small, fully decoupled modules where each module is
> >>> independent - it's OK. And we already have it in "airflow.util" and
> >>> we seem to be
> >> doing well.
> >>>
> >>> WDYT? Is it worth it ?
> >>>
> >>> J.
> >>>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > For additional commands, e-mail: dev-help@airflow.apache.org
>
>

Re: [DISCUSS] Common.util provider?

Posted by Tzu-ping Chung <tp...@astronomer.io.INVALID>.
It would help in the sense mentioned in previous posts, yes. But one thing I want to point out is, for the provider to actually be helpful, it must be treated a bit differently from normal providers, but more like a separate third-party dependency. Specifically, the provider should not have a dependency to Core Airflow, so it can be released and depended on more flexibly.

Ideally utilities for each purpose (parsing URI, reading Object Storage, reading SQL, etc.) should each have its own utility package, so they can be released independently without dependency problems popping up if we need to break compatibility to one purpose. But more providers are exponentially more difficult to maintain, so I’d settle for one utility provider for now and split further if needed in the future.

TP


> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) <Je...@de.bosch.com.INVALID> wrote:
> 
> @Uranusjr would this help as a pilot in your AIP-60 code to parse and validate URIs for datasets?
> 
> Mit freundlichen Grüßen / Best regards
> 
> Jens Scheffler
> 
> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY | www.bosch.com
> Tel. +49 711 811-91508 | Mobil +49 160 90417410 | Jens.Scheffler@de.bosch.com
> 
> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; 
> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus Forschner, 
> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
> 
> -----Original Message-----
> From: Jarek Potiuk <ja...@potiuk.com> 
> Sent: Donnerstag, 22. Februar 2024 00:53
> To: dev@airflow.apache.org
> Subject: Re: [DISCUSS] Common.util provider?
> 
> Yep. It could work with symbolic links. Tested it and with flit - both wheel and sdist packaged code such symbolically linked file is dereferenced and copy of the file is added there. It could be a nice way of doing it.
> 
> Maybe then worth trying next time if someone has a need?
> 
> J
> 
> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <Je...@de.bosch.com.invalid> wrote:
> 
>>>>> As of additional dependency complexity between providers actually 
>>>>> the
>> additional dependency I think creates more problems than the benefit… 
>> would be cool if there would be an option to „inline“ common code from 
>> a single place but keep individual providers fully independent…
>> 
>>> Well, we already  do a lot of inlining, so if we think we should do 
>>> more,
>> we have mechanisms for that. We have  pre-commits and release commands 
>> that do a lot of that. Pre commits are inlining scripts in 
>> Dockerfiles, shortening PyPI readme . The providers __init__.py files 
>> and changelogs and index documentation .rst (partially) are generated 
>> at release documentation preparation time, pyproject.toml for 
>> providers are generated from common templates at package building time 
>> and so on and so on :). So we can do more of that and generate common 
>> code, it's just a matter of adding pre-commits or breeze scripts. But 
>> again "can't have and eat cake" - this has the drawback that there are 
>> extra steps involved and even if it's automated it does add friction 
>> when you have to regenerate the code every time you change it and when 
>> you change it in another place than where you use it.
>> 
>> Yes, also thought a moment about pre-commit. I#d be okay if we really 
>> in-line and have a pre-commit aligning the redundancy of python in folders.
>> Might need to be an opt-in if only 10 of 85 providers are using common 
>> stuff and if we change a common line we probably do not need to affect 
>> all providers.
>> 
>> As long as no Windows users trying to code for airflow (do we need to
>> consider?) would it also work to use symlinks? Git can cope with this, 
>> I don't know if the python toolchain can de-reference a copy and are 
>> not packaging a symlink? Would be worth a test... would save the 
>> pre-commit and we even could selectively include common bla into 
>> providers as needed :-D
>> 
>> Mit freundlichen Grüßen / Best regards
>> 
>> Jens Scheffler
>> 
>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH | 
>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY | 
>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 | 
>> Jens.Scheffler@de.bosch.com
>> 
>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. 
>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. 
>> Tanja Rückert
>> 
>> -----Original Message-----
>> From: Jarek Potiuk <ja...@potiuk.com>
>> Sent: Mittwoch, 21. Februar 2024 21:18
>> To: dev@airflow.apache.org
>> Subject: Re: [DISCUSS] Common.util provider?
>> 
>>> if we have a common piece then we are locking all depending 
>>> providers
>> (potentially) together if common code changes
>> 
>> Yes, coupling in this case is the drawback of this solution. You can't 
>> have cake and eat it too and in this case you trade DRY with coupling.
>> 
>>> As of additional dependency complexity between providers actually 
>>> the
>> additional dependency I think creates more problems than the benefit… 
>> would be cool if there would be an option to „inline“ common code from 
>> a single place but keep individual providers fully independent…
>> 
>> Well, we already  do a lot of inlining, so if we think we should do 
>> more, we have mechanisms for that. We have  pre-commits and release 
>> commands that do a lot of that. Pre commits are inlining scripts in 
>> Dockerfiles, shortening PyPI readme . The providers __init__.py files 
>> and changelogs and index documentation .rst (partially) are generated 
>> at release documentation preparation time, pyproject.toml for 
>> providers are generated from common templates at package building time 
>> and so on and so on :). So we can do more of that and generate common 
>> code, it's just a matter of adding pre-commits or breeze scripts. But 
>> again "can't have and eat cake" - this has the drawback that there are 
>> extra steps involved and even if it's automated it does add friction 
>> when you have to regenerate the code every time you change it and when 
>> you change it in another place than where you use it.
>> 
>> J.
>> 
>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) < 
>> Jens.Scheffler@de.bosch.com.invalid> wrote:
>> 
>>> Hi Jarek,
>>> 
>>> At reviewing the PR from uranusjr for AIP-60 I also had the feeling 
>>> that a lot of very similar code is repeated in all the providers. 
>>> But during review yesterday I dropped the ides because if we have a 
>>> common piece then we are locking all depending providers 
>>> (potentially) together if common code changes.
>>> As of additional dependency complexity between providers actually 
>>> the additional dependency I think creates more prblems than the 
>>> benefit… would be cool if tehere would be an option to „inline“ 
>>> common code from a single place but keep individual providers fully 
>>> independent…
>>> 
>>> Jens
>>> 
>>> Sent from Outlook for
>>> iOS<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
>>> 2F
>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C98c88
>>> 97
>>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
>>> %7 
>>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
>>> Ij 
>>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
>>> ________________________________
>>> From: Jarek Potiuk <ja...@potiuk.com>
>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>>> To: dev@airflow.apache.org <de...@airflow.apache.org>
>>> Subject: [DISCUSS] Common.util provider?
>>> 
>>> Hello everyone,
>>> 
>>> How do we feel about introducing a common.util provider?
>>> 
>>> I know it's not been the original idea behind providers, but - after 
>>> testing common.sql and now also having common.io, seems like more 
>>> and more we would like to extract some common code that we would 
>>> like providers to use, but we refrain from it, because it will only 
>>> be actually usable 6 months after we introduce some common code.
>>> 
>>> However, if we introduce common.util, this problem is generally gone 
>>> - at the expense of more complex maintenance and cross-provider
>> dependencies.
>>> We should be able to add a common util method and use it in a 
>>> provider at the same time.
>>> 
>>> Think Amazon provider using a new feature released in common.util
>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it
>>> already for common.sql. We know how to do it, we know what to avoid, 
>>> we know we cannot introduce backwards-incompatible changes, so we 
>>> have to be very clear what is and what is not a public API there, We 
>>> could rather easily add tests to prevent such backwards-incompatible 
>>> changes. We even have a solution for chicken-egg providers where we 
>>> need to release two providers at the same time if they depend on 
>>> each other. Generally speaking it's quite workable but adds a bit of overhead.
>>> 
>>> Examples that we could implement as "common.util":
>>> 
>>> - common versioning check with cache - where multiple providers 
>>> could reuse "do we have pendulum 2"
>>> - more complex - some date management features (we have a few like 
>>> date_ranges/round_time). But there are many more.
>>> 
>>> I generally do not love the common "util" approach. It has a 
>>> tendency to become a bag of everything over time. but if we limit it 
>>> to a set of small, fully decoupled modules where each module is 
>>> independent - it's OK. And we already have it in "airflow.util" and 
>>> we seem to be
>> doing well.
>>> 
>>> WDYT? Is it worth it ?
>>> 
>>> J.
>>> 
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> For additional commands, e-mail: dev-help@airflow.apache.org


RE: [DISCUSS] Common.util provider?

Posted by "Scheffler Jens (XC-AS/EAE-ADA-T)" <Je...@de.bosch.com.INVALID>.
@Uranusjr would this help as a pilot in your AIP-60 code to parse and validate URIs for datasets?

Mit freundlichen Grüßen / Best regards

Jens Scheffler

Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY | www.bosch.com
Tel. +49 711 811-91508 | Mobil +49 160 90417410 | Jens.Scheffler@de.bosch.com

Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; 
Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus Forschner, 
Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert

-----Original Message-----
From: Jarek Potiuk <ja...@potiuk.com> 
Sent: Donnerstag, 22. Februar 2024 00:53
To: dev@airflow.apache.org
Subject: Re: [DISCUSS] Common.util provider?

Yep. It could work with symbolic links. Tested it and with flit - both wheel and sdist packaged code such symbolically linked file is dereferenced and copy of the file is added there. It could be a nice way of doing it.

Maybe then worth trying next time if someone has a need?

J

On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <Je...@de.bosch.com.invalid> wrote:

> >>> As of additional dependency complexity between providers actually 
> >>> the
> additional dependency I think creates more problems than the benefit… 
> would be cool if there would be an option to „inline“ common code from 
> a single place but keep individual providers fully independent…
>
> >Well, we already  do a lot of inlining, so if we think we should do 
> >more,
> we have mechanisms for that. We have  pre-commits and release commands 
> that do a lot of that. Pre commits are inlining scripts in 
> Dockerfiles, shortening PyPI readme . The providers __init__.py files 
> and changelogs and index documentation .rst (partially) are generated 
> at release documentation preparation time, pyproject.toml for 
> providers are generated from common templates at package building time 
> and so on and so on :). So we can do more of that and generate common 
> code, it's just a matter of adding pre-commits or breeze scripts. But 
> again "can't have and eat cake" - this has the drawback that there are 
> extra steps involved and even if it's automated it does add friction 
> when you have to regenerate the code every time you change it and when 
> you change it in another place than where you use it.
>
> Yes, also thought a moment about pre-commit. I#d be okay if we really 
> in-line and have a pre-commit aligning the redundancy of python in folders.
> Might need to be an opt-in if only 10 of 85 providers are using common 
> stuff and if we change a common line we probably do not need to affect 
> all providers.
>
> As long as no Windows users trying to code for airflow (do we need to
> consider?) would it also work to use symlinks? Git can cope with this, 
> I don't know if the python toolchain can de-reference a copy and are 
> not packaging a symlink? Would be worth a test... would save the 
> pre-commit and we even could selectively include common bla into 
> providers as needed :-D
>
> Mit freundlichen Grüßen / Best regards
>
> Jens Scheffler
>
> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH | 
> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY | 
> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 | 
> Jens.Scheffler@de.bosch.com
>
> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. 
> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. 
> Tanja Rückert
>
> -----Original Message-----
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Mittwoch, 21. Februar 2024 21:18
> To: dev@airflow.apache.org
> Subject: Re: [DISCUSS] Common.util provider?
>
> > if we have a common piece then we are locking all depending 
> > providers
> (potentially) together if common code changes
>
> Yes, coupling in this case is the drawback of this solution. You can't 
> have cake and eat it too and in this case you trade DRY with coupling.
>
> > As of additional dependency complexity between providers actually 
> > the
> additional dependency I think creates more problems than the benefit… 
> would be cool if there would be an option to „inline“ common code from 
> a single place but keep individual providers fully independent…
>
> Well, we already  do a lot of inlining, so if we think we should do 
> more, we have mechanisms for that. We have  pre-commits and release 
> commands that do a lot of that. Pre commits are inlining scripts in 
> Dockerfiles, shortening PyPI readme . The providers __init__.py files 
> and changelogs and index documentation .rst (partially) are generated 
> at release documentation preparation time, pyproject.toml for 
> providers are generated from common templates at package building time 
> and so on and so on :). So we can do more of that and generate common 
> code, it's just a matter of adding pre-commits or breeze scripts. But 
> again "can't have and eat cake" - this has the drawback that there are 
> extra steps involved and even if it's automated it does add friction 
> when you have to regenerate the code every time you change it and when 
> you change it in another place than where you use it.
>
> J.
>
> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) < 
> Jens.Scheffler@de.bosch.com.invalid> wrote:
>
> > Hi Jarek,
> >
> > At reviewing the PR from uranusjr for AIP-60 I also had the feeling 
> > that a lot of very similar code is repeated in all the providers. 
> > But during review yesterday I dropped the ides because if we have a 
> > common piece then we are locking all depending providers 
> > (potentially) together if common code changes.
> > As of additional dependency complexity between providers actually 
> > the additional dependency I think creates more prblems than the 
> > benefit… would be cool if tehere would be an option to „inline“ 
> > common code from a single place but keep individual providers fully 
> > independent…
> >
> > Jens
> >
> > Sent from Outlook for
> > iOS<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%
> > 2F
> > aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C98c88
> > 97
> > 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0
> > %7 
> > C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
> > Ij 
> > oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn
> > WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
> > ________________________________
> > From: Jarek Potiuk <ja...@potiuk.com>
> > Sent: Wednesday, February 21, 2024 5:42:20 PM
> > To: dev@airflow.apache.org <de...@airflow.apache.org>
> > Subject: [DISCUSS] Common.util provider?
> >
> > Hello everyone,
> >
> > How do we feel about introducing a common.util provider?
> >
> > I know it's not been the original idea behind providers, but - after 
> > testing common.sql and now also having common.io, seems like more 
> > and more we would like to extract some common code that we would 
> > like providers to use, but we refrain from it, because it will only 
> > be actually usable 6 months after we introduce some common code.
> >
> > However, if we introduce common.util, this problem is generally gone 
> > - at the expense of more complex maintenance and cross-provider
> dependencies.
> > We should be able to add a common util method and use it in a 
> > provider at the same time.
> >
> > Think Amazon provider using a new feature released in common.util
> > >=1.2.0 and google provider >= 1.1.0. All manageable and we do it
> > already for common.sql. We know how to do it, we know what to avoid, 
> > we know we cannot introduce backwards-incompatible changes, so we 
> > have to be very clear what is and what is not a public API there, We 
> > could rather easily add tests to prevent such backwards-incompatible 
> > changes. We even have a solution for chicken-egg providers where we 
> > need to release two providers at the same time if they depend on 
> > each other. Generally speaking it's quite workable but adds a bit of overhead.
> >
> > Examples that we could implement as "common.util":
> >
> > - common versioning check with cache - where multiple providers 
> > could reuse "do we have pendulum 2"
> > - more complex - some date management features (we have a few like 
> > date_ranges/round_time). But there are many more.
> >
> > I generally do not love the common "util" approach. It has a 
> > tendency to become a bag of everything over time. but if we limit it 
> > to a set of small, fully decoupled modules where each module is 
> > independent - it's OK. And we already have it in "airflow.util" and 
> > we seem to be
> doing well.
> >
> > WDYT? Is it worth it ?
> >
> > J.
> >
>

Re: [DISCUSS] Common.util provider?

Posted by Jarek Potiuk <ja...@potiuk.com>.
Yep. It could work with symbolic links. Tested it and with flit - both
wheel and sdist packaged code such symbolically linked file is dereferenced
and copy of the file is added there. It could be a nice way of doing it.

Maybe then worth trying next time if someone has a need?

J

On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T)
<Je...@de.bosch.com.invalid> wrote:

> >>> As of additional dependency complexity between providers actually the
> additional dependency I think creates more problems than the benefit…
> would be cool if there would be an option to „inline“ common code from a
> single place but keep individual providers fully independent…
>
> >Well, we already  do a lot of inlining, so if we think we should do more,
> we have mechanisms for that. We have  pre-commits and release commands that
> do a lot of that. Pre commits are inlining scripts in Dockerfiles,
> shortening PyPI readme . The providers __init__.py files and changelogs and
> index documentation .rst (partially) are generated at release documentation
> preparation time, pyproject.toml for providers are generated from common
> templates at package building time and so on and so on :). So we can do
> more of that and generate common code, it's just a matter of adding
> pre-commits or breeze scripts. But again "can't have and eat cake" - this
> has the drawback that there are extra steps involved and even if it's
> automated it does add friction when you have to regenerate the code every
> time you change it and when you change it in another place than where you
> use it.
>
> Yes, also thought a moment about pre-commit. I#d be okay if we really
> in-line and have a pre-commit aligning the redundancy of python in folders.
> Might need to be an opt-in if only 10 of 85 providers are using common
> stuff and if we change a common line we probably do not need to affect all
> providers.
>
> As long as no Windows users trying to code for airflow (do we need to
> consider?) would it also work to use symlinks? Git can cope with this, I
> don't know if the python toolchain can de-reference a copy and are not
> packaging a symlink? Would be worth a test... would save the pre-commit and
> we even could selectively include common bla into providers as needed :-D
>
> Mit freundlichen Grüßen / Best regards
>
> Jens Scheffler
>
> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
> GERMANY | www.bosch.com
> Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> Jens.Scheffler@de.bosch.com
>
> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus
> Forschner,
> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
>
> -----Original Message-----
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Mittwoch, 21. Februar 2024 21:18
> To: dev@airflow.apache.org
> Subject: Re: [DISCUSS] Common.util provider?
>
> > if we have a common piece then we are locking all depending providers
> (potentially) together if common code changes
>
> Yes, coupling in this case is the drawback of this solution. You can't
> have cake and eat it too and in this case you trade DRY with coupling.
>
> > As of additional dependency complexity between providers actually the
> additional dependency I think creates more problems than the benefit…
> would be cool if there would be an option to „inline“ common code from a
> single place but keep individual providers fully independent…
>
> Well, we already  do a lot of inlining, so if we think we should do more,
> we have mechanisms for that. We have  pre-commits and release commands that
> do a lot of that. Pre commits are inlining scripts in Dockerfiles,
> shortening PyPI readme . The providers __init__.py files and changelogs and
> index documentation .rst (partially) are generated at release documentation
> preparation time, pyproject.toml for providers are generated from common
> templates at package building time and so on and so on :). So we can do
> more of that and generate common code, it's just a matter of adding
> pre-commits or breeze scripts. But again "can't have and eat cake" - this
> has the drawback that there are extra steps involved and even if it's
> automated it does add friction when you have to regenerate the code every
> time you change it and when you change it in another place than where you
> use it.
>
> J.
>
> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
> Jens.Scheffler@de.bosch.com.invalid> wrote:
>
> > Hi Jarek,
> >
> > At reviewing the PR from uranusjr for AIP-60 I also had the feeling
> > that a lot of very similar code is repeated in all the providers. But
> > during review yesterday I dropped the ides because if we have a common
> > piece then we are locking all depending providers (potentially)
> > together if common code changes.
> > As of additional dependency complexity between providers actually the
> > additional dependency I think creates more prblems than the benefit…
> > would be cool if tehere would be an option to „inline“ common code
> > from a single place but keep individual providers fully independent…
> >
> > Jens
> >
> > Sent from Outlook for
> > iOS<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> > aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C98c8897
> > 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7
> > C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIj
> > oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNnWB
> > SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
> > ________________________________
> > From: Jarek Potiuk <ja...@potiuk.com>
> > Sent: Wednesday, February 21, 2024 5:42:20 PM
> > To: dev@airflow.apache.org <de...@airflow.apache.org>
> > Subject: [DISCUSS] Common.util provider?
> >
> > Hello everyone,
> >
> > How do we feel about introducing a common.util provider?
> >
> > I know it's not been the original idea behind providers, but - after
> > testing common.sql and now also having common.io, seems like more and
> > more we would like to extract some common code that we would like
> > providers to use, but we refrain from it, because it will only be
> > actually usable 6 months after we introduce some common code.
> >
> > However, if we introduce common.util, this problem is generally gone -
> > at the expense of more complex maintenance and cross-provider
> dependencies.
> > We should be able to add a common util method and use it in a provider
> > at the same time.
> >
> > Think Amazon provider using a new feature released in common.util
> > >=1.2.0 and google provider >= 1.1.0. All manageable and we do it
> > already for common.sql. We know how to do it, we know what to avoid,
> > we know we cannot introduce backwards-incompatible changes, so we have
> > to be very clear what is and what is not a public API there, We could
> > rather easily add tests to prevent such backwards-incompatible
> > changes. We even have a solution for chicken-egg providers where we
> > need to release two providers at the same time if they depend on each
> > other. Generally speaking it's quite workable but adds a bit of overhead.
> >
> > Examples that we could implement as "common.util":
> >
> > - common versioning check with cache - where multiple providers could
> > reuse "do we have pendulum 2"
> > - more complex - some date management features (we have a few like
> > date_ranges/round_time). But there are many more.
> >
> > I generally do not love the common "util" approach. It has a tendency
> > to become a bag of everything over time. but if we limit it to a set
> > of small, fully decoupled modules where each module is independent -
> > it's OK. And we already have it in "airflow.util" and we seem to be
> doing well.
> >
> > WDYT? Is it worth it ?
> >
> > J.
> >
>

RE: [DISCUSS] Common.util provider?

Posted by "Scheffler Jens (XC-AS/EAE-ADA-T)" <Je...@de.bosch.com.INVALID>.
>>> As of additional dependency complexity between providers actually the
additional dependency I think creates more problems than the benefit… would be cool if there would be an option to „inline“ common code from a single place but keep individual providers fully independent…

>Well, we already  do a lot of inlining, so if we think we should do more, we have mechanisms for that. We have  pre-commits and release commands that do a lot of that. Pre commits are inlining scripts in Dockerfiles, shortening PyPI readme . The providers __init__.py files and changelogs and index documentation .rst (partially) are generated at release documentation preparation time, pyproject.toml for providers are generated from common templates at package building time and so on and so on :). So we can do more of that and generate common code, it's just a matter of adding pre-commits or breeze scripts. But again "can't have and eat cake" - this has the drawback that there are extra steps involved and even if it's automated it does add friction when you have to regenerate the code every time you change it and when you change it in another place than where you use it.

Yes, also thought a moment about pre-commit. I#d be okay if we really in-line and have a pre-commit aligning the redundancy of python in folders. Might need to be an opt-in if only 10 of 85 providers are using common stuff and if we change a common line we probably do not need to affect all providers.

As long as no Windows users trying to code for airflow (do we need to consider?) would it also work to use symlinks? Git can cope with this, I don't know if the python toolchain can de-reference a copy and are not packaging a symlink? Would be worth a test... would save the pre-commit and we even could selectively include common bla into providers as needed :-D

Mit freundlichen Grüßen / Best regards

Jens Scheffler

Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY | www.bosch.com
Tel. +49 711 811-91508 | Mobil +49 160 90417410 | Jens.Scheffler@de.bosch.com

Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; 
Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus Forschner, 
Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert

-----Original Message-----
From: Jarek Potiuk <ja...@potiuk.com> 
Sent: Mittwoch, 21. Februar 2024 21:18
To: dev@airflow.apache.org
Subject: Re: [DISCUSS] Common.util provider?

> if we have a common piece then we are locking all depending providers
(potentially) together if common code changes

Yes, coupling in this case is the drawback of this solution. You can't have cake and eat it too and in this case you trade DRY with coupling.

> As of additional dependency complexity between providers actually the
additional dependency I think creates more problems than the benefit… would be cool if there would be an option to „inline“ common code from a single place but keep individual providers fully independent…

Well, we already  do a lot of inlining, so if we think we should do more, we have mechanisms for that. We have  pre-commits and release commands that do a lot of that. Pre commits are inlining scripts in Dockerfiles, shortening PyPI readme . The providers __init__.py files and changelogs and index documentation .rst (partially) are generated at release documentation preparation time, pyproject.toml for providers are generated from common templates at package building time and so on and so on :). So we can do more of that and generate common code, it's just a matter of adding pre-commits or breeze scripts. But again "can't have and eat cake" - this has the drawback that there are extra steps involved and even if it's automated it does add friction when you have to regenerate the code every time you change it and when you change it in another place than where you use it.

J.

On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <Je...@de.bosch.com.invalid> wrote:

> Hi Jarek,
>
> At reviewing the PR from uranusjr for AIP-60 I also had the feeling 
> that a lot of very similar code is repeated in all the providers. But 
> during review yesterday I dropped the ides because if we have a common 
> piece then we are locking all depending providers (potentially) 
> together if common code changes.
> As of additional dependency complexity between providers actually the 
> additional dependency I think creates more prblems than the benefit… 
> would be cool if tehere would be an option to „inline“ common code 
> from a single place but keep individual providers fully independent…
>
> Jens
>
> Sent from Outlook for 
> iOS<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C98c8897
> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7
> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIj
> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNnWB
> SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0>
> ________________________________
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Wednesday, February 21, 2024 5:42:20 PM
> To: dev@airflow.apache.org <de...@airflow.apache.org>
> Subject: [DISCUSS] Common.util provider?
>
> Hello everyone,
>
> How do we feel about introducing a common.util provider?
>
> I know it's not been the original idea behind providers, but - after 
> testing common.sql and now also having common.io, seems like more and 
> more we would like to extract some common code that we would like 
> providers to use, but we refrain from it, because it will only be 
> actually usable 6 months after we introduce some common code.
>
> However, if we introduce common.util, this problem is generally gone - 
> at the expense of more complex maintenance and cross-provider dependencies.
> We should be able to add a common util method and use it in a provider 
> at the same time.
>
> Think Amazon provider using a new feature released in common.util 
> >=1.2.0 and google provider >= 1.1.0. All manageable and we do it 
> already for common.sql. We know how to do it, we know what to avoid, 
> we know we cannot introduce backwards-incompatible changes, so we have 
> to be very clear what is and what is not a public API there, We could 
> rather easily add tests to prevent such backwards-incompatible 
> changes. We even have a solution for chicken-egg providers where we 
> need to release two providers at the same time if they depend on each 
> other. Generally speaking it's quite workable but adds a bit of overhead.
>
> Examples that we could implement as "common.util":
>
> - common versioning check with cache - where multiple providers could 
> reuse "do we have pendulum 2"
> - more complex - some date management features (we have a few like 
> date_ranges/round_time). But there are many more.
>
> I generally do not love the common "util" approach. It has a tendency 
> to become a bag of everything over time. but if we limit it to a set 
> of small, fully decoupled modules where each module is independent - 
> it's OK. And we already have it in "airflow.util" and we seem to be doing well.
>
> WDYT? Is it worth it ?
>
> J.
>

Re: [DISCUSS] Common.util provider?

Posted by Jarek Potiuk <ja...@potiuk.com>.
> if we have a common piece then we are locking all depending providers
(potentially) together if common code changes

Yes, coupling in this case is the drawback of this solution. You can't have
cake and eat it too and in this case you trade DRY with coupling.

> As of additional dependency complexity between providers actually the
additional dependency I think creates more problems than the benefit… would
be cool if there would be an option to „inline“ common code from a single
place but keep individual providers fully independent…

Well, we already  do a lot of inlining, so if we think we should do more,
we have mechanisms for that. We have  pre-commits and release commands that
do a lot of that. Pre commits are inlining scripts in Dockerfiles,
shortening PyPI readme . The providers __init__.py files and changelogs and
index documentation .rst (partially) are generated at release
documentation preparation time, pyproject.toml for providers are generated
from common templates at package building time and so on and so on :). So
we can do more of that and generate common code, it's just a matter of
adding pre-commits or breeze scripts. But again "can't have and eat cake" -
this has the drawback that there are extra steps involved and even if it's
automated it does add friction when you have to regenerate the code every
time you change it and when you change it in another place than where you
use it.

J.

On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T)
<Je...@de.bosch.com.invalid> wrote:

> Hi Jarek,
>
> At reviewing the PR from uranusjr for AIP-60 I also had the feeling that a
> lot of very similar code is repeated in all the providers. But during
> review yesterday I dropped the ides because if we have a common piece then
> we are locking all depending providers (potentially) together if common
> code changes.
> As of additional dependency complexity between providers actually the
> additional dependency I think creates more prblems than the benefit… would
> be cool if tehere would be an option to „inline“ common code from a single
> place but keep individual providers fully independent…
>
> Jens
>
> Sent from Outlook for iOS<https://aka.ms/o0ukef>
> ________________________________
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Wednesday, February 21, 2024 5:42:20 PM
> To: dev@airflow.apache.org <de...@airflow.apache.org>
> Subject: [DISCUSS] Common.util provider?
>
> Hello everyone,
>
> How do we feel about introducing a common.util provider?
>
> I know it's not been the original idea behind providers, but - after
> testing common.sql and now also having common.io, seems like more and more
> we would like to extract some common code that we would like providers to
> use, but we refrain from it, because it will only be actually usable 6
> months after we introduce some common code.
>
> However, if we introduce common.util, this problem is generally gone - at
> the expense of more complex maintenance and cross-provider dependencies.
> We should be able to add a common util method and use it in a provider at
> the same time.
>
> Think Amazon provider using a new feature released in common.util >=1.2.0
> and google provider >= 1.1.0. All manageable and we do it already for
> common.sql. We know how to do it, we know what to avoid, we know we cannot
> introduce backwards-incompatible changes, so we have to be very clear what
> is and what is not a public API there, We could rather easily add tests to
> prevent such backwards-incompatible changes. We even have a solution for
> chicken-egg providers where we need to release two providers at the same
> time if they depend on each other. Generally speaking it's quite workable
> but adds a bit of overhead.
>
> Examples that we could implement as "common.util":
>
> - common versioning check with cache - where multiple providers could reuse
> "do we have pendulum 2"
> - more complex - some date management features (we have a few like
> date_ranges/round_time). But there are many more.
>
> I generally do not love the common "util" approach. It has a tendency to
> become a bag of everything over time. but if we limit it to a set of small,
> fully decoupled modules where each module is independent - it's OK. And we
> already have it in "airflow.util" and we seem to be doing well.
>
> WDYT? Is it worth it ?
>
> J.
>

Re: [DISCUSS] Common.util provider?

Posted by "Scheffler Jens (XC-AS/EAE-ADA-T)" <Je...@de.bosch.com.INVALID>.
Hi Jarek,

At reviewing the PR from uranusjr for AIP-60 I also had the feeling that a lot of very similar code is repeated in all the providers. But during review yesterday I dropped the ides because if we have a common piece then we are locking all depending providers (potentially) together if common code changes.
As of additional dependency complexity between providers actually the additional dependency I think creates more prblems than the benefit… would be cool if tehere would be an option to „inline“ common code from a single place but keep individual providers fully independent…

Jens

Sent from Outlook for iOS<https://aka.ms/o0ukef>
________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Wednesday, February 21, 2024 5:42:20 PM
To: dev@airflow.apache.org <de...@airflow.apache.org>
Subject: [DISCUSS] Common.util provider?

Hello everyone,

How do we feel about introducing a common.util provider?

I know it's not been the original idea behind providers, but - after
testing common.sql and now also having common.io, seems like more and more
we would like to extract some common code that we would like providers to
use, but we refrain from it, because it will only be actually usable 6
months after we introduce some common code.

However, if we introduce common.util, this problem is generally gone - at
the expense of more complex maintenance and cross-provider dependencies.
We should be able to add a common util method and use it in a provider at
the same time.

Think Amazon provider using a new feature released in common.util >=1.2.0
and google provider >= 1.1.0. All manageable and we do it already for
common.sql. We know how to do it, we know what to avoid, we know we cannot
introduce backwards-incompatible changes, so we have to be very clear what
is and what is not a public API there, We could rather easily add tests to
prevent such backwards-incompatible changes. We even have a solution for
chicken-egg providers where we need to release two providers at the same
time if they depend on each other. Generally speaking it's quite workable
but adds a bit of overhead.

Examples that we could implement as "common.util":

- common versioning check with cache - where multiple providers could reuse
"do we have pendulum 2"
- more complex - some date management features (we have a few like
date_ranges/round_time). But there are many more.

I generally do not love the common "util" approach. It has a tendency to
become a bag of everything over time. but if we limit it to a set of small,
fully decoupled modules where each module is independent - it's OK. And we
already have it in "airflow.util" and we seem to be doing well.

WDYT? Is it worth it ?

J.