Posted to dev@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2022/04/04 13:39:34 UTC

[DISCUSS] Approach for new providers of the community

Hey all,

We seem to have an influx of new providers coming our way:

* Delta Sharing:
https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
* Flyte:  https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
* Versatile Data Kit:
https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0

I think it might be a good idea to bring the discussion into one place
(here) and decide what our approach is for accepting new providers.
The original discussion from Andon was focused mostly on VDK's case,
but maybe we could work out a general approach and "guidelines" -
so that we do not have to discuss it separately for each proposal,
and instead have some more (or less) clear rules on when we think
it's good to accept providers into the community.

Generally speaking we have two approaches:
* providers managed by the Apache Airflow community
* providers managed by 3rd-parties

I think my email here nicely summarizes what is in
https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n

I tried to look for earlier devlist discussions about the subject
(maybe someone can find them :) ). I think we have never formalized
nor written this down, but I do recall some (Slack?) discussions
about it in the past.

While we have no control/influence over 3rd-party providers (and we
do not want to have any), we definitely have both for the
community-managed ones - and there should be some rules defined to
decide when we are "ok" to accept a provider. Having "more" providers
in the "community" area is not always better. More often than not,
code is a liability rather than an asset.

From those discussions I had, I recall points such as:

* likelihood of the provider being used by many users
* possibility to test/support the providers by maintainers or
dedicated "stakeholders"
* quality of the code and following our expectations (docs/how-to
guides, unit/system tests)
* competing (?) with Airflow - there could be some providers of
"competing" products (I am not sure if this is a concern of
ours) which we might simply decide not to maintain in the community

I am happy to write it down and propose such rules revolving around
those - but I would like to hear what people think first.

What are your thoughts here?

J

Re: [DISCUSS] Approach for new providers of the community

Posted by Jarek Potiuk <ja...@potiuk.com>.
Hello everyone,

I think we have a series of things that make it difficult to focus on
such long-term discussions - 2.3.0 is out, many people are busy
with 2.3.1, which is going to focus on "teething" problems, we have the
Airflow Summit next week (yay!), and I know how many people in our
community are busy preparing either the local events or their talks
:).

I have some ideas and proposals on how we can approach the subject and
would like to continue the discussion (I would still love to hear more
voices), but I think it would be great if we could resume it after the
Summit.

But - the Summit is not only a "disruption" - it's also an opportunity to
make the discussion better. I think the Summit with the local events
is a great opportunity to discuss this in person - and in at least 13
separate locations :).

So I have a kind request to everyone - let's talk about it at the
local events. I will be at both London and Warsaw, so if you happen
to be there - I am happy to share my thoughts with anyone interested and
hear what you have to say :) - and I encourage similar discussions
elsewhere.

I think the decision on how we approach providers in the future is a
very important one; we should take it very seriously and we should
give everyone a chance to participate. It will, to some extent, define
the future of the whole Airflow ecosystem.

J.

On Tue, Apr 26, 2022 at 12:43 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> I think this is a different story (and a different discussion).
> And I think we should have good reasons to split the repo. I think we
> do have them, and for different reasons many people think we will get
> there sooner rather than later - but I think we should not hijack this
> discussion for it.
> This discussion is more about governance of providers than about which
> repo they are in.
>
> Unless I am mistaken - moving providers to a separate repo does not
> really solve any of the "should we have more or fewer community
> providers" question. It's really a technical split of the code, but if
> we have a separate repo and we still add more providers from the
> community, we will still have to make sure all of them can be installed,
> run their tests, make sure they run with Airflow (released and main) and
> make sure that Airflow changes do not break them.
>
> It means about the same amount of safeguards, protection and CI
> overhead as we have now - only the code will be somewhere else, but the
> amount of CI tests, when they execute, who is allowed to merge
> the code and the approval process will remain the same as long as this
> is an "Apache Airflow PMC" project.
>
> J.
>
> On Tue, Apr 26, 2022 at 12:21 AM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > Hey all,
> >
> > Another alternative is separating out core providers from the Core Airflow Repo into a separate repo within the Apache Org itself, maybe: apache-airflow-providers.
> >
> > That will not decrease the maintenance burden on the Committers, but the core work and releases will be completely separate and untangled from the Apache Airflow repo and can move at a faster pace.
> >
> > The benefit and compromise for the community is that all the providers are still officially maintained and released by the committers. However, over time we can invite more committers who show active participation in the apache-airflow-providers repo too.
> >
> > This is a compromise to the arguments about Providers being integral to the success of Airflow and as such should be maintained and released officially.
> >
> > Regards,
> > Kaxil
> >
> > On Mon, 25 Apr 2022 at 19:17, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> > 1. https://registry.astronomer.io/
> >> > 2. Using the new classifier https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
> >>
> >> Yep - precisely what I thought of placing at the top of the ecosystem page.
> >>
> >> > On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" <fe...@amazon.com.INVALID> wrote:
> >> >>
> >> >> I still think that easy inclusion with a defined pruning process is best, but it's looking like that is the minority opinion.  In which case, IFF we are going to be keeping them separate then I definitely agree that there needs to be a fast/easy/convenient way to find them.
> >> >> ________________________________
> >> >> From: Jarek Potiuk <ja...@potiuk.com>
> >> >> Sent: Monday, April 25, 2022 7:17 AM
> >> >> To: dev@airflow.apache.org
> >> >> Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the community
> >> >>
> >> >>
> >> >> Just to come back to it (please, everyone, a little patience - I think
> >> >> some people have not chimed in yet due to the 2.3.0 "focus", so this
> >> >> discussion might take a little more time).
> >> >>
> >> >> My current thinking on it so far:
> >> >>
> >> >> * I am not really in the camp of "let's not add any more providers at
> >> >> all", and also not in the "let's accept all providers that are good
> >> >> quality code". I think there are a few providers which, "after fulfilling
> >> >> all the criteria", could be added - mostly open-source standards,
> >> >> generic, established technologies - but that should be a rather limited
> >> >> and rare event.
> >> >>
> >> >> * when there is a proprietary service which does not have a broad reach
> >> >> and it's not likely that we will have some committers who will be
> >> >> maintaining it - because they are users - the default option should
> >> >> be to make a standalone per-service provider. The difficulty here is
> >> >> to set the right "non-quality" criteria - but I think we really want
> >> >> to limit any new code to maintain. Here maybe we can have some more
> >> >> concrete criteria proposed - so that we do not have to vote
> >> >> individually on each proposed provider - and so that those who want
> >> >> to propose a provider could check for themselves, by reading the
> >> >> criteria, what's best for them.
> >> >>
> >> >> * we might improve our "providers" list on the "ecosystem" page to make
> >> >> providers stand out a bit more (maybe simply put them on top and make
> >> >> a clearly visible section). We are not going to maintain a
> >> >> nice "registry" similar to Astronomer's one (we could even actually
> >> >> make the link to the Astronomer registry more prominent as the way to
> >> >> "search" for providers on our Ecosystem Page). We could also add a link
> >> >> to PyPI with the "Airflow Provider" classifier on the ecosystem page
> >> >> as another way of searching for providers. All that is perfectly fine,
> >> >> I think, with the ASF policies and spirit. And it will be good for
> >> >> discovery.
> >> >>
> >> >> WDYT?
> >> >>
> >> >> J.
> >> >>
> >> >> On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <sa...@union.ai> wrote:
> >> >>>
> >> >>>
> >> >>> Hello!
> >> >>>
> >> >>> The reason behind submitting the Flyte provider to the Airflow repository is that we felt it'd be effortless for the Airflow users to use the integration. Moreover, since it'd be under the umbrella of Airflow, we estimated that the Airflow users would not hesitate to use the provider.
> >> >>>
> >> >>> We could definitely have this as a standalone provider, but the easy-to-get-started incentive of Airflow providers seemed like a better option.
> >> >>>
> >> >>> If there's a sophisticated plan in place for having standalone providers in PyPI, we're up for it.
> >> >>>
> >> >>> Thanks,
> >> >>> Samhita
> >> >>>
> >> >>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <al...@gmail.com> wrote:
> >> >>>>
> >> >>>>
> >> >>>> Hello all
> >> >>>>
> >> >>>> I want to try to explain the motivation behind the submission of the Delta Sharing provider:
> >> >>>>
> >> >>>> Let me start with the fact that the original issue was created against the Airflow repository, and it was accepted as potential new functionality. And the discussion about new providers started almost on the day the PR was submitted :-)
> >> >>>> Delta Sharing is an OSS project under the umbrella of the Linux Foundation that defines a protocol and reference implementations. It was started by Databricks, but has other contributors as well - that's why it wasn't pushed into the Databricks provider, as it's not specific to Databricks.
> >> >>>> Another thought behind submitting it as a separate provider was to get more people interested in this functionality and build additional integrations on top of it.
> >> >>>> Another important aspect of having providers in the Airflow repository is that they are tested together with changes in the core of Airflow.
> >> >>>>
> >> >>>> I completely understand the concerns about more maintenance effort, but my personal point of view (more about it below) is similar to Rafal's & John's - if there are well-defined criteria & plans for decommissioning or something like that, then providers could be part of the releases, etc.
> >> >>>>
> >> >>>> I just want to add that although I'm employed by Databricks, I'm not a part of the development team - I'm in the field team that works with customers, sees how they are using different tools, sees their pain points, etc. Most of the work so far was done in my own time - I'm doing some coordination, but most of the new functionality (AAD token support, Repos, Databricks SQL operators, etc.) comes from seeing customers use Airflow together with Databricks.
> >> >>>>
> >> >>>>
> >> >>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz <ra...@google.com.invalid> wrote:
> >> >>>>>
> >> >>>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> I think that we will need to find some middle ground here - we are trying to optimize along many dimensions (Jarek mentioned 3 of them). Maybe I would also add a 4th dimension - Airflow Service Providers :).
> >> >>>>>
> >> >>>>> Airflow users - whether they run self-managed Airflow or use "managed Airflow" provided by others - are beneficiaries of the fact that Airflow has a decent portfolio of providers.
> >> >>>>> It's not only a guarantee that these providers should work fine and that they meet Airflow coding/testing standards. It's also a kind of guarantee that, once they start using Airflow
> >> >>>>> with providers backed by the Airflow community, they won't be on their own when it comes to troubleshooting/updating/etc. It will be much easier for them to convince their companies to use Airflow for production use cases as the Airflow platform (core + providers) is tested/maintained by the Airflow community.
> >> >>>>>
> >> >>>>> Keeping providers within the Airflow repository generates integration and maintenance work on the Airflow community side. On the other hand, if this work is not done within the community, then this effort would need to be done by all users to a certain extent. So from this perspective it's more optimal for the community to do it, so users can use off-the-shelf Airflow for the majority of their use cases.
> >> >>>>>
> >> >>>>> When it comes to accepting new providers - I like John's suggestions:
> >> >>>>> a) a well-defined standard that needs to be met by providers - passing the "provider qualification" would be some effort, so each service provider would need to decide whether it wouldn't be easier to maintain their provider on their own.
> >> >>>>> b) a well-defined lifecycle for providers - which would allow us to identify providers that are obsolete/not popular any more and retire them.
> >> >>>>>
> >> >>>>> Regards, Rafal.
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> I've been thinking about it - to make up my mind a little. The good thing for me is that I have no strong opinion and I can rather easily see (or so I think) both sides.
> >> >>>>>>
> >> >>>>>> TL;DR: I think we need an explanation from the "Service Providers" - what do they want to achieve by contributing providers to the community - and then we can see if we can achieve similar results differently.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Obviously I am a bit biased from the maintainer point of view, but since I cooperate with various stakeholders, I spoke to some of them just to see their point of view, and this is what I got:
> >> >>>>>>
> >> >>>>>> It seems that we really have three types of stakeholders that are really interested in "providers":
> >> >>>>>>
> >> >>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take care about its future and development and "grand vision" of where we want to be in few years
> >> >>>>>> 2) "Users" - those who use Airflow and integration with the Service Provider
> >> >>>>>> 3) "Service providers" - those who run the services that Airflow integrates with - via providers (that group might also contain those stakeholders that run Airflow "as a service")
> >> >>>>>>
> >> >>>>>> Let me see it from all the different POVs:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> From 1) Maintainer POV
> >> >>>>>>
> >> >>>>>> More providers mean slower growth of the platform overall, as the more providers we add and manage as a community, the less time we can spend on improving the Airflow core.
> >> >>>>>> Also, the vision I think we all share is that Airflow is not a "standalone orchestrator" any more - due to its popularity, reach and power, it became an "orchestrating platform", and this is the vision that keeps us - maintainers - busy.
> >> >>>>>>
> >> >>>>>> Over the last 2 years, pretty much everything we do makes Airflow "more extensible". You can add custom "secrets managers", "timetables", "deferrable operators", etc. "Customizability" is now built in and is the "theme" of being a modern platform.
> >> >>>>>> Hell - we even recently added the "Airflow Provider" trove classifier in PyPI: https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider and the main justification in the discussion was that we expect MORE 3rd-parties to use it, rather than relying on the "apache-airflow-provider" package name.
> >> >>>>>> So from the maintainer POV - having 3rd-party providers as "extensions" to Airflow makes perfect sense and is the way to go.
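
To make the classifier idea above concrete: below is a minimal sketch of what a standalone, 3rd-party-managed provider might declare in its setup.py. The package name, version and minimum-Airflow pin are placeholders; only the "Framework :: Apache Airflow :: Provider" classifier string comes from the PyPI search linked above.

    # setup.py - hypothetical standalone provider package (all names are placeholders)
    from setuptools import find_packages, setup

    setup(
        name="example-airflow-provider-myservice",  # placeholder, not a real package
        version="1.0.0",
        packages=find_packages(),
        install_requires=[
            # a 3rd-party provider picks its own minimum Airflow version
            # (it can keep supporting 2.0 or even 1.10 if it wants to)
            "apache-airflow>=2.1",
        ],
        classifiers=[
            # the trove classifier mentioned above, so the package shows up
            # when searching PyPI for Airflow providers
            "Framework :: Apache Airflow :: Provider",
        ],
    )

Users then install such a provider with a plain "pip install", the same way they install community providers from PyPI.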
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> From  2) User POV
> >> >>>>>>
> >> >>>>>> Users want to use Airflow together with all the integrations they use. But only with those that they actually use. Similarly to maintainers - supporting and needing all 70+ providers is something they usually do not REALLY care about.
> >> >>>>>> They literally care about the few providers they use. We even taught the users that they can upgrade and install providers separately from the core. So they already know they can mix and match Airflow + Providers to get what they want.
> >> >>>>>>
> >> >>>>>> And they do use it this way - even if they use our image, the image only contains a handful of the providers, and when they need to install
> >> >>>>>> new providers - they just install them from PyPI. And for that, the difference between "community providers" and 3rd-party providers - except the stamp of approval of the ASF - is not really visible.
> >> >>>>>> Surely they can use [extras] to install the providers, but that is just a convenience and is definitely not needed by the users.
> >> >>>>>> For example, when they build a custom image, they usually extend Airflow and simply 'pip install <PROVIDER>'.
> >> >>>>>> As long as someone makes sure that the provider can be installed on certain versions of Airflow - it does not matter.
> >> >>>>>>
> >> >>>>>> Also, from the users' perspective, Airflow became "popular" enough that it no longer needs "more integrations" to be more "appealing" to the users.
> >> >>>>>> They already use Airflow. They like it (hopefully), and whether this or that provider is part of the community makes no difference any more.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> From 3) "Service providers" POV
> >> >>>>>>
> >> >>>>>> Here I am not sure. It's not very clear what service providers get from being part of the "community providers".
> >> >>>>>>
> >> >>>>>> I hear that some big services (cloud providers) find it cool that we give them the ASF "Stamp of Approval". And they are willing to pay the price of a slower merge process, dependence on the community and following the strict rules of the ASF.
> >> >>>>>> And the community is also happy to pay the price of maintaining those (including the dependencies which Elad mentioned) to make sure that all the community providers work in concert - because those "Services" are hugely popular and we "want" as a community to invest there.
> >> >>>>>> But maintaining those deps in sync is a huge effort and it will become even worse the more we add. On the other hand, for 3rd-party providers it will be EASIER to keep up.
> >> >>>>>> They don't have to care about all the community providers working together; they can choose a subset. And when they release their libraries, they can take care of making sure the dependencies are not broken.
> >> >>>>>>
> >> >>>>>> There are other "drawbacks" to being a "community" provider. For example, we have the rule that community providers keep supporting a given Airflow version as their minimum only for 12 months after that Airflow release.
> >> >>>>>> This means that users of Airflow 2.1 will not receive updates for the providers after the 21st of May. This is the price to pay for community-managed providers. We will not release bug fixes or changes in providers for Airflow 2.1 users after the 21st of May.
> >> >>>>>> But if you manage your own provider - you can still support 2.0 or even 1.10 if you want.
> >> >>>>>>
> >> >>>>>> I cannot really see why a Service Provider would want to become an Airflow Community Provider.
> >> >>>>>>
> >> >>>>>> And I am not really sure what the Flyte, Delta Sharing, Versatile Data Kit, and Cloudera people think and why they think this is the best choice.
> >> >>>>>>
> >> >>>>>> I think when we understand what the "Service Providers" want to achieve this way, maybe we will be able to come up with some middle ground and at least set some rules for when it makes sense and when it does not.
> >> >>>>>> Maybe 'contributing a provider' is a way to achieve something else, and we simply do not realize that in the new "Airflow as a Platform" world all the stakeholders can achieve very similar results using different approaches.
> >> >>>>>>
> >> >>>>>> * For example, we could think about how we can make it easier for Airflow users to discover and install providers - without the community actually taking ownership of the code.
> >> >>>>>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a "compliance check", as suggested above
> >> >>>>>> * Or maybe we could introduce a "breeze" extension to be able to install and test a provider against the "latest Airflow", so that the service providers could check it before we even release Airflow and its dependencies
> >> >>>>>>
> >> >>>>>> So here is what I think we really need - Alex, Samhita, Andon, Philippe (I think) - could you tell us (each of you separately) - what were your goals when you came up with the "contribute the new provider" idea?
> >> >>>>>>
> >> >>>>>> J.
> >> >>>>>>
> >> >>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <el...@apache.org> wrote:
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> Ash, what is your recommendation for the users, should we follow your suggestion?
> >> >>>>>>> This means that the big, big, big joy of using Airflow constraints and getting a working environment with all required providers will be no more.
> >> >>>>>>> So users will get a working "vanilla" Airflow and then will need to figure out how they are going to tackle independent providers that may not be able to coexist with one another.
> >> >>>>>>> This means that users will need to create their own constraints mechanism and maintain it.
> >> >>>>>>>
> >> >>>>>>> From my perspective this increases the complexity of getting Airflow to be production-ready.
> >> >>>>>>> I know that we say providers vs. core, but I think that from the users' perspective providers are an integral part of Airflow.
> >> >>>>>>> Having the best scheduler and the best UI is not enough. Providers are a crucial part that completes the set.
> >> >>>>>>>
> >> >>>>>>> Maybe eventually there should be something like a provider store where there can be official providers and 3rd party providers.
> >> >>>>>>>
> >> >>>>>>> This may be an even greater discussion than the one we are having here. It feels more like Airflow as a product vs. Airflow as an ecosystem.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty <co...@astronomer.io.invalid> wrote:
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I think in an ideal world even the providers currently part of the Airflow repo would be managed separately. (I'm not actually suggesting removing any providers.) I don't think it's a matter of gatekeeping, I just think it's actually kind of odd to have providers in the same repo as core Airflow, and it increases confusion about Airflow versions vs provider package versions.
> >> >>>>>>>>
> >> >>>>>>>> Collin McNulty
> >> >>>>>>>>
> >> >>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> I’m leaning toward Ash’s approach. Having providers maintain the packages may streamline many aspects for providers/companies.
> >> >>>>>>>>>
> >> >>>>>>>>> 1. They are owners so they can merge and release whenever they need.
> >> >>>>>>>>> 2. It’s easier for them to add E2E tests and manage the resources needed for running them.
> >> >>>>>>>>> 3. The development of the package can be incorporated into their company processes - not every company is used to OSS mode.
> >> >>>>>>>>>
> >> >>>>>>>>> Whatever way we go - we should have some basic guidelines and requirements (for example, to brand a provider as “recommended by the community” or something).
> >> >>>>>>>>>
> >> >>>>>>>>> Cheers,
> >> >>>>>>>>> Tomsk
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> With best wishes,                    Alex Ott
> >> >>>> http://alexott.net/
> >> >>>> Twitter: alexott_en (English), alexott (Russian)

Re: [DISCUSS] Approach for new providers of the community

Posted by Jarek Potiuk <ja...@potiuk.com>.
I think this is a different story (and different discussion).
And I think we should have good reasons to split the repo. I think we
do have it but for different reasons many people think we will get
there sooner rather than later - but I think we should not hijack the
discussion for it.
This discussion is more for governance of providers rather than which
repo they are.

Unless I am mistaken - moving providers to separate repo does not
really solve any of the "should we have more or less community
providers". It's really a technical split of code, but If we have
separate repo and we still add more providers from community we will
still have to make sure all of them can be installed, run the tests
the code, make sure they run with Airflow (released and main) and make
sure that airflow changes do not break it.

It means about the same amount of safeguards and protection, CI
overhead we have now - only the code will be somewhere else, but the
amount of CI tests, when they are executing, who is allowed to merge
the code, approval process will remain the same as long as this will
be "apache Airflow PMC" project.

J.

On Tue, Apr 26, 2022 at 12:21 AM Kaxil Naik <ka...@gmail.com> wrote:
>
> Hey all,
>
> Another alternative is separating out core providers from the Core Airflow Repo into a separate repo within the Apache Org itself, maybe: apache-airflow-providers.
>
> That will not decrease the maintenance from the Committers but the Core work and release will be completely separate and untangled from Apache Airflow repo and can move at a faster pace.
>
> The benefit and compromise for the community is that all the providers are still officially maintained and released by the committers. However, over time we can invite more committers who show active participation in apache-airflow-providers repo too.
>
> This is a compromise to the arguments about Providers being integral to the success of Airflow and as such should be maintained and released officially.
>
> Regards,
> Kaxil
>
> On Mon, 25 Apr 2022 at 19:17, Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> > 1. https://registry.astronomer.io/
>> > 2. Using the new classifier https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
>>
>> Yep. precisely what I thought to place at the top of the ecosystem page.
>>
>> > On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" <fe...@amazon.com.INVALID> wrote:
>> >>
>> >> I still think that easy inclusion with a defined pruning process is best, but it's looking like that is the minority opinion.  In which case, IFF we are going to be keeping them separate then I definitely agree that there needs to be a fast/easy/convenient way to find them.
>> >> ________________________________
>> >> From: Jarek Potiuk <ja...@potiuk.com>
>> >> Sent: Monday, April 25, 2022 7:17 AM
>> >> To: dev@airflow.apache.org
>> >> Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the community
>> >>
>> >> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>> >>
>> >>
>> >>
>> >> Just to come back to it (please everyone a little patience - I think
>> >> some people have not chimed in yet due to 2.3.0 "focus" so this
>> >> discussion might take a little more time.
>> >>
>> >> My current thinking on it so far:
>> >>
>> >> * I am not really in the camp of "lets not add any more providers at
>> >> all" and also not in the "let's accept all that are good quality code
>> >> providers". I think there are a few providers which "after fulfilling
>> >> all the criteria" could be added - mostly open-source standards,
>> >> generic, established technologies - but it should be rather limited
>> >> and rare event.
>> >>
>> >> * when there is a proprietary service which has not too broad reach
>> >> and it's not likely that we will have some committers who will be
>> >> maintaining it - becauyse they are users - the default option should
>> >> be to make a standalone per-service providers. the difficulty here is
>> >> to set the right "non-quality" criteria - but I think we really want
>> >> to limit any new code to maintain. Here maybe we can have some more
>> >> concrete criteria proposed - so that we do not have to vote
>> >> individually on each proposed providers - and so that those who want
>> >> to propose a provider could check themselves by reading the criteria,
>> >> what's best for them.
>> >>
>> >> * we might improve our "providers" list at the "ecosystem" to make
>> >> providers stand out a bit more (maybe simply put them on top and make
>> >> a clearly visible section). We are not going to maintain and keep the
>> >> nice "registry" similar to Astronomer's one (we could even actually
>> >> make the link to the Astronomer registry more prominent as the way to
>> >> "search" for providers on our Ecosystem Page. We could also add a link
>> >> to Pypi with the "aifrflow provider" classifier at the ecosystem page
>> >> as another way of searching for providers. All that is perfectly fine,
>> >> I think with the ASF Policies and spirit. And it will be good for
>> >> discovery.
>> >>
>> >> WDYT?
>> >>
>> >> J.
>> >>
>> >> On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <sa...@union.ai> wrote:
>> >>>
>> >>>
>> >>> Hello!
>> >>>
>> >>> The reason behind submitting Flyte provider to the Airflow repository is because we felt it'd be effortless for the Airflow users to use the integration. Moreover, since it'd be under the umbrella of Airflow, we estimated that the Airflow users would not hesitate from using the provider.
>> >>>
>> >>> We could definitely have this as a standalone provider, but the easy-to-get-started incentive of Airflow providers seemed like a better option.
>> >>>
>> >>> If there's a sophisticated plan in place for having standalone providers in PyPI, we're up for it.
>> >>>
>> >>> Thanks,
>> >>> Samhita
>> >>>
>> >>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <al...@gmail.com> wrote:
>> >>>>
>> >>>>
>> >>>> Hello all
>> >>>>
>> >>>> I want to try to explain a motivation behind submission of the Delta Sharing provider:
>> >>>>
>> >>>> Let me start with the fact that the original issue was created against Airflow repository, and it was accepted as potential new functionality. And discussion about new providers has started almost on the day when PR was submitted :-)
>> >>>> Delta Sharing is the OSS project under umbrella of the Linux Foundation that defines a protocol and reference implementations. It was started by the Databricks, but has other contributors as well - that's why it wasn't pushed into a Databricks provider, as it's not specific to Databricks.
>> >>>> Another thought about submitting it as a separate provider was to get more people interested in this functionality and build additional integrations on top of it.
>> >>>> Another important aspect of having providers in the Airflow repository is that they are tested together with changes in the core of the Airflow.
>> >>>>
>> >>>> I completely understand the concerns about more maintenance effort, but my personal point of view (about it below) is similar to Rafal's & John's - if there are well defined criteria & plans for decommissioning or something like, then providers could be part of the releases, etc.
>> >>>>
>> >>>> I just want to add that although I'm employed by Databricks, I'm not a part of the development team - I'm in the field team who work with customers, sees how they are using different tools, seeing pain points, etc.  Most of work so far was done on my own time - I'm doing some coordination, but most of new functionality (AAD tokens support, Repos, Databricks SQL operators, etc.) is coming from seeing customers using Airflow together with Databricks.
>> >>>>
>> >>>>
>> >>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz <ra...@google.com.invalid> wrote:
>> >>>>>
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I think that we will need to find some middle ground here - we are trying to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would also add another 4th dimension - Airflow Service Provider, :).
>> >>>>>
>> >>>>> Airflow users - whether they do self-managed Airflow or use "managed Airflow" provided by others are beneficients of the fact that Airflow has a decent portfolio of providers.
>> >>>>> It's not only a guarantee that these providers should work fine and they meet Airflow coding/testing standards. It's also a kind of guarantee, that once they start using Airflow
>> >>>>> with providers backed by the Airflow community they won't be on their own when it comes to troubleshooting/updating/etc. It will be much easier for them to convince their companies to use Airflow for production use cases as the Airflow platform (core + providers) is tested/maintained by the Airflow community.
>> >>>>>
>> >>>>> Keeping providers within the Airflow repository generates integration and maintenance work on the Airflow community side. On the other hand, if this work is not done within the community then this effort would need to be done by all users to a certain extent. So from this perspective it's more optimal for the community to do it so users can use off-the-shelf Airflow for the majority of their use cases
>> >>>>>
>> >>>>> When it comes to accepting new providers - I like John's suggestions:
>> >>>>> a) well defined standard that needs to be met by providers - passing the "provider qualification" would be some effort so each service provider would need to decide if it wouldn't be easier to maintain their provider on their own.
>> >>>>> b) well define lifecycle for providers - which would allow to identify providers that are obsolete/not popular any more and make them obsolete.
>> >>>>>
>> >>>>> Regards, Rafal.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> I've been thinking about it - to make up my mind a little. The good thing for me is that I have no strong opinion and I can rather easily see (or so I think) of both sides.
>> >>>>>>
>> >>>>>> TL;DR; I think we need an explanation from the "Service Providers" - what they want to achieve by contributing providers to the community and see if we can achieve similar results differently.
>> >>>>>>
>> >>>>>>
>> >>>>>> Obviously I am a bit biased from the maintainer point of view, but since I cooperate with various stakeholders i spoke to some of them just see their point of view and this is what I got:
>> >>>>>>
>> >>>>>> Seems that we have really three  types of stakeholders that are really interested in "providers":
>> >>>>>>
>> >>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take care about its future and development and "grand vision" of where we want to be in few years
>> >>>>>> 2) "Users" - those who use Airflow and integration with the Service Provider
>> >>>>>> 3) "Service providers" - those who run the services that Airflow integrates with - via providers (that group might also contain those stakeholders that run Airflow "as a service")
>> >>>>>>
>> >>>>>> Let me see it from all the different POVs:
>> >>>>>>
>> >>>>>>
>> >>>>>> From 1) Maintainer POV
>> >>>>>>
>> >>>>>> More providers mean slower growth of the platform overall as the more providers we add and manage as a community, the less time we can spend on improving Airflow as a core.
>> >>>>>> Also the vision I think we all share is that Airflow is not a "standalone orchestrator" any more - due to its popularity, reach and power, it became an "orchestrating platform" and this is the vision that keeps us - maintainers - busy.
>> >>>>>>
>> >>>>>> Over the last 2 years pretty much everything we do - make Airflow "more extensible". You can add custom "secrets managers". "timetables", "defferers" etc. "Customizability" is now built-in and "theme" of being a modern platform.
>> >>>>>> Hell - we even recently added "Airflow Provider" trove classified in PyPI: https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider and the main justification in the discussion was that we expect MORE 3rd-parties to use it, rather than relying on "apache-airflow-provider" package name.
>> >>>>>> So from maintainer POV - having 3rd-party providers as "extensions" to Airlow makes perfect sense and is the way to go.
>> >>>>>>
>> >>>>>>
>> >>>>>> From  2) User POV
>> >>>>>>
>> >>>>>> Users want to use Airflow with all the integrations they use together. But only with those that they actually use. Similarly as maintainers - supporting and needing all 70+ providers is something they usually do not REALLY care about.
>> >>>>>> They literally care about the few providers they use. We even taught the users that they can upgrade and install providers separately from the core. So they already know they can mix and match Airflow + Providers to get what they want.
>> >>>>>>
>> >>>>>> And they do use it - even if they use our image, the image only contains a handful of the providers and when they need to install
>> >>>>>> new providers - they just install it from PyPI. And for that the difference of "community providers" vs. 3rd party providers - except the stamp of approval of the ASF, is not really visible.
>> >>>>>> Surely they can use [extras] to install the providers but that is just a convenience and is definitely not needed by the users.
>> >>>>>> For example when they build a custom image they usually extend Airflow and simply 'pip install <PROVIDER>'
>> >>>>>> As long as someone makes sure that the provider can be installed on certain versions of Airflow - it does not matter.
>> >>>>>>
>> >>>>>> Also from the users perspective Airflow became "popular" enough that it no longer needed "more integrations" to be more "appealing" for the users.
>> >>>>>> They already use Airflow. They like it (hopefully) and the fact that this or that provider is part of the community makes no difference any more.
>> >>>>>>
>> >>>>>>
>> >>>>>> From 3) "Service providers" POV
>> >>>>>>
>> >>>>>> Here I am not sure. It's not very clear what service providers get from being part of the "community providers".
>> >>>>>>
>> >>>>>> I hear that some big service (cloud providers) find it cool that we give it the ASF "Stamp of Approval". And they are willing to pay the price of a slower merge process, dependence on the community and following strict rules of the ASF.
>> >>>>>> And the community also is happy to pay the price of maintaining those (including the dependencies which Elad mention) to make sure that all the community providers work in concert - because those "Services" are hugely popular and we "want" as a community to invest there.
>> >>>>>> But maintaining those  deps in sync is a huge effort and it will become even worse - the more we add. On the other hand for 3rd party providers it will be EASIER to keep up.
>> >>>>>> They don't have to care about all the community providers to work together, they can choose a subset. And when they release their libraries they can take care about making sure the dependencies are not broken.
>> >>>>>>
>> >>>>>> There are other "drawbacks" for being a "community" provider. For example we have the rule that we support the min-Airflow version for providers from the community 12 months after Airflow release.
>> >>>>>> This means that users of Airflow 2.1 will not receive updates for the providers after 21st of May. This is the price to pay for community-managed providers. We will not release bug fixes in providers or changes for Airflow 2.1 users after 21st of May.
>> >>>>>> But if you manage your own provider - you still can support 2.0 or even 1.10 if you want.
>> >>>>>>
>> >>>>>> I cannot really see why a Service Provider would want to become an Airflow Community Provider.
>> >>>>>>
>> >>>>>> And I am not really sure what  Flyte, Delta Sharing, Versatile Data Kit, and Cloudera people think and why they think this is the best choice.
>> >>>>>>
>> >>>>>> I think when we understand what the  "Service Providers" want to achieve this way, maybe we will be able to come up with some middle ground and at least set some rules when it makes sense and when it does not make sense.
>> >>>>>> Maybe 'contributing provider' is the way to achieve something else and we simply do not realize that in the new "Airflow as a Platform" world, all the stakeholders can achieve very similar results using different approaches.
>> >>>>>>
>> >>>>>> * For example we could think about how we can make it easier for Airflow users to discover and install their providers - without actually taking ownership of the code by the community.
>> >>>>>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a "compliance check" as suggested above
>> >>>>>> * Or maybe we could introduce a "breeze" extension to be able to install and test provider in the "latest airflow" so that the service providers could check it before we even release airflow and dependencies
>> >>>>>>
>> >>>>>> So what I think we really need -  Alex, Samhita, Andon, Philippe (I think) - could you tell us (every one of you separately) - what are your goals when you came up with the "contribute the new provider" idea?
>> >>>>>>
>> >>>>>> J.
>> >>>>>>
>> >>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <el...@apache.org> wrote:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Ash what is your recommendation for the users should we follow your suggestion?
>> >>>>>>> This means that the big big big joy of using airflow constraints and getting a working environment with all required providers will be no more.
>> >>>>>>> So users will get a working "Vanilla" Airflow and then will need to figure out how they are going to tackle independent providers that may not be able to coexist one with another.
>> >>>>>>> This means that users will need to create their own constraints mechanism and maintain it.
>> >>>>>>>
>> >>>>>>> From my perspective this increases the complexity of getting Airflow to be production ready.
>> >>>>>>> I know that we say providers vs core but I think that from users perspective providers are an integral part of Airflow.
>> >>>>>>> Having the best scheduler and the best UI is not enough. Providers are a crucial part that complete the set.
>> >>>>>>>
>> >>>>>>> Maybe eventually there should be something like a provider store where there can be official providers and 3rd party providers.
>> >>>>>>>
>> >>>>>>> This may be even greater discussion than what we are having here. It feels more like Airflow as a product vs Airflow as an ecosystem.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty <co...@astronomer.io.invalid> wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I think in an ideal world even the providers currently part of the Airflow repo would be managed separately. (I'm not actually suggesting removing any providers.) I don't think it's a matter of gatekeeping, I just think it's actually kind of odd to have providers in the same repo as core Airflow, and it increases confusion about Airflow versions vs provider package versions.
>> >>>>>>>>
>> >>>>>>>> Collin McNulty
>> >>>>>>>>
>> >>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org> wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> I’m leaning toward Ash approach. Having providers maintaining the packages may streamline many aspects for providers/companies.
>> >>>>>>>>>
>> >>>>>>>>> 1. They are owners so they can merge and release whenever they need.
>> >>>>>>>>> 2. It’s easier for them to add E2E tests and manage the resources needed for running them.
>> >>>>>>>>> 3. The development of the package can be incorporated into their company processes - not every company is used to OSS mode.
>> >>>>>>>>>
>> >>>>>>>>> Whatever way we go - we should have some basics guidelines and requirements (for example to brand a provider as “recommended by community” or something).
>> >>>>>>>>>
>> >>>>>>>>> Cheers,
>> >>>>>>>>> Tomsk
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> With best wishes,                    Alex Ott
>> >>>> http://alexott.net/
>> >>>> Twitter: alexott_en (English), alexott (Russian)

Re: [DISCUSS] Approach for new providers of the community

Posted by Kaxil Naik <ka...@gmail.com>.
Hey all,

Another alternative is separating out core providers from the Core Airflow
Repo into a separate repo within the Apache Org itself, maybe:
apache-airflow-providers.

That will not decrease the maintenance from the Committers but the Core
work and release will be completely separate and untangled from Apache
Airflow repo and can move at a faster pace.

The benefit and compromise for the community is that all the providers are
still officially maintained and released by the committers. However, over
time we can invite more committers who show active participation in
apache-airflow-providers repo too.

This is a compromise to the arguments about Providers being integral to the
success of Airflow and as such should be maintained and released officially.

Regards,
Kaxil

On Mon, 25 Apr 2022 at 19:17, Jarek Potiuk <ja...@potiuk.com> wrote:

> > 1. https://registry.astronomer.io/
> > 2. Using the new classifier
> https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
>
> Yep. precisely what I thought to place at the top of the ecosystem page.
>
> > On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis"
> <fe...@amazon.com.INVALID> wrote:
> >>
> >> I still think that easy inclusion with a defined pruning process is
> best, but it's looking like that is the minority opinion.  In which case,
> IFF we are going to be keeping them separate then I definitely agree that
> there needs to be a fast/easy/convenient way to find them.
> >> ________________________________
> >> From: Jarek Potiuk <ja...@potiuk.com>
> >> Sent: Monday, April 25, 2022 7:17 AM
> >> To: dev@airflow.apache.org
> >> Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the
> community
> >>
> >> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
> >>
> >>
> >>
> >> Just to come back to it (please everyone a little patience - I think
> >> some people have not chimed in yet due to 2.3.0 "focus" so this
> >> discussion might take a little more time.
> >>
> >> My current thinking on it so far:
> >>
> >> * I am not really in the camp of "lets not add any more providers at
> >> all" and also not in the "let's accept all that are good quality code
> >> providers". I think there are a few providers which "after fulfilling
> >> all the criteria" could be added - mostly open-source standards,
> >> generic, established technologies - but it should be rather limited
> >> and rare event.
> >>
> >> * when there is a proprietary service which has not too broad reach
> >> and it's not likely that we will have some committers who will be
> >> maintaining it - becauyse they are users - the default option should
> >> be to make a standalone per-service providers. the difficulty here is
> >> to set the right "non-quality" criteria - but I think we really want
> >> to limit any new code to maintain. Here maybe we can have some more
> >> concrete criteria proposed - so that we do not have to vote
> >> individually on each proposed providers - and so that those who want
> >> to propose a provider could check themselves by reading the criteria,
> >> what's best for them.
> >>
> >> * we might improve our "providers" list at the "ecosystem" to make
> >> providers stand out a bit more (maybe simply put them on top and make
> >> a clearly visible section). We are not going to maintain and keep the
> >> nice "registry" similar to Astronomer's one (we could even actually
> >> make the link to the Astronomer registry more prominent as the way to
> >> "search" for providers on our Ecosystem Page. We could also add a link
> >> to Pypi with the "aifrflow provider" classifier at the ecosystem page
> >> as another way of searching for providers. All that is perfectly fine,
> >> I think with the ASF Policies and spirit. And it will be good for
> >> discovery.
> >>
> >> WDYT?
> >>
> >> J.
> >>
> >> On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <sa...@union.ai> wrote:
> >>>
> >>>
> >>> Hello!
> >>>
> >>> The reason behind submitting Flyte provider to the Airflow repository
> is because we felt it'd be effortless for the Airflow users to use the
> integration. Moreover, since it'd be under the umbrella of Airflow, we
> estimated that the Airflow users would not hesitate from using the provider.
> >>>
> >>> We could definitely have this as a standalone provider, but the
> easy-to-get-started incentive of Airflow providers seemed like a better
> option.
> >>>
> >>> If there's a sophisticated plan in place for having standalone
> providers in PyPI, we're up for it.
> >>>
> >>> Thanks,
> >>> Samhita
> >>>
> >>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <al...@gmail.com> wrote:
> >>>>
> >>>>
> >>>> Hello all
> >>>>
> >>>> I want to try to explain a motivation behind submission of the Delta
> Sharing provider:
> >>>>
> >>>> Let me start with the fact that the original issue was created
> against Airflow repository, and it was accepted as potential new
> functionality. And discussion about new providers has started almost on the
> day when PR was submitted :-)
> >>>> Delta Sharing is the OSS project under umbrella of the Linux
> Foundation that defines a protocol and reference implementations. It was
> started by the Databricks, but has other contributors as well - that's why
> it wasn't pushed into a Databricks provider, as it's not specific to
> Databricks.
> >>>> Another thought about submitting it as a separate provider was to get
> more people interested in this functionality and build additional
> integrations on top of it.
> >>>> Another important aspect of having providers in the Airflow
> repository is that they are tested together with changes in the core of the
> Airflow.
> >>>>
> >>>> I completely understand the concerns about more maintenance effort,
> but my personal point of view (about it below) is similar to Rafal's &
> John's - if there are well defined criteria & plans for decommissioning or
> something like, then providers could be part of the releases, etc.
> >>>>
> >>>> I just want to add that although I'm employed by Databricks, I'm not
> a part of the development team - I'm in the field team who work with
> customers, sees how they are using different tools, seeing pain points,
> etc.  Most of work so far was done on my own time - I'm doing some
> coordination, but most of new functionality (AAD tokens support, Repos,
> Databricks SQL operators, etc.) is coming from seeing customers using
> Airflow together with Databricks.
> >>>>
> >>>>
> >>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz <
> rafalbiegacz@google.com.invalid> wrote:
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I think that we will need to find some middle ground here - we are
> trying to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I
> would also add another 4th dimension - Airflow Service Provider, :).
> >>>>>
> >>>>> Airflow users - whether they do self-managed Airflow or use "managed
> Airflow" provided by others are beneficients of the fact that Airflow has a
> decent portfolio of providers.
> >>>>> It's not only a guarantee that these providers should work fine and
> they meet Airflow coding/testing standards. It's also a kind of guarantee,
> that once they start using Airflow
> >>>>> with providers backed by the Airflow community they won't be on
> their own when it comes to troubleshooting/updating/etc. It will be much
> easier for them to convince their companies to use Airflow for production
> use cases as the Airflow platform (core + providers) is tested/maintained
> by the Airflow community.
> >>>>>
> >>>>> Keeping providers within the Airflow repository generates
> integration and maintenance work on the Airflow community side. On the
> other hand, if this work is not done within the community then this effort
> would need to be done by all users to a certain extent. So from this
> perspective it's more optimal for the community to do it so users can use
> off-the-shelf Airflow for the majority of their use cases
> >>>>>
> >>>>> When it comes to accepting new providers - I like John's suggestions:
> >>>>> a) well defined standard that needs to be met by providers - passing
> the "provider qualification" would be some effort so each service provider
> would need to decide if it wouldn't be easier to maintain their provider on
> their own.
> >>>>> b) well define lifecycle for providers - which would allow to
> identify providers that are obsolete/not popular any more and make them
> obsolete.
> >>>>>
> >>>>> Regards, Rafal.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com>
> wrote:
> >>>>>>
> >>>>>>
> >>>>>> I've been thinking about it - to make up my mind a little. The good
> thing for me is that I have no strong opinion and I can rather easily see
> (or so I think) of both sides.
> >>>>>>
> >>>>>> TL;DR; I think we need an explanation from the "Service Providers"
> - what they want to achieve by contributing providers to the community and
> see if we can achieve similar results differently.
> >>>>>>
> >>>>>>
> >>>>>> Obviously I am a bit biased from the maintainer point of view, but
> since I cooperate with various stakeholders i spoke to some of them just
> see their point of view and this is what I got:
> >>>>>>
> >>>>>> Seems that we have really three  types of stakeholders that are
> really interested in "providers":
> >>>>>>
> >>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to
> take care about its future and development and "grand vision" of where we
> want to be in few years
> >>>>>> 2) "Users" - those who use Airflow and integration with the Service
> Provider
> >>>>>> 3) "Service providers" - those who run the services that Airflow
> integrates with - via providers (that group might also contain those
> stakeholders that run Airflow "as a service")
> >>>>>>
> >>>>>> Let me see it from all the different POVs:
> >>>>>>
> >>>>>>
> >>>>>> From 1) Maintainer POV
> >>>>>>
> >>>>>> More providers mean slower growth of the platform overall as the
> more providers we add and manage as a community, the less time we can spend
> on improving Airflow as a core.
> >>>>>> Also the vision I think we all share is that Airflow is not a
> "standalone orchestrator" any more - due to its popularity, reach and
> power, it became an "orchestrating platform" and this is the vision that
> keeps us - maintainers - busy.
> >>>>>>
> >>>>>> Over the last 2 years pretty much everything we do - make Airflow
> "more extensible". You can add custom "secrets managers". "timetables",
> "defferers" etc. "Customizability" is now built-in and "theme" of being a
> modern platform.
> >>>>>> Hell - we even recently added "Airflow Provider" trove classified
> in PyPI:
> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
> and the main justification in the discussion was that we expect MORE
> 3rd-parties to use it, rather than relying on "apache-airflow-provider"
> package name.
> >>>>>> So from maintainer POV - having 3rd-party providers as "extensions"
> to Airlow makes perfect sense and is the way to go.
> >>>>>>
> >>>>>>
> >>>>>> From  2) User POV
> >>>>>>
> >>>>>> Users want to use Airflow with all the integrations they use
> together. But only with those that they actually use. Similarly as
> maintainers - supporting and needing all 70+ providers is something they
> usually do not REALLY care about.
> >>>>>> They literally care about the few providers they use. We even
> taught the users that they can upgrade and install providers separately
> from the core. So they already know they can mix and match Airflow +
> Providers to get what they want.
> >>>>>>
> >>>>>> And they do use it - even if they use our image, the image only
> contains a handful of the providers, and when they need to install
> >>>>>> new providers - they just install them from PyPI. And for that, the
> difference between "community providers" and 3rd-party providers - except for
> the stamp of approval of the ASF - is not really visible.
> >>>>>> Surely they can use [extras] to install the providers but that is
> just a convenience and is definitely not needed by the users.
> >>>>>> For example when they build a custom image they usually extend
> Airflow and simply 'pip install <PROVIDER>'
> >>>>>> As long as someone makes sure that the provider can be installed on
> certain versions of Airflow - it does not matter.
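
That "can be installed on certain versions of Airflow" guarantee is usually just a version pin in the provider's own packaging metadata, so the check happens at install time. A minimal sketch, with a hypothetical distribution name:

    # setup.py of a hypothetical standalone provider - illustration only
    from setuptools import find_packages, setup

    setup(
        name="example-myservice-airflow-provider",
        version="1.0.0",
        packages=find_packages(),
        install_requires=[
            # the provider, not the user, declares which Airflow versions it
            # supports, so pip's resolver won't pair it with older releases
            "apache-airflow>=2.1.0",
        ],
    )
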
> >>>>>>
> >>>>>> Also from the users' perspective, Airflow has become "popular" enough
> that it no longer needs "more integrations" to be more "appealing" to the
> users.
> >>>>>> They already use Airflow. They like it (hopefully) and the fact
> that this or that provider is part of the community makes no difference any
> more.
> >>>>>>
> >>>>>>
> >>>>>> From 3) "Service providers" POV
> >>>>>>
> >>>>>> Here I am not sure. It's not very clear what service providers get
> from being part of the "community providers".
> >>>>>>
> >>>>>> I hear that some big services (cloud providers) find it cool that we
> give them the ASF "Stamp of Approval". And they are willing to pay the price
> of a slower merge process, dependence on the community and following strict
> rules of the ASF.
> >>>>>> And the community also is happy to pay the price of maintaining
> those (including the dependencies which Elad mentioned) to make sure that all
> the community providers work in concert - because those "Services" are
> hugely popular and we "want" as a community to invest there.
> >>>>>> But maintaining those deps in sync is a huge effort and it will
> become even worse the more we add. On the other hand, for 3rd-party
> providers it will be EASIER to keep up.
> >>>>>> They don't have to care about all the community providers working
> together - they can choose a subset. And when they release their libraries
> they can take care of making sure the dependencies are not broken.
> >>>>>>
> >>>>>> There are other "drawbacks" to being a "community" provider. For
> example we have the rule that community providers support a given Airflow
> version only for 12 months after that Airflow release.
> >>>>>> This means that users of Airflow 2.1 will not receive updates for
> the providers after the 21st of May. This is the price to pay for
> community-managed providers. We will not release bug fixes in providers or
> changes for Airflow 2.1 users after the 21st of May.
> >>>>>> But if you manage your own provider - you still can support 2.0 or
> even 1.10 if you want.
> >>>>>>
> >>>>>> I cannot really see why a Service Provider would want to become an
> Airflow Community Provider.
> >>>>>>
> >>>>>> And I am not really sure what  Flyte, Delta Sharing, Versatile Data
> Kit, and Cloudera people think and why they think this is the best choice.
> >>>>>>
> >>>>>> I think when we understand what the  "Service Providers" want to
> achieve this way, maybe we will be able to come up with some middle ground
> and at least set some rules when it makes sense and when it does not make
> sense.
> >>>>>> Maybe 'contributing provider' is the way to achieve something else
> and we simply do not realize that in the new "Airflow as a Platform" world,
> all the stakeholders can achieve very similar results using different
> approaches.
> >>>>>>
> >>>>>> * For example we could think about how we can make it easier for
> Airflow users to discover and install their providers - without actually
> taking ownership of the code by the community.
> >>>>>> * Or maybe we could introduce a tool to make a 3rd-party provider
> pass a "compliance check" as suggested above
> >>>>>> * Or maybe we could introduce a "breeze" extension to be able to
> install and test a provider in the "latest airflow" so that the service
> providers could check it before we even release airflow and dependencies
> >>>>>>
> >>>>>> So what I think we really need -  Alex, Samhita, Andon, Philippe (I
> think) - could you tell us (every one of you separately) - what are your
> goals when you came up with the "contribute the new provider" idea?
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <el...@apache.org>
> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Ash, what is your recommendation for the users if we follow
> your suggestion?
> >>>>>>> This means that the big big big joy of using airflow constraints
> and getting a working environment with all required providers will be no
> more.
> >>>>>>> So users will get a working "Vanilla" Airflow and then will need
> to figure out how they are going to tackle independent providers that may
> not be able to coexist with one another.
> >>>>>>> This means that users will need to create their own constraints
> mechanism and maintain it.
> >>>>>>>
> >>>>>>> From my perspective this increases the complexity of getting
> Airflow to be production ready.
> >>>>>>> I know that we say providers vs core but I think that from the users'
> perspective providers are an integral part of Airflow.
> >>>>>>> Having the best scheduler and the best UI is not enough. Providers
> are a crucial part that completes the set.
> >>>>>>>
> >>>>>>> Maybe eventually there should be something like a provider store
> where there can be official providers and 3rd party providers.
> >>>>>>>
> >>>>>>> This may be an even greater discussion than the one we are having here.
> It feels more like Airflow as a product vs Airflow as an ecosystem.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty
> <co...@astronomer.io.invalid> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I
> think in an ideal world even the providers currently part of the Airflow
> repo would be managed separately. (I'm not actually suggesting removing any
> providers.) I don't think it's a matter of gatekeeping, I just think it's
> actually kind of odd to have providers in the same repo as core Airflow,
> and it increases confusion about Airflow versions vs provider package
> versions.
> >>>>>>>>
> >>>>>>>> Collin McNulty
> >>>>>>>>
> >>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <
> turbaszek@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I’m leaning toward Ash's approach. Having service providers maintain
> the packages themselves may streamline many aspects for providers/companies.
> >>>>>>>>>
> >>>>>>>>> 1. They are owners so they can merge and release whenever they
> need.
> >>>>>>>>> 2. It’s easier for them to add E2E tests and manage the
> resources needed for running them.
> >>>>>>>>> 3. The development of the package can be incorporated into their
> company processes - not every company is used to OSS mode.
> >>>>>>>>>
> >>>>>>>>> Whatever way we go - we should have some basic guidelines and
> requirements (for example to brand a provider as “recommended by the community”
> or something).
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Tomsk
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> With best wishes,                    Alex Ott
> >>>> http://alexott.net/
> >>>> Twitter: alexott_en (English), alexott (Russian)
>

Re: [DISCUSS] Approach for new providers of the community

Posted by Jarek Potiuk <ja...@potiuk.com>.
> 1. https://registry.astronomer.io/
> 2. Using the new classifier https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider

Yep. Precisely what I thought to place at the top of the ecosystem page.

> On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" <fe...@amazon.com.INVALID> wrote:
>>
>> I still think that easy inclusion with a defined pruning process is best, but it's looking like that is the minority opinion.  In which case, IFF we are going to be keeping them separate then I definitely agree that there needs to be a fast/easy/convenient way to find them.
>> ________________________________
>> From: Jarek Potiuk <ja...@potiuk.com>
>> Sent: Monday, April 25, 2022 7:17 AM
>> To: dev@airflow.apache.org
>> Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the community
>>
>> Just to come back to it (please everyone, a little patience - I think
>> some people have not chimed in yet due to the 2.3.0 "focus") so this
>> discussion might take a little more time.
>>
>> My current thinking on it so far:
>>
>> * I am not really in the camp of "let's not add any more providers at
>> all" and also not in the "let's accept all providers with good quality
>> code" camp. I think there are a few providers which "after fulfilling
>> all the criteria" could be added - mostly open-source standards,
>> generic, established technologies - but it should be a rather limited
>> and rare event.
>>
>> * when there is a proprietary service which does not have too broad a reach
>> and it's not likely that we will have some committers who will be
>> maintaining it - because they are users - the default option should
>> be to make standalone per-service providers. The difficulty here is
>> to set the right "non-quality" criteria - but I think we really want
>> to limit any new code to maintain. Here maybe we can have some more
>> concrete criteria proposed - so that we do not have to vote
>> individually on each proposed provider - and so that those who want
>> to propose a provider could check for themselves, by reading the criteria,
>> what's best for them.
>>
>> * we might improve our "providers" list at the "ecosystem" page to make
>> providers stand out a bit more (maybe simply put them on top and make
>> a clearly visible section). We are not going to maintain and keep a
>> nice "registry" similar to Astronomer's one (we could even actually
>> make the link to the Astronomer registry more prominent as the way to
>> "search" for providers on our Ecosystem Page). We could also add a link
>> to PyPI with the "Airflow Provider" classifier at the ecosystem page
>> as another way of searching for providers. All that is perfectly fine,
>> I think, with the ASF Policies and spirit. And it will be good for
>> discovery.
>>
>> WDYT?
>>
>> J.
>>
>> On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <sa...@union.ai> wrote:
>>>
>>>
>>> Hello!
>>>
>>> The reason behind submitting the Flyte provider to the Airflow repository is that we felt it'd be effortless for the Airflow users to use the integration. Moreover, since it'd be under the umbrella of Airflow, we estimated that the Airflow users would not hesitate to use the provider.
>>>
>>> We could definitely have this as a standalone provider, but the easy-to-get-started incentive of Airflow providers seemed like a better option.
>>>
>>> If there's a sophisticated plan in place for having standalone providers in PyPI, we're up for it.
>>>
>>> Thanks,
>>> Samhita
>>>
>>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <al...@gmail.com> wrote:
>>>>
>>>>
>>>> Hello all
>>>>
>>>> I want to try to explain the motivation behind the submission of the Delta Sharing provider:
>>>>
>>>> Let me start with the fact that the original issue was created against the Airflow repository, and it was accepted as potential new functionality. And the discussion about new providers started almost on the day the PR was submitted :-)
>>>> Delta Sharing is an OSS project under the umbrella of the Linux Foundation that defines a protocol and reference implementations. It was started by Databricks, but has other contributors as well - that's why it wasn't pushed into the Databricks provider, as it's not specific to Databricks.
>>>> Another thought about submitting it as a separate provider was to get more people interested in this functionality and build additional integrations on top of it.
>>>> Another important aspect of having providers in the Airflow repository is that they are tested together with changes in the core of Airflow.
>>>>
>>>> I completely understand the concerns about more maintenance effort, but my personal point of view (about it below) is similar to Rafal's & John's - if there are well-defined criteria & plans for decommissioning or something like that, then providers could be part of the releases, etc.
>>>>
>>>> I just want to add that although I'm employed by Databricks, I'm not a part of the development team - I'm in the field team that works with customers, sees how they are using different tools, sees pain points, etc. Most of the work so far was done on my own time - I'm doing some coordination, but most of the new functionality (AAD token support, Repos, Databricks SQL operators, etc.) comes from seeing customers using Airflow together with Databricks.
>>>>
>>>>
>>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz <ra...@google.com.invalid> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I think that we will need to find some middle ground here - we are trying to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would also add a 4th dimension - Airflow Service Provider :).
>>>>>
>>>>> Airflow users - whether they run self-managed Airflow or use "managed Airflow" provided by others - are beneficiaries of the fact that Airflow has a decent portfolio of providers.
>>>>> It's not only a guarantee that these providers should work fine and that they meet Airflow coding/testing standards. It's also a kind of guarantee that, once they start using Airflow
>>>>> with providers backed by the Airflow community, they won't be on their own when it comes to troubleshooting/updating/etc. It will be much easier for them to convince their companies to use Airflow for production use cases as the Airflow platform (core + providers) is tested/maintained by the Airflow community.
>>>>>
>>>>> Keeping providers within the Airflow repository generates integration and maintenance work on the Airflow community side. On the other hand, if this work is not done within the community then this effort would need to be done by all users to a certain extent. So from this perspective it's more optimal for the community to do it, so users can use off-the-shelf Airflow for the majority of their use cases.
>>>>>
>>>>> When it comes to accepting new providers - I like John's suggestions:
>>>>> a) a well-defined standard that needs to be met by providers - passing the "provider qualification" would be some effort, so each service provider would need to decide if it wouldn't be easier to maintain their provider on their own.
>>>>> b) a well-defined lifecycle for providers - which would allow us to identify providers that are obsolete/not popular any more and retire them.
>>>>>
>>>>> Regards, Rafal.
>>>>>

Re: [DISCUSS] Approach for new providers of the community

Posted by Ash Berlin-Taylor <as...@apache.org>.
Two fast/easy ways to find them:

1. https://registry.astronomer.io/
2. Using the new classifier https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
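
For the second route, showing up under that classifier is purely a packaging concern on the provider's side. A minimal sketch with a hypothetical distribution name - the classifier string is the one from the link above, and the entry point group is the mechanism Airflow 2 uses to discover provider metadata in installed packages (verify the exact entry point and metadata keys against the current provider-packages docs before relying on them):

    # setup.py sketch for a hypothetical 3rd-party provider distribution
    from setuptools import find_packages, setup

    setup(
        name="example-myservice-airflow-provider",
        version="1.0.0",
        packages=find_packages(),
        classifiers=[
            # makes the package show up in the PyPI classifier search above
            "Framework :: Apache Airflow :: Provider",
        ],
        entry_points={
            "apache_airflow_provider": [
                # points at a function inside the package that returns the
                # provider metadata dictionary (name, versions, etc.)
                "provider_info=example_myservice_provider:get_provider_info"
            ]
        },
    )

With something like this on PyPI, the package stays findable through the classifier link without any listing having to be maintained in the Airflow repository itself.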

On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" <fe...@amazon.com.INVALID> wrote:
>I still think that easy inclusion with a defined pruning process is best, but it's looking like that is the minority opinion.  In which case, IFF we are going to be keeping them separate then I definitely agree that there needs to be a fast/easy/convenient way to find them.

Re: [DISCUSS] Approach for new providers of the community

Posted by "Ferruzzi, Dennis" <fe...@amazon.com.INVALID>.
I still think that easy inclusion with a defined pruning process is best, but it's looking like that is the minority opinion.  In which case, IFF we are going to be keeping them separate then I definitely agree that there needs to be a fast/easy/convenient way to find them.



Re: [DISCUSS] Approach for new providers of the community

Posted by Jarek Potiuk <ja...@potiuk.com>.
Just to come back to it (please, everyone, a little patience - I think
some people have not chimed in yet due to the 2.3.0 "focus", so this
discussion might take a little more time).

My current thinking on it so far:

* I am not really in the camp of "let's not add any more providers at
all", and also not in the "let's accept all providers with good-quality
code" camp. I think there are a few providers which, "after fulfilling
all the criteria", could be added - mostly open-source standards,
generic, established technologies - but it should be a rather limited
and rare event.

* when there is a proprietary service which does not have too broad a
reach and it's not likely that we will have some committers who will be
maintaining it - because they are users - the default option should
be to make a standalone, per-service provider. The difficulty here is
to set the right "non-quality" criteria - but I think we really want
to limit any new code to maintain. Here maybe we can have some more
concrete criteria proposed - so that we do not have to vote
individually on each proposed provider - and so that those who want
to propose a provider could check for themselves, by reading the
criteria, what's best for them.

* we might improve our "providers" list at the "ecosystem" page to make
providers stand out a bit more (maybe simply put them on top and make
a clearly visible section). We are not going to maintain a nice
"registry" similar to Astronomer's one (we could even actually
make the link to the Astronomer registry more prominent as the way to
"search" for providers on our Ecosystem Page). We could also add a link
to PyPI with the "airflow provider" classifier at the ecosystem page
as another way of searching for providers. All that is, I think, perfectly
in line with the ASF policies and spirit. And it will be good for
discovery.
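
Just to illustrate the "discover and install" part (a sketch only - the
provider package name below is made up): a user who finds such a 3rd-party
provider would install it like any other package, on top of an Airflow
installed with our constraints, something like:

    pip install "apache-airflow==2.3.0" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.0/constraints-3.9.txt"
    # hypothetical 3rd-party provider - not covered by our constraints,
    # so its author is the one making sure it works with this Airflow version
    pip install acme-airflow-provider-foo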

WDYT?

J.

On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <sa...@union.ai> wrote:
>
> Hello!
>
> The reason behind submitting Flyte provider to the Airflow repository is because we felt it'd be effortless for the Airflow users to use the integration. Moreover, since it'd be under the umbrella of Airflow, we estimated that the Airflow users would not hesitate from using the provider.
>
> We could definitely have this as a standalone provider, but the easy-to-get-started incentive of Airflow providers seemed like a better option.
>
> If there's a sophisticated plan in place for having standalone providers in PyPI, we're up for it.
>
> Thanks,
> Samhita
>
> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <al...@gmail.com> wrote:
>>
>> Hello all
>>
>> I want to try to explain a motivation behind submission of the Delta Sharing provider:
>>
>> Let me start with the fact that the original issue was created against Airflow repository, and it was accepted as potential new functionality. And discussion about new providers has started almost on the day when PR was submitted :-)
>> Delta Sharing is the OSS project under umbrella of the Linux Foundation that defines a protocol and reference implementations. It was started by the Databricks, but has other contributors as well - that's why it wasn't pushed into a Databricks provider, as it's not specific to Databricks.
>> Another thought about submitting it as a separate provider was to get more people interested in this functionality and build additional integrations on top of it.
>> Another important aspect of having providers in the Airflow repository is that they are tested together with changes in the core of the Airflow.
>>
>> I completely understand the concerns about more maintenance effort, but my personal point of view (about it below) is similar to Rafal's & John's - if there are well defined criteria & plans for decommissioning or something like, then providers could be part of the releases, etc.
>>
>> I just want to add that although I'm employed by Databricks, I'm not a part of the development team - I'm in the field team who work with customers, sees how they are using different tools, seeing pain points, etc.  Most of work so far was done on my own time - I'm doing some coordination, but most of new functionality (AAD tokens support, Repos, Databricks SQL operators, etc.) is coming from seeing customers using Airflow together with Databricks.
>>
>>
>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz <ra...@google.com.invalid> wrote:
>>>
>>> Hi,
>>>
>>> I think that we will need to find some middle ground here - we are trying to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would also add another 4th dimension - Airflow Service Provider, :).
>>>
>>> Airflow users - whether they do self-managed Airflow or use "managed Airflow" provided by others are beneficients of the fact that Airflow has a decent portfolio of providers.
>>> It's not only a guarantee that these providers should work fine and they meet Airflow coding/testing standards. It's also a kind of guarantee, that once they start using Airflow
>>> with providers backed by the Airflow community they won't be on their own when it comes to troubleshooting/updating/etc. It will be much easier for them to convince their companies to use Airflow for production use cases as the Airflow platform (core + providers) is tested/maintained by the Airflow community.
>>>
>>> Keeping providers within the Airflow repository generates integration and maintenance work on the Airflow community side. On the other hand, if this work is not done within the community then this effort would need to be done by all users to a certain extent. So from this perspective it's more optimal for the community to do it so users can use off-the-shelf Airflow for the majority of their use cases
>>>
>>> When it comes to accepting new providers - I like John's suggestions:
>>> a) well defined standard that needs to be met by providers - passing the "provider qualification" would be some effort so each service provider would need to decide if it wouldn't be easier to maintain their provider on their own.
>>> b) well define lifecycle for providers - which would allow to identify providers that are obsolete/not popular any more and make them obsolete.
>>>
>>> Regards, Rafal.
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>> I've been thinking about it - to make up my mind a little. The good thing for me is that I have no strong opinion and I can rather easily see (or so I think) of both sides.
>>>>
>>>> TL;DR; I think we need an explanation from the "Service Providers" - what they want to achieve by contributing providers to the community and see if we can achieve similar results differently.
>>>>
>>>>
>>>> Obviously I am a bit biased from the maintainer point of view, but since I cooperate with various stakeholders i spoke to some of them just see their point of view and this is what I got:
>>>>
>>>> Seems that we have really three  types of stakeholders that are really interested in "providers":
>>>>
>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take care about its future and development and "grand vision" of where we want to be in few years
>>>> 2) "Users" - those who use Airflow and integration with the Service Provider
>>>> 3) "Service providers" - those who run the services that Airflow integrates with - via providers (that group might also contain those stakeholders that run Airflow "as a service")
>>>>
>>>> Let me see it from all the different POVs:
>>>>
>>>>
>>>> From 1) Maintainer POV
>>>>
>>>> More providers mean slower growth of the platform overall as the more providers we add and manage as a community, the less time we can spend on improving Airflow as a core.
>>>> Also the vision I think we all share is that Airflow is not a "standalone orchestrator" any more - due to its popularity, reach and power, it became an "orchestrating platform" and this is the vision that keeps us - maintainers - busy.
>>>>
>>>> Over the last 2 years pretty much everything we do - make Airflow "more extensible". You can add custom "secrets managers". "timetables", "defferers" etc. "Customizability" is now built-in and "theme" of being a modern platform.
>>>> Hell - we even recently added "Airflow Provider" trove classified in PyPI: https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider and the main justification in the discussion was that we expect MORE 3rd-parties to use it, rather than relying on "apache-airflow-provider" package name.
>>>> So from maintainer POV - having 3rd-party providers as "extensions" to Airlow makes perfect sense and is the way to go.
>>>>
>>>>
>>>> From  2) User POV
>>>>
>>>> Users want to use Airflow with all the integrations they use together. But only with those that they actually use. Similarly as maintainers - supporting and needing all 70+ providers is something they usually do not REALLY care about.
>>>> They literally care about the few providers they use. We even taught the users that they can upgrade and install providers separately from the core. So they already know they can mix and match Airflow + Providers to get what they want.
>>>>
>>>> And they do use it - even if they use our image, the image only contains a handful of the providers and when they need to install
>>>> new providers - they just install it from PyPI. And for that the difference of "community providers" vs. 3rd party providers - except the stamp of approval of the ASF, is not really visible.
>>>> Surely they can use [extras] to install the providers but that is just a convenience and is definitely not needed by the users.
>>>> For example when they build a custom image they usually extend Airflow and simply 'pip install <PROVIDER>'
>>>> As long as someone makes sure that the provider can be installed on certain versions of Airflow - it does not matter.
>>>>
>>>> Also from the users perspective Airflow became "popular" enough that it no longer needed "more integrations" to be more "appealing" for the users.
>>>> They already use Airflow. They like it (hopefully) and the fact that this or that provider is part of the community makes no difference any more.
>>>>
>>>>
>>>> From 3) "Service providers" POV
>>>>
>>>> Here I am not sure. It's not very clear what service providers get from being part of the "community providers".
>>>>
>>>> I hear that some big service (cloud providers) find it cool that we give it the ASF "Stamp of Approval". And they are willing to pay the price of a slower merge process, dependence on the community and following strict rules of the ASF.
>>>> And the community also is happy to pay the price of maintaining those (including the dependencies which Elad mention) to make sure that all the community providers work in concert - because those "Services" are hugely popular and we "want" as a community to invest there.
>>>> But maintaining those  deps in sync is a huge effort and it will become even worse - the more we add. On the other hand for 3rd party providers it will be EASIER to keep up.
>>>> They don't have to care about all the community providers to work together, they can choose a subset. And when they release their libraries they can take care about making sure the dependencies are not broken.
>>>>
>>>> There are other "drawbacks" for being a "community" provider. For example we have the rule that we support the min-Airflow version for providers from the community 12 months after Airflow release.
>>>> This means that users of Airflow 2.1 will not receive updates for the providers after 21st of May. This is the price to pay for community-managed providers. We will not release bug fixes in providers or changes for Airflow 2.1 users after 21st of May.
>>>> But if you manage your own provider - you still can support 2.0 or even 1.10 if you want.
>>>>
>>>> I cannot really see why a Service Provider would want to become an Airflow Community Provider.
>>>>
>>>> And I am not really sure what  Flyte, Delta Sharing, Versatile Data Kit, and Cloudera people think and why they think this is the best choice.
>>>>
>>>> I think when we understand what the  "Service Providers" want to achieve this way, maybe we will be able to come up with some middle ground and at least set some rules when it makes sense and when it does not make sense.
>>>> Maybe 'contributing provider' is the way to achieve something else and we simply do not realize that in the new "Airflow as a Platform" world, all the stakeholders can achieve very similar results using different approaches.
>>>>
>>>> * For example we could think about how we can make it easier for Airflow users to discover and install their providers - without actually taking ownership of the code by the community.
>>>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a "compliance check" as suggested above
>>>> * Or maybe we could introduce a "breeze" extension to be able to install and test provider in the "latest airflow" so that the service providers could check it before we even release airflow and dependencies
>>>>
>>>> So what I think we really need -  Alex, Samhita, Andon, Philippe (I think) - could you tell us (every one of you separately) - what are your goals when you came up with the "contribute the new provider" idea?
>>>>
>>>> J.
>>>>
>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <el...@apache.org> wrote:
>>>>>
>>>>> Ash what is your recommendation for the users should we follow your suggestion?
>>>>> This means that the big big big joy of using airflow constraints and getting a working environment with all required providers will be no more.
>>>>> So users will get a working "Vanilla" Airflow and then will need to figure out how they are going to tackle independent providers that may not be able to coexist one with another.
>>>>> This means that users will need to create their own constraints mechanism and maintain it.
>>>>>
>>>>> From my perspective this increases the complexity of getting Airflow to be production ready.
>>>>> I know that we say providers vs core but I think that from users perspective providers are an integral part of Airflow.
>>>>> Having the best scheduler and the best UI is not enough. Providers are a crucial part that complete the set.
>>>>>
>>>>> Maybe eventually there should be something like a provider store where there can be official providers and 3rd party providers.
>>>>>
>>>>> This may be even greater discussion than what we are having here. It feels more like Airflow as a product vs Airflow as an ecosystem.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty <co...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>> I agree with Ash and Tomasz. If it were not for the history, I think in an ideal world even the providers currently part of the Airflow repo would be managed separately. (I'm not actually suggesting removing any providers.) I don't think it's a matter of gatekeeping, I just think it's actually kind of odd to have providers in the same repo as core Airflow, and it increases confusion about Airflow versions vs provider package versions.
>>>>>>
>>>>>> Collin McNulty
>>>>>>
>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org> wrote:
>>>>>>>
>>>>>>> I’m leaning toward Ash approach. Having providers maintaining the packages may streamline many aspects for providers/companies.
>>>>>>>
>>>>>>> 1. They are owners so they can merge and release whenever they need.
>>>>>>> 2. It’s easier for them to add E2E tests and manage the resources needed for running them.
>>>>>>> 3. The development of the package can be incorporated into their company processes - not every company is used to OSS mode.
>>>>>>>
>>>>>>> Whatever way we go - we should have some basics guidelines and requirements (for example to brand a provider as “recommended by community” or something).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Tomsk
>>
>>
>>
>> --
>> With best wishes,                    Alex Ott
>> http://alexott.net/
>> Twitter: alexott_en (English), alexott (Russian)

Re: [DISCUSS] Approach for new providers of the community

Posted by Samhita Alla <sa...@union.ai>.
Hello!

The reason behind submitting the Flyte provider to the Airflow repository is
that we felt it'd be effortless for the Airflow users to use the
integration. Moreover, since it'd be under the umbrella of Airflow, we
estimated that the Airflow users would not hesitate to use the
provider.

We could definitely have this as a standalone provider, but the
easy-to-get-started incentive of Airflow providers seemed like a better
option.

If there's a sophisticated plan in place for having standalone providers on
PyPI, we're up for it.

Thanks,
Samhita

On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <al...@gmail.com> wrote:

> Hello all
>
> I want to try to explain a motivation behind submission of the Delta
> Sharing provider:
>
>    - Let me start with the fact that the original issue was created
>    against Airflow repository, and it was accepted as potential new
>    functionality. And discussion about new providers has started almost on the
>    day when PR was submitted :-)
>    - Delta Sharing is the OSS project under umbrella of the Linux
>    Foundation that defines a protocol and reference implementations. It was
>    started by the Databricks, but has other contributors as well - that's why
>    it wasn't pushed into a Databricks provider, as it's not specific to
>    Databricks.
>    - Another thought about submitting it as a separate provider was to
>    get more people interested in this functionality and build additional
>    integrations on top of it.
>    - Another important aspect of having providers in the Airflow
>    repository is that they are tested together with changes in the core of the
>    Airflow.
>
> I completely understand the concerns about more maintenance effort, but my
> personal point of view (about it below) is similar to Rafal's & John's - if
> there are well defined criteria & plans for decommissioning or something
> like, then providers could be part of the releases, etc.
>
> I just want to add that although I'm employed by Databricks, I'm not a
> part of the development team - I'm in the field team who work with
> customers, sees how they are using different tools, seeing pain points,
> etc.  Most of work so far was done on my own time - I'm doing some
> coordination, but most of new functionality (AAD tokens support, Repos,
> Databricks SQL operators, etc.) is coming from seeing customers using
> Airflow together with Databricks.
>
>
> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz
> <ra...@google.com.invalid> wrote:
>
>> Hi,
>>
>> I think that we will need to find some middle ground here - we are trying
>> to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would
>> also add another 4th dimension - Airflow Service Provider, :).
>>
>> Airflow users - whether they do self-managed Airflow or use "managed
>> Airflow" provided by others are beneficients of the fact that Airflow has a
>> decent portfolio of providers.
>> It's not only a guarantee that these providers should work fine and they
>> meet Airflow coding/testing standards. It's also a kind of guarantee, that
>> once they start using Airflow
>> with providers backed by the Airflow community they won't be on their own
>> when it comes to troubleshooting/updating/etc. It will be much easier for
>> them to convince their companies to use Airflow for production use cases as
>> the Airflow platform (core + providers) is tested/maintained by the Airflow
>> community.
>>
>> Keeping providers within the Airflow repository generates integration and
>> maintenance work on the Airflow community side. On the other hand, if this
>> work is not done within the community then this effort would need to be
>> done by all users to a certain extent. So from this perspective it's more
>> optimal for the community to do it so users can use off-the-shelf Airflow
>> for the majority of their use cases
>>
>> When it comes to accepting new providers - I like John's suggestions:
>> a) well defined standard that needs to be met by providers - passing the
>> "provider qualification" would be some effort so each service provider
>> would need to decide if it wouldn't be easier to maintain their provider on
>> their own.
>> b) well define lifecycle for providers - which would allow to identify
>> providers that are obsolete/not popular any more and make them obsolete.
>>
>> Regards, Rafal.
>>
>>
>>
>>
>>
>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> I've been thinking about it - to make up my mind a little. The good
>>> thing for me is that I have no strong opinion and I can rather easily see
>>> (or so I think) of both sides.
>>>
>>> TL;DR; I think we need an explanation from the "Service Providers" -
>>> what they want to achieve by contributing providers to the community and
>>> see if we can achieve similar results differently.
>>>
>>>
>>> Obviously I am a bit biased from the maintainer point of view, but since
>>> I cooperate with various stakeholders i spoke to some of them just see
>>> their point of view and this is what I got:
>>>
>>> Seems that we have really three  types of stakeholders that are really
>>> interested in "providers":
>>>
>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take
>>> care about its future and development and "grand vision" of where we want
>>> to be in few years
>>> 2) "Users" - those who use Airflow and integration with the Service
>>> Provider
>>> 3) "Service providers" - those who run the services that
>>> Airflow integrates with - via providers (that group might also contain
>>> those stakeholders that run Airflow "as a service")
>>>
>>> Let me see it from all the different POVs:
>>>
>>>
>>> From 1) Maintainer POV
>>>
>>> More providers mean slower growth of the platform overall as the more
>>> providers we add and manage as a community, the less time we can spend on
>>> improving Airflow as a core.
>>> Also the vision I think we all share is that Airflow is not a
>>> "standalone orchestrator" any more - due to its popularity, reach and
>>> power, it became an "orchestrating platform" and this is the vision that
>>> keeps us - maintainers - busy.
>>>
>>> Over the last 2 years pretty much everything we do - make Airflow "more
>>> extensible". You can add custom "secrets managers". "timetables",
>>> "defferers" etc. "Customizability" is now built-in and "theme" of being a
>>> modern platform.
>>> Hell - we even recently added "Airflow Provider" trove classified in
>>> PyPI:
>>> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
>>> and the main justification in the discussion was that we expect MORE
>>> 3rd-parties to use it, rather than relying on "apache-airflow-provider"
>>> package name.
>>> So from maintainer POV - having 3rd-party providers as "extensions" to
>>> Airlow makes perfect sense and is the way to go.
>>>
>>>
>>> From  2) User POV
>>>
>>> Users want to use Airflow with all the integrations they use together.
>>> But only with those that they actually use. Similarly as maintainers -
>>> supporting and needing all 70+ providers is something they usually do not
>>> REALLY care about.
>>> They literally care about the few providers they use. We even taught the
>>> users that they can upgrade and install providers separately from the core.
>>> So they already know they can mix and match Airflow + Providers to get what
>>> they want.
>>>
>>> And they do use it - even if they use our image, the image only contains
>>> a handful of the providers and when they need to install
>>> new providers - they just install it from PyPI. And for that the
>>> difference of "community providers" vs. 3rd party providers - except the
>>> stamp of approval of the ASF, is not really visible.
>>> Surely they can use [extras] to install the providers but that is just a
>>> convenience and is definitely not needed by the users.
>>> For example when they build a custom image they usually extend Airflow
>>> and simply 'pip install <PROVIDER>'
>>> As long as someone makes sure that the provider can be installed on
>>> certain versions of Airflow - it does not matter.
>>>
>>> Also from the users perspective Airflow became "popular" enough that it
>>> no longer needed "more integrations" to be more "appealing" for the users.
>>> They already use Airflow. They like it (hopefully) and the fact that
>>> this or that provider is part of the community makes no difference any more.
>>>
>>>
>>> From 3) "Service providers" POV
>>>
>>> Here I am not sure. It's not very clear what service providers get from
>>> being part of the "community providers".
>>>
>>> I hear that some big service (cloud providers) find it cool that we give
>>> it the ASF "Stamp of Approval". And they are willing to pay the price of a
>>> slower merge process, dependence on the community and following strict
>>> rules of the ASF.
>>> And the community also is happy to pay the price of maintaining those
>>> (including the dependencies which Elad mention) to make sure that all the
>>> community providers work in concert - because those "Services" are hugely
>>> popular and we "want" as a community to invest there.
>>> But maintaining those  deps in sync is a huge effort and it will become
>>> even worse - the more we add. On the other hand for 3rd party providers it
>>> will be EASIER to keep up.
>>> They don't have to care about all the community providers to work
>>> together, they can choose a subset. And when they release their libraries
>>> they can take care about making sure the dependencies are not broken.
>>>
>>> There are other "drawbacks" for being a "community" provider. For
>>> example we have the rule that we support the min-Airflow version for
>>> providers from the community 12 months after Airflow release.
>>> This means that users of Airflow 2.1 will not receive updates for the
>>> providers after 21st of May. This is the price to pay for community-managed
>>> providers. We will not release bug fixes in providers or changes for
>>> Airflow 2.1 users after 21st of May.
>>> But if you manage your own provider - you still can support 2.0 or even
>>> 1.10 if you want.
>>>
>>> I cannot really see why a Service Provider would want to become an
>>> Airflow Community Provider.
>>>
>>> And I am not really sure what  Flyte, Delta Sharing, Versatile Data Kit,
>>> and Cloudera people think and why they think this is the best choice.
>>>
>>> I think when we understand what the  "Service Providers" want to achieve
>>> this way, maybe we will be able to come up with some middle ground and at
>>> least set some rules when it makes sense and when it does not make sense.
>>> Maybe 'contributing provider' is the way to achieve something else and
>>> we simply do not realize that in the new "Airflow as a Platform" world, all
>>> the stakeholders can achieve very similar results using different
>>> approaches.
>>>
>>> * For example we could think about how we can make it easier for Airflow
>>> users to discover and install their providers - without actually taking
>>> ownership of the code by the community.
>>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a
>>> "compliance check" as suggested above
>>> * Or maybe we could introduce a "breeze" extension to be able to install
>>> and test provider in the "latest airflow" so that the service providers
>>> could check it before we even release airflow and dependencies
>>>
>>> So what I think we really need -  Alex, Samhita, Andon, Philippe (I
>>> think) - could you tell us (every one of you separately) - what are your
>>> goals when you came up with the "contribute the new provider" idea?
>>>
>>> J.
>>>
>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <el...@apache.org> wrote:
>>>
>>>> Ash what is your recommendation for the users should we follow your
>>>> suggestion?
>>>> This means that the big big big joy of using airflow constraints and
>>>> getting a working environment with all required providers will be no more.
>>>> So users will get a working "Vanilla" Airflow and then will need to
>>>> figure out how they are going to tackle independent providers that may not
>>>> be able to coexist one with another.
>>>> This means that users will need to create their own constraints
>>>> mechanism and maintain it.
>>>>
>>>> From my perspective this increases the complexity of getting Airflow to
>>>> be production ready.
>>>> I know that we say providers vs core but I think that from users
>>>> perspective providers are an integral part of Airflow.
>>>> Having the best scheduler and the best UI is not enough. Providers are
>>>> a crucial part that complete the set.
>>>>
>>>> Maybe eventually there should be something like a provider store where
>>>> there can be official providers and 3rd party providers.
>>>>
>>>> This may be even greater discussion than what we are having here. It
>>>> feels more like Airflow as a product vs Airflow as an ecosystem.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty
>>>> <co...@astronomer.io.invalid> wrote:
>>>>
>>>>> I agree with Ash and Tomasz. If it were not for the history, I think
>>>>> in an ideal world even the providers currently part of the Airflow repo
>>>>> would be managed separately. (I'm not actually suggesting removing any
>>>>> providers.) I don't think it's a matter of gatekeeping, I just think it's
>>>>> actually kind of odd to have providers in the same repo as core Airflow,
>>>>> and it increases confusion about Airflow versions vs provider package
>>>>> versions.
>>>>>
>>>>> Collin McNulty
>>>>>
>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I’m leaning toward Ash approach. Having providers maintaining the
>>>>>> packages may streamline many aspects for providers/companies.
>>>>>>
>>>>>> 1. They are owners so they can merge and release whenever they need.
>>>>>> 2. It’s easier for them to add E2E tests and manage the resources
>>>>>> needed for running them.
>>>>>> 3. The development of the package can be incorporated into their
>>>>>> company processes - not every company is used to OSS mode.
>>>>>>
>>>>>> Whatever way we go - we should have some basics guidelines and
>>>>>> requirements (for example to brand a provider as “recommended by community”
>>>>>> or something).
>>>>>>
>>>>>> Cheers,
>>>>>> Tomsk
>>>>>>
>>>>>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
>

Re: [DISCUSS] Approach for new providers of the community

Posted by Alex Ott <al...@gmail.com>.
Hello all

I want to try to explain the motivation behind the submission of the Delta
Sharing provider:

   - Let me start with the fact that the original issue was created against
   the Airflow repository, and it was accepted as potential new
   functionality. And the discussion about new providers started almost on
   the day the PR was submitted :-)
   - Delta Sharing is an OSS project under the umbrella of the Linux
   Foundation that defines a protocol and reference implementations. It was
   started by Databricks, but has other contributors as well - that's why
   it wasn't pushed into the Databricks provider, as it's not specific to
   Databricks.
   - Another thought behind submitting it as a separate provider was to get
   more people interested in this functionality and to build additional
   integrations on top of it.
   - Another important aspect of having providers in the Airflow repository
   is that they are tested together with changes in the core of Airflow.

I completely understand the concerns about more maintenance effort, but my
personal point of view (more about it below) is similar to Rafal's & John's - if
there are well-defined criteria & plans for decommissioning or something
like that, then providers could be part of the releases, etc.

I just want to add that although I'm employed by Databricks, I'm not a part
of the development team - I'm on the field team that works with customers,
sees how they are using different tools, sees pain points, etc. Most of
the work so far was done on my own time - I'm doing some coordination, but
most of the new functionality (AAD token support, Repos, Databricks SQL
operators, etc.) comes from seeing customers using Airflow together with
Databricks.


On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz
<ra...@google.com.invalid> wrote:

> Hi,
>
> I think that we will need to find some middle ground here - we are trying
> to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would
> also add another 4th dimension - Airflow Service Provider, :).
>
> Airflow users - whether they do self-managed Airflow or use "managed
> Airflow" provided by others are beneficients of the fact that Airflow has a
> decent portfolio of providers.
> It's not only a guarantee that these providers should work fine and they
> meet Airflow coding/testing standards. It's also a kind of guarantee, that
> once they start using Airflow
> with providers backed by the Airflow community they won't be on their own
> when it comes to troubleshooting/updating/etc. It will be much easier for
> them to convince their companies to use Airflow for production use cases as
> the Airflow platform (core + providers) is tested/maintained by the Airflow
> community.
>
> Keeping providers within the Airflow repository generates integration and
> maintenance work on the Airflow community side. On the other hand, if this
> work is not done within the community then this effort would need to be
> done by all users to a certain extent. So from this perspective it's more
> optimal for the community to do it so users can use off-the-shelf Airflow
> for the majority of their use cases
>
> When it comes to accepting new providers - I like John's suggestions:
> a) well defined standard that needs to be met by providers - passing the
> "provider qualification" would be some effort so each service provider
> would need to decide if it wouldn't be easier to maintain their provider on
> their own.
> b) well define lifecycle for providers - which would allow to identify
> providers that are obsolete/not popular any more and make them obsolete.
>
> Regards, Rafal.
>
>
>
>
>
> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> I've been thinking about it - to make up my mind a little. The good thing
>> for me is that I have no strong opinion and I can rather easily see (or so
>> I think) of both sides.
>>
>> TL;DR; I think we need an explanation from the "Service Providers" - what
>> they want to achieve by contributing providers to the community and see if
>> we can achieve similar results differently.
>>
>>
>> Obviously I am a bit biased from the maintainer point of view, but since
>> I cooperate with various stakeholders i spoke to some of them just see
>> their point of view and this is what I got:
>>
>> Seems that we have really three  types of stakeholders that are really
>> interested in "providers":
>>
>> 1) "Maintainers" - those who mostly maintain Airflow and have to take
>> care about its future and development and "grand vision" of where we want
>> to be in few years
>> 2) "Users" - those who use Airflow and integration with the Service
>> Provider
>> 3) "Service providers" - those who run the services that
>> Airflow integrates with - via providers (that group might also contain
>> those stakeholders that run Airflow "as a service")
>>
>> Let me see it from all the different POVs:
>>
>>
>> From 1) Maintainer POV
>>
>> More providers mean slower growth of the platform overall as the more
>> providers we add and manage as a community, the less time we can spend on
>> improving Airflow as a core.
>> Also the vision I think we all share is that Airflow is not a "standalone
>> orchestrator" any more - due to its popularity, reach and power, it became
>> an "orchestrating platform" and this is the vision that keeps us -
>> maintainers - busy.
>>
>> Over the last 2 years pretty much everything we do - make Airflow "more
>> extensible". You can add custom "secrets managers". "timetables",
>> "defferers" etc. "Customizability" is now built-in and "theme" of being a
>> modern platform.
>> Hell - we even recently added "Airflow Provider" trove classified in
>> PyPI:
>> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
>> and the main justification in the discussion was that we expect MORE
>> 3rd-parties to use it, rather than relying on "apache-airflow-provider"
>> package name.
>> So from maintainer POV - having 3rd-party providers as "extensions" to
>> Airlow makes perfect sense and is the way to go.
>>
>>
>> From  2) User POV
>>
>> Users want to use Airflow with all the integrations they use together.
>> But only with those that they actually use. Similarly as maintainers -
>> supporting and needing all 70+ providers is something they usually do not
>> REALLY care about.
>> They literally care about the few providers they use. We even taught the
>> users that they can upgrade and install providers separately from the core.
>> So they already know they can mix and match Airflow + Providers to get what
>> they want.
>>
>> And they do use it - even if they use our image, the image only contains
>> a handful of the providers and when they need to install
>> new providers - they just install it from PyPI. And for that the
>> difference of "community providers" vs. 3rd party providers - except the
>> stamp of approval of the ASF, is not really visible.
>> Surely they can use [extras] to install the providers but that is just a
>> convenience and is definitely not needed by the users.
>> For example when they build a custom image they usually extend Airflow
>> and simply 'pip install <PROVIDER>'
>> As long as someone makes sure that the provider can be installed on
>> certain versions of Airflow - it does not matter.
>>
>> Also from the users perspective Airflow became "popular" enough that it
>> no longer needed "more integrations" to be more "appealing" for the users.
>> They already use Airflow. They like it (hopefully) and the fact that this
>> or that provider is part of the community makes no difference any more.
>>
>>
>> From 3) "Service providers" POV
>>
>> Here I am not sure. It's not very clear what service providers get from
>> being part of the "community providers".
>>
>> I hear that some big service (cloud providers) find it cool that we give
>> it the ASF "Stamp of Approval". And they are willing to pay the price of a
>> slower merge process, dependence on the community and following strict
>> rules of the ASF.
>> And the community also is happy to pay the price of maintaining those
>> (including the dependencies which Elad mention) to make sure that all the
>> community providers work in concert - because those "Services" are hugely
>> popular and we "want" as a community to invest there.
>> But maintaining those  deps in sync is a huge effort and it will become
>> even worse - the more we add. On the other hand for 3rd party providers it
>> will be EASIER to keep up.
>> They don't have to care about all the community providers to work
>> together, they can choose a subset. And when they release their libraries
>> they can take care about making sure the dependencies are not broken.
>>
>> There are other "drawbacks" for being a "community" provider. For example
>> we have the rule that we support the min-Airflow version for providers from
>> the community 12 months after Airflow release.
>> This means that users of Airflow 2.1 will not receive updates for the
>> providers after 21st of May. This is the price to pay for community-managed
>> providers. We will not release bug fixes in providers or changes for
>> Airflow 2.1 users after 21st of May.
>> But if you manage your own provider - you still can support 2.0 or even
>> 1.10 if you want.
>>
>> I cannot really see why a Service Provider would want to become an
>> Airflow Community Provider.
>>
>> And I am not really sure what  Flyte, Delta Sharing, Versatile Data Kit,
>> and Cloudera people think and why they think this is the best choice.
>>
>> I think when we understand what the  "Service Providers" want to achieve
>> this way, maybe we will be able to come up with some middle ground and at
>> least set some rules when it makes sense and when it does not make sense.
>> Maybe 'contributing provider' is the way to achieve something else and we
>> simply do not realize that in the new "Airflow as a Platform" world, all
>> the stakeholders can achieve very similar results using different
>> approaches.
>>
>> * For example we could think about how we can make it easier for Airflow
>> users to discover and install their providers - without actually taking
>> ownership of the code by the community.
>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a
>> "compliance check" as suggested above
>> * Or maybe we could introduce a "breeze" extension to be able to install
>> and test provider in the "latest airflow" so that the service providers
>> could check it before we even release airflow and dependencies
>>
>> So what I think we really need -  Alex, Samhita, Andon, Philippe (I
>> think) - could you tell us (every one of you separately) - what are your
>> goals when you came up with the "contribute the new provider" idea?
>>
>> J.
>>
>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <el...@apache.org> wrote:
>>
>>> Ash what is your recommendation for the users should we follow your
>>> suggestion?
>>> This means that the big big big joy of using airflow constraints and
>>> getting a working environment with all required providers will be no more.
>>> So users will get a working "Vanilla" Airflow and then will need to
>>> figure out how they are going to tackle independent providers that may not
>>> be able to coexist one with another.
>>> This means that users will need to create their own constraints
>>> mechanism and maintain it.
>>>
>>> From my perspective this increases the complexity of getting Airflow to
>>> be production ready.
>>> I know that we say providers vs core but I think that from users
>>> perspective providers are an integral part of Airflow.
>>> Having the best scheduler and the best UI is not enough. Providers are a
>>> crucial part that complete the set.
>>>
>>> Maybe eventually there should be something like a provider store where
>>> there can be official providers and 3rd party providers.
>>>
>>> This may be even greater discussion than what we are having here. It
>>> feels more like Airflow as a product vs Airflow as an ecosystem.
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty
>>> <co...@astronomer.io.invalid> wrote:
>>>
>>>> I agree with Ash and Tomasz. If it were not for the history, I think in
>>>> an ideal world even the providers currently part of the Airflow repo would
>>>> be managed separately. (I'm not actually suggesting removing any
>>>> providers.) I don't think it's a matter of gatekeeping, I just think it's
>>>> actually kind of odd to have providers in the same repo as core Airflow,
>>>> and it increases confusion about Airflow versions vs provider package
>>>> versions.
>>>>
>>>> Collin McNulty
>>>>
>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org>
>>>> wrote:
>>>>
>>>>> I’m leaning toward Ash approach. Having providers maintaining the
>>>>> packages may streamline many aspects for providers/companies.
>>>>>
>>>>> 1. They are owners so they can merge and release whenever they need.
>>>>> 2. It’s easier for them to add E2E tests and manage the resources
>>>>> needed for running them.
>>>>> 3. The development of the package can be incorporated into their
>>>>> company processes - not every company is used to OSS mode.
>>>>>
>>>>> Whatever way we go - we should have some basics guidelines and
>>>>> requirements (for example to brand a provider as “recommended by community”
>>>>> or something).
>>>>>
>>>>> Cheers,
>>>>> Tomsk
>>>>>
>>>>

-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

Re: [DISCUSS] Approach for new providers of the community

Posted by Rafal Biegacz <ra...@google.com.INVALID>.
Hi,

I think that we will need to find some middle ground here - we are trying
to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would
also add a 4th dimension - Airflow Service Providers :).

Airflow users - whether they run self-managed Airflow or use "managed
Airflow" provided by others - are beneficiaries of the fact that Airflow has
a decent portfolio of providers.
It's not only a guarantee that these providers should work fine and that
they meet Airflow coding/testing standards. It's also a kind of guarantee
that, once they start using Airflow
with providers backed by the Airflow community, they won't be on their own
when it comes to troubleshooting/updating/etc. It will be much easier for
them to convince their companies to use Airflow for production use cases, as
the Airflow platform (core + providers) is tested/maintained by the Airflow
community.

Keeping providers within the Airflow repository generates integration and
maintenance work on the Airflow community side. On the other hand, if this
work is not done within the community, then this effort would need to be
done by all users to a certain extent. So from this perspective it's more
optimal for the community to do it, so users can use off-the-shelf Airflow
for the majority of their use cases.

When it comes to accepting new providers - I like John's suggestions:
a) a well-defined standard that needs to be met by providers - passing the
"provider qualification" would take some effort, so each service provider
would need to decide whether it wouldn't be easier to maintain their provider
on their own (a rough sketch of what such a check could look like is below).
b) a well-defined lifecycle for providers - which would allow us to identify
providers that are obsolete/not popular any more and mark them as obsolete.
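
Just to make the "provider qualification" idea a bit more tangible - this is
a purely hypothetical sketch (no such tool exists today, and the exact checks
would need to be defined by the community) of the kind of thing a first, very
basic check could verify for an installed provider package:

    # hypothetical "provider qualification" check - illustration only
    from importlib.metadata import distribution, metadata

    def looks_like_a_provider(package_name: str) -> bool:
        # 1. the package declares the Airflow provider trove classifier
        classifiers = metadata(package_name).get_all("Classifier") or []
        has_classifier = "Framework :: Apache Airflow :: Provider" in classifiers
        # 2. the package exposes provider metadata via the entry point
        #    that Airflow uses to discover providers
        has_entry_point = any(
            ep.group == "apache_airflow_provider"
            for ep in distribution(package_name).entry_points
        )
        return has_classifier and has_entry_point

    print(looks_like_a_provider("apache-airflow-providers-http"))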

Regards, Rafal.





On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I've been thinking about it - to make up my mind a little. The good thing
> for me is that I have no strong opinion and I can rather easily see (or so
> I think) of both sides.
>
> TL;DR; I think we need an explanation from the "Service Providers" - what
> they want to achieve by contributing providers to the community and see if
> we can achieve similar results differently.
>
>
> Obviously I am a bit biased from the maintainer point of view, but since I
> cooperate with various stakeholders i spoke to some of them just see their
> point of view and this is what I got:
>
> Seems that we have really three  types of stakeholders that are really
> interested in "providers":
>
> 1) "Maintainers" - those who mostly maintain Airflow and have to take care
> about its future and development and "grand vision" of where we want to be
> in few years
> 2) "Users" - those who use Airflow and integration with the Service
> Provider
> 3) "Service providers" - those who run the services that
> Airflow integrates with - via providers (that group might also contain
> those stakeholders that run Airflow "as a service")
>
> Let me see it from all the different POVs:
>
>
> From 1) Maintainer POV
>
> More providers mean slower growth of the platform overall as the more
> providers we add and manage as a community, the less time we can spend on
> improving Airflow as a core.
> Also the vision I think we all share is that Airflow is not a "standalone
> orchestrator" any more - due to its popularity, reach and power, it became
> an "orchestrating platform" and this is the vision that keeps us -
> maintainers - busy.
>
> Over the last 2 years pretty much everything we do - make Airflow "more
> extensible". You can add custom "secrets managers". "timetables",
> "defferers" etc. "Customizability" is now built-in and "theme" of being a
> modern platform.
> Hell - we even recently added "Airflow Provider" trove classified in PyPI:
> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
> and the main justification in the discussion was that we expect MORE
> 3rd-parties to use it, rather than relying on "apache-airflow-provider"
> package name.
> So from maintainer POV - having 3rd-party providers as "extensions" to
> Airlow makes perfect sense and is the way to go.
>
>
> From  2) User POV
>
> Users want to use Airflow with all the integrations they use together. But
> only with those that they actually use. Similarly as maintainers -
> supporting and needing all 70+ providers is something they usually do not
> REALLY care about.
> They literally care about the few providers they use. We even taught the
> users that they can upgrade and install providers separately from the core.
> So they already know they can mix and match Airflow + Providers to get what
> they want.
>
> And they do use it - even if they use our image, the image only contains a
> handful of the providers and when they need to install
> new providers - they just install it from PyPI. And for that the
> difference of "community providers" vs. 3rd party providers - except the
> stamp of approval of the ASF, is not really visible.
> Surely they can use [extras] to install the providers but that is just a
> convenience and is definitely not needed by the users.
> For example when they build a custom image they usually extend Airflow and
> simply 'pip install <PROVIDER>'
> As long as someone makes sure that the provider can be installed on
> certain versions of Airflow - it does not matter.
>
> Also from the users perspective Airflow became "popular" enough that it no
> longer needed "more integrations" to be more "appealing" for the users.
> They already use Airflow. They like it (hopefully) and the fact that this
> or that provider is part of the community makes no difference any more.
>
>
> From 3) "Service providers" POV
>
> Here I am not sure. It's not very clear what service providers get from
> being part of the "community providers".
>
> I hear that some big service (cloud providers) find it cool that we give
> it the ASF "Stamp of Approval". And they are willing to pay the price of a
> slower merge process, dependence on the community and following strict
> rules of the ASF.
> And the community also is happy to pay the price of maintaining those
> (including the dependencies which Elad mention) to make sure that all the
> community providers work in concert - because those "Services" are hugely
> popular and we "want" as a community to invest there.
> But maintaining those  deps in sync is a huge effort and it will become
> even worse - the more we add. On the other hand for 3rd party providers it
> will be EASIER to keep up.
> They don't have to care about all the community providers to work
> together, they can choose a subset. And when they release their libraries
> they can take care about making sure the dependencies are not broken.
>
> There are other "drawbacks" for being a "community" provider. For example
> we have the rule that we support the min-Airflow version for providers from
> the community 12 months after Airflow release.
> This means that users of Airflow 2.1 will not receive updates for the
> providers after 21st of May. This is the price to pay for community-managed
> providers. We will not release bug fixes in providers or changes for
> Airflow 2.1 users after 21st of May.
> But if you manage your own provider - you still can support 2.0 or even
> 1.10 if you want.
>
> I cannot really see why a Service Provider would want to become an Airflow
> Community Provider.
>
> And I am not really sure what  Flyte, Delta Sharing, Versatile Data Kit,
> and Cloudera people think and why they think this is the best choice.
>
> I think when we understand what the  "Service Providers" want to achieve
> this way, maybe we will be able to come up with some middle ground and at
> least set some rules when it makes sense and when it does not make sense.
> Maybe 'contributing provider' is the way to achieve something else and we
> simply do not realize that in the new "Airflow as a Platform" world, all
> the stakeholders can achieve very similar results using different
> approaches.
>
> * For example we could think about how we can make it easier for Airflow
> users to discover and install their providers - without actually taking
> ownership of the code by the community.
> * Or maybe we could introduce a tool to make a 3rd-party provider pass a
> "compliance check" as suggested above
> * Or maybe we could introduce a "breeze" extension to be able to install
> and test provider in the "latest airflow" so that the service providers
> could check it before we even release airflow and dependencies
>
> So what I think we really need -  Alex, Samhita, Andon, Philippe (I think)
> - could you tell us (every one of you separately) - what are your goals
> when you came up with the "contribute the new provider" idea?
>
> J.
>
> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <el...@apache.org> wrote:
>
>> Ash what is your recommendation for the users should we follow your
>> suggestion?
>> This means that the big big big joy of using airflow constraints and
>> getting a working environment with all required providers will be no more.
>> So users will get a working "Vanilla" Airflow and then will need to
>> figure out how they are going to tackle independent providers that may not
>> be able to coexist one with another.
>> This means that users will need to create their own constraints mechanism
>> and maintain it.
>>
>> From my perspective this increases the complexity of getting Airflow to
>> be production ready.
>> I know that we say providers vs core but I think that from users
>> perspective providers are an integral part of Airflow.
>> Having the best scheduler and the best UI is not enough. Providers are a
>> crucial part that complete the set.
>>
>> Maybe eventually there should be something like a provider store where
>> there can be official providers and 3rd party providers.
>>
>> This may be even greater discussion than what we are having here. It
>> feels more like Airflow as a product vs Airflow as an ecosystem.
>>
>>
>>
>>
>>
>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty
>> <co...@astronomer.io.invalid> wrote:
>>
>>> I agree with Ash and Tomasz. If it were not for the history, I think in
>>> an ideal world even the providers currently part of the Airflow repo would
>>> be managed separately. (I'm not actually suggesting removing any
>>> providers.) I don't think it's a matter of gatekeeping, I just think it's
>>> actually kind of odd to have providers in the same repo as core Airflow,
>>> and it increases confusion about Airflow versions vs provider package
>>> versions.
>>>
>>> Collin McNulty
>>>
>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org>
>>> wrote:
>>>
>>>> I’m leaning toward Ash approach. Having providers maintaining the
>>>> packages may streamline many aspects for providers/companies.
>>>>
>>>> 1. They are owners so they can merge and release whenever they need.
>>>> 2. It’s easier for them to add E2E tests and manage the resources
>>>> needed for running them.
>>>> 3. The development of the package can be incorporated into their
>>>> company processes - not every company is used to OSS mode.
>>>>
>>>> Whatever way we go - we should have some basics guidelines and
>>>> requirements (for example to brand a provider as “recommended by community”
>>>> or something).
>>>>
>>>> Cheers,
>>>> Tomsk
>>>>
>>>

Re: [DISCUSS] Approach for new providers of the community

Posted by Jarek Potiuk <ja...@potiuk.com>.
I've been thinking about it - to make up my mind a little. The good thing
for me is that I have no strong opinion and I can rather easily see (or so
I think) of both sides.

TL;DR; I think we need an explanation from the "Service Providers" - what
they want to achieve by contributing providers to the community and see if
we can achieve similar results differently.


Obviously I am a bit biased from the maintainer point of view, but since I
cooperate with various stakeholders, I spoke to some of them just to see their
point of view, and this is what I got:

It seems that we really have three types of stakeholders that are
interested in "providers":

1) "Maintainers" - those who mostly maintain Airflow and have to take care
about its future and development and "grand vision" of where we want to be
in few years
2) "Users" - those who use Airflow and integration with the Service Provider
3) "Service providers" - those who run the services that Airflow integrates
with - via providers (that group might also contain those stakeholders that
run Airflow "as a service")

Let me see it from all the different POVs:


From 1) Maintainer POV

More providers mean slower growth of the platform overall: the more
providers we add and manage as a community, the less time we can spend on
improving the Airflow core.
Also the vision I think we all share is that Airflow is not a "standalone
orchestrator" any more - due to its popularity, reach and power, it became
an "orchestrating platform" and this is the vision that keeps us -
maintainers - busy.

Over the last 2 years pretty much everything we do has been about making
Airflow "more extensible". You can add custom "secrets managers", "timetables",
"deferrers" etc. "Customizability" is now built in and is the "theme" of being a
modern platform.
Hell - we even recently added an "Airflow Provider" trove classifier in PyPI:
https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
and the main justification in the discussion was that we expect MORE
3rd-parties to use it, rather than relying on the "apache-airflow-provider"
package name.
So from the maintainer POV - having 3rd-party providers as "extensions" to
Airflow makes perfect sense and is the way to go.
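
For illustration only - a rough, hypothetical sketch of how a 3rd-party
provider could advertise itself on PyPI with that classifier and with the
"apache_airflow_provider" entry point that Airflow uses to discover provider
metadata (the package and module names below are made up):

    # Hypothetical setup.py for a 3rd-party provider package - a sketch, not a template.
    from setuptools import find_packages, setup

    setup(
        name="example-airflow-provider-acme",        # made-up package name
        version="1.0.0",
        packages=find_packages(),
        install_requires=["apache-airflow>=2.2.0"],  # assumed minimum Airflow version
        classifiers=[
            "Framework :: Apache Airflow",
            "Framework :: Apache Airflow :: Provider",
        ],
        entry_points={
            "apache_airflow_provider": [
                # made-up module exposing the provider metadata dictionary
                "provider_info=acme_provider:get_provider_info",
            ]
        },
    )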


From 2) User POV

Users want to use Airflow with all the integrations they use together. But
only with those that they actually use. Similarly to maintainers -
supporting and needing all 70+ providers is something they usually do not
REALLY care about.
They literally care about the few providers they use. We even taught the
users that they can upgrade and install providers separately from the core.
So they already know they can mix and match Airflow + Providers to get what
they want.

And they do use it - even if they use our image, the image only contains a
handful of the providers, and when they need to install
new providers - they just install them from PyPI. And for that, the difference
between "community providers" and 3rd-party providers - except for the stamp of
approval of the ASF - is not really visible.
Surely they can use [extras] to install the providers, but that is just a
convenience and is definitely not needed by the users.
For example, when they build a custom image they usually extend Airflow and
simply 'pip install <PROVIDER>'.
As long as someone makes sure that the provider can be installed on certain
versions of Airflow - it does not matter.
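
Whichever way they install them, users can always check which provider
packages their Airflow installation actually sees - either with the
"airflow providers list" CLI command or programmatically. A minimal sketch:

    # List provider packages discovered by Airflow, whether community-managed
    # or 3rd-party ones installed straight from PyPI.
    from airflow.providers_manager import ProvidersManager

    for package_name in sorted(ProvidersManager().providers):
        print(package_name)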

Also, from the users' perspective, Airflow became "popular" enough that it no
longer needs "more integrations" to be more "appealing" to the users.
They already use Airflow. They like it (hopefully) and the fact that this
or that provider is part of the community makes no difference any more.


From 3) "Service providers" POV

Here I am not sure. It's not very clear what service providers get from
being part of the "community providers".

I hear that some big services (cloud providers) find it cool that we give them
the ASF "Stamp of Approval". And they are willing to pay the price of a
slower merge process, dependence on the community and following the strict
rules of the ASF.
And the community is also happy to pay the price of maintaining those
(including the dependencies which Elad mentioned) to make sure that all the
community providers work in concert - because those "Services" are hugely
popular and we "want" as a community to invest there.
But keeping those deps in sync is a huge effort and it will become
even worse the more we add. On the other hand, for 3rd-party providers it
will be EASIER to keep up.
They don't have to care about making all the community providers work together -
they can choose a subset. And when they release their libraries, they can
take care to make sure the dependencies are not broken.

There are other "drawbacks" to being a "community" provider. For example,
we have the rule that community providers support a given Airflow version
only for 12 months after that Airflow version's release.
This means that users of Airflow 2.1 will not receive updates for the
providers after the 21st of May. This is the price to pay for community-managed
providers. We will not release bug fixes in providers or changes for
Airflow 2.1 users after the 21st of May.
But if you manage your own provider - you can still support 2.0 or even
1.10 if you want.

I cannot really see why a Service Provider would want to become an Airflow
Community Provider.

And I am not really sure what the Flyte, Delta Sharing, Versatile Data Kit,
and Cloudera people think and why they think this is the best choice.

I think when we understand what the "Service Providers" want to achieve
this way, maybe we will be able to come up with some middle ground and at
least set some rules for when it makes sense and when it does not.
Maybe 'contributing provider' is the way to achieve something else and we
simply do not realize that in the new "Airflow as a Platform" world, all
the stakeholders can achieve very similar results using different
approaches.

* For example we could think about how we can make it easier for Airflow
users to discover and install their providers - without actually taking
ownership of the code by the community.
* Or maybe we could introduce a tool to make a 3rd-party provider pass a
"compliance check" as suggested above (a rough sketch of what such a check
could look like follows below)
* Or maybe we could introduce a "breeze" extension to be able to install
and test a provider in the "latest airflow" so that the service providers
could check it before we even release airflow and dependencies
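
To make the "compliance check" idea a bit more concrete - a rough, unofficial
sketch of what such a tool could verify for an installed 3rd-party provider
package; it only inspects packaging metadata, and a real check would of course
go much further (docs, tests, example DAGs):

    # Hypothetical "provider compliance check" - verifies that an installed
    # package declares the PyPI provider classifier and the
    # "apache_airflow_provider" entry point used for provider discovery.
    import sys
    from importlib import metadata

    def check_provider(package_name: str) -> bool:
        dist = metadata.distribution(package_name)
        classifiers = dist.metadata.get_all("Classifier") or []
        has_classifier = "Framework :: Apache Airflow :: Provider" in classifiers
        has_entry_point = any(
            ep.group == "apache_airflow_provider" for ep in dist.entry_points
        )
        print(f"{package_name}: classifier {'OK' if has_classifier else 'MISSING'}")
        print(f"{package_name}: entry point {'OK' if has_entry_point else 'MISSING'}")
        return has_classifier and has_entry_point

    if __name__ == "__main__":
        sys.exit(0 if check_provider(sys.argv[1]) else 1)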

So here is what I think we really need - Alex, Samhita, Andon, Philippe (I think)
- could each of you tell us, separately, what your goals were
when you came up with the "contribute the new provider" idea?

J.

On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <el...@apache.org> wrote:

> Ash what is your recommendation for the users should we follow your
> suggestion?
> This means that the big big big joy of using airflow constraints and
> getting a working environment with all required providers will be no more.
> So users will get a working "Vanilla" Airflow and then will need to figure
> out how they are going to tackle independent providers that may not be able
> to coexist one with another.
> This means that users will need to create their own constraints mechanism
> and maintain it.
>
> From my perspective this increases the complexity of getting Airflow to be
> production ready.
> I know that we say providers vs core but I think that from users
> perspective providers are an integral part of Airflow.
> Having the best scheduler and the best UI is not enough. Providers are a
> crucial part that complete the set.
>
> Maybe eventually there should be something like a provider store where
> there can be official providers and 3rd party providers.
>
> This may be even greater discussion than what we are having here. It feels
> more like Airflow as a product vs Airflow as an ecosystem.
>
>
>
>
>
> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty
> <co...@astronomer.io.invalid> wrote:
>
>> I agree with Ash and Tomasz. If it were not for the history, I think in
>> an ideal world even the providers currently part of the Airflow repo would
>> be managed separately. (I'm not actually suggesting removing any
>> providers.) I don't think it's a matter of gatekeeping, I just think it's
>> actually kind of odd to have providers in the same repo as core Airflow,
>> and it increases confusion about Airflow versions vs provider package
>> versions.
>>
>> Collin McNulty
>>
>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org>
>> wrote:
>>
>>> I’m leaning toward Ash approach. Having providers maintaining the
>>> packages may streamline many aspects for providers/companies.
>>>
>>> 1. They are owners so they can merge and release whenever they need.
>>> 2. It’s easier for them to add E2E tests and manage the resources needed
>>> for running them.
>>> 3. The development of the package can be incorporated into their company
>>> processes - not every company is used to OSS mode.
>>>
>>> Whatever way we go - we should have some basics guidelines and
>>> requirements (for example to brand a provider as “recommended by community”
>>> or something).
>>>
>>> Cheers,
>>> Tomsk
>>>
>>

Re: [DISCUSS] Approach for new providers of the community

Posted by Elad Kalif <el...@apache.org>.
Ash, what is your recommendation for the users, should we follow your
suggestion?
This means that the big big big joy of using airflow constraints and
getting a working environment with all required providers will be no more.
So users will get a working "Vanilla" Airflow and then will need to figure
out how they are going to tackle independent providers that may not be able
to coexist with one another.
This means that users will need to create their own constraints mechanism
and maintain it.

From my perspective this increases the complexity of getting Airflow to be
production ready.
I know that we say providers vs core, but I think that from the users'
perspective providers are an integral part of Airflow.
Having the best scheduler and the best UI is not enough. Providers are a
crucial part that completes the set.

Maybe eventually there should be something like a provider store where
there can be official providers and 3rd party providers.

This may be an even greater discussion than the one we are having here. It feels
more like Airflow as a product vs Airflow as an ecosystem.





On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty <co...@astronomer.io.invalid>
wrote:

> I agree with Ash and Tomasz. If it were not for the history, I think in an
> ideal world even the providers currently part of the Airflow repo would be
> managed separately. (I'm not actually suggesting removing any providers.) I
> don't think it's a matter of gatekeeping, I just think it's actually kind
> of odd to have providers in the same repo as core Airflow, and it increases
> confusion about Airflow versions vs provider package versions.
>
> Collin McNulty
>
> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org>
> wrote:
>
>> I’m leaning toward Ash approach. Having providers maintaining the
>> packages may streamline many aspects for providers/companies.
>>
>> 1. They are owners so they can merge and release whenever they need.
>> 2. It’s easier for them to add E2E tests and manage the resources needed
>> for running them.
>> 3. The development of the package can be incorporated into their company
>> processes - not every company is used to OSS mode.
>>
>> Whatever way we go - we should have some basics guidelines and
>> requirements (for example to brand a provider as “recommended by community”
>> or something).
>>
>> Cheers,
>> Tomsk
>>
>

Re: [DISCUSS] Approach for new providers of the community

Posted by Collin McNulty <co...@astronomer.io.INVALID>.
I agree with Ash and Tomasz. If it were not for the history, I think in an
ideal world even the providers currently part of the Airflow repo would be
managed separately. (I'm not actually suggesting removing any providers.) I
don't think it's a matter of gatekeeping, I just think it's actually kind
of odd to have providers in the same repo as core Airflow, and it increases
confusion about Airflow versions vs provider package versions.

Collin McNulty

On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <tu...@apache.org> wrote:

> I’m leaning toward Ash approach. Having providers maintaining the packages
> may streamline many aspects for providers/companies.
>
> 1. They are owners so they can merge and release whenever they need.
> 2. It’s easier for them to add E2E tests and manage the resources needed
> for running them.
> 3. The development of the package can be incorporated into their company
> processes - not every company is used to OSS mode.
>
> Whatever way we go - we should have some basics guidelines and
> requirements (for example to brand a provider as “recommended by community”
> or something).
>
> Cheers,
> Tomsk
>

Re: [DISCUSS] Approach for new providers of the community

Posted by Tomasz Urbaszek <tu...@apache.org>.
I’m leaning toward Ash's approach. Having providers maintain the packages
may streamline many aspects for providers/companies.

1. They are owners so they can merge and release whenever they need.
2. It’s easier for them to add E2E tests and manage the resources needed
for running them.
3. The development of the package can be incorporated into their company
processes - not every company is used to OSS mode.

Whichever way we go - we should have some basic guidelines and requirements
(for example, to brand a provider as “recommended by the community” or
something).

Cheers,
Tomsk

Re: [DISCUSS] Approach for new providers of the community

Posted by Ash Berlin-Taylor <as...@apache.org>.
So I think my opinion is the opposite of Elad's -- that we basically
accept almost no new providers, and instead encourage people to create
and release their own packages directly.

I want _less_ code in the apache/airflow repo, not more. The more we have,
the more combinations we have to test on every commit, and the longer
and longer the list of extras and providers we have to maintain becomes.

On Wed, Apr 6 2022 at 22:08:28 +0100, Ash Berlin-Taylor 
<as...@apache.org> wrote:
> My general thoughts: have as much as possible outside of Airflow.
> 
> If a provider is being contributed by the "owner" of the service 
> (i.e. Cloudera provider being contributed by Cloudera) then it 
> shouldn't live in Airflow and that company/project should release to 
> pypi directly.
> 
> The only time we should accept a new provider is if it is by a user 
> of the service, and likely to be popular and possible for us (Airflow 
> team) to run (i.e. no paid for accounts needed).
> 
> -Ash
> 
> On 4 April 2022 14:39:34 BST, Jarek Potiuk <ja...@potiuk.com> wrote:
>> Hey all,
>> 
>> We seem to have an influx of new providers coming our way:
>> 
>> * Delta Sharing:
>> <https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c>
>> * Flyte:  
>> <https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x>
>> * Versatile Data Kit:
>> <https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0>
>> 
>> I think it might be a good idea to bring the discussion in one place
>> (here) and decide on what our approach is for accepting new providers
>> (the original discussion from Andon was focused mostly about VDK's
>> case, but maybe we could work out a general approach and "guidelines"
>> - what approach is best so that we do not have to discuss it
>> separately for each proposal, but we have some more (or less) clear
>> rules on when we think it's good to accept providers as community.
>> 
>> Generally speaking we have two approaches:
>> * providers managed by the Apache Airflow community
>> * providers managed by 3rd-parties
>> 
>> I think my email here, nicely summarizes what is in
>> <https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n>
>> 
>> I tried to look for earlier devlist discussions about the subject
>> (maybe someone can find it :), I think we have never formalized nor
>> written down but I do recall some (slack??) discussions about it from
>> the past.
>> 
>> While we have no control/influence (and we do not want to have) for
>> 3rd-party providers, we definitely have both for the 
>> community-managed
>> ones - and there should be some rules defined to decide when we are
>> "ok" to accept a provider. Not always having "more" providers in the
>> "community" area is better. More often than not, code is a liability
>> more often than an asset.
>> 
>> From those discussions I had I recall points such us:
>> 
>> * likelihood of the provider being used by many users
>> * possibility to test/support the providers by maintainers or
>> dedicated "stakeholders"
>> * quality of the code and following our expectations (docs/how to
>> guides, unit/system test)
>> * competing (?) with Airflow - there could be some providers of
>> "competing" products maybe (I am not sure if this is a concern of
>> ours) which we simply might decide to not maintain in the community
>> 
>> I am happy to write it down and propose such rules revolving around
>> those - but I would like to hear what people think first.
>> 
>> What are your thoughts here?
>> 
>> J


Re: [DISCUSS] Approach for new providers of the community

Posted by Ash Berlin-Taylor <as...@apache.org>.
My general thoughts: have as much as possible outside of Airflow.

If a provider is being contributed by the "owner" of the service (i.e. Cloudera provider being contributed by Cloudera) then it shouldn't live in Airflow and that company/project should release to pypi directly.

The only time we should accept a new provider is if it is by a user of the service, and likely to be popular and possible for us (Airflow team) to run (i.e. no paid for accounts needed).

-Ash

On 4 April 2022 14:39:34 BST, Jarek Potiuk <ja...@potiuk.com> wrote:
>Hey all,
>
>We seem to have an influx of new providers coming our way:
>
>* Delta Sharing:
>https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
>* Flyte:  https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
>* Versatile Data Kit:
>https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
>
>I think it might be a good idea to bring the discussion in one place
>(here) and decide on what our approach is for accepting new providers
>(the original discussion from Andon was focused mostly about VDK's
>case, but maybe we could work out a general approach and "guidelines"
>- what approach is best so that we do not have to discuss it
>separately for each proposal, but we have some more (or less) clear
>rules on when we think it's good to accept providers as community.
>
>Generally speaking we have two approaches:
>* providers managed by the Apache Airflow community
>* providers managed by 3rd-parties
>
>I think my email here, nicely summarizes what is in
>https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
>
>I tried to look for earlier devlist discussions about the subject
>(maybe someone can find it :), I think we have never formalized nor
>written down but I do recall some (slack??) discussions about it from
>the past.
>
>While we have no control/influence (and we do not want to have) for
>3rd-party providers, we definitely have both for the community-managed
>ones - and there should be some rules defined to decide when we are
>"ok" to accept a provider. Not always having "more" providers in the
>"community" area is better. More often than not, code is a liability
>more often than an asset.
>
>From those discussions I had I recall points such us:
>
>* likelihood of the provider being used by many users
>* possibility to test/support the providers by maintainers or
>dedicated "stakeholders"
>* quality of the code and following our expectations (docs/how to
>guides, unit/system test)
>* competing (?) with Airflow - there could be some providers of
>"competing" products maybe (I am not sure if this is a concern of
>ours) which we simply might decide to not maintain in the community
>
>I am happy to write it down and propose such rules revolving around
>those - but I would like to hear what people think first.
>
>What are your thoughts here?
>
>J

Re: Re: [DISCUSS] Approach for new providers of the community

Posted by Jarek Potiuk <ja...@potiuk.com>.
I very much like some of the points there :).

I think we have indeed so far been missing clear guidance on what criteria a new
provider needs to meet - even if we actually had some of that in our heads
- it was more of a "tribal knowledge", and you could likely figure it out
by looking at other providers, but we did not have it hashed out.

And yeah, absolutely - AIP-47 as an enabler for finishing AIP-4 (automating
system tests for external systems), and specifically the dashboard showing
status, is very, very, very dear to my heart :). I wrote AIP-4 in
September 2018 as my first AIP proposal, which I created ~ a month after I
started my first contributions to Airflow :).

Looks like we might be finally completing it :D.

I think we should wait a bit for more comments and I might try to start
drafting a proposed PR describing the policy.

J.

On Tue, Apr 5, 2022 at 10:34 PM Mehta, Shubham <sh...@amazon.com.invalid>
wrote:

> Hi all,
>
> I’m Shubham, Sr. Product Manager at AWS, working closely with John and the
> MWAA team. Glad to see the Airflow community openly discussing this topic
> which will likely shape Airflow’s growth in the future.
>
>
>
> Firstly, I am with Elad and Dennis that we shouldn’t be gatekeeping the
> new providers. At the same time, I empathize with Jarek’s concern about
> taking responsibility for maintaining the new providers. It is important to
> set the right expectation for our Airflow users when they try to use any
> Airflow provider to meet their development needs.
>
>
>
> Borrowing the “verified” feature from Twitter, I believe Airflow can
> provide a list of providers that meet our community guidelines, are well
> maintained, and are healthy. We can leverage AIP-47 Airflow System Test (
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests)
> to build a public-facing dashboard (something that Niko has been a big
> proponent of internally for AWS provider) that shows the status of system
> tests for all providers. It will improve the experience of Airflow users
> when they start using any provider package and reduce the issues we get.
>
>
>
> Deprecation will be difficult once a provider is added as there might be
> some users who depend on it. A list of "verified" Airflow providers and a
> dashboard with system tests will reduce the need for deprecation.
>
>
>
> Shubham
>
>
>
> *From: *"Jackson, John" <ja...@amazon.com.INVALID>
> *Reply-To: *"dev@airflow.apache.org" <de...@airflow.apache.org>
> *Date: *Tuesday, April 5, 2022 at 10:56 AM
> *To: *"dev@airflow.apache.org" <de...@airflow.apache.org>
> *Subject: *RE: [EXTERNAL] Re: [DISCUSS] Approach for new providers of the
> community
>
>
>
>
>
>
> Hi Folks,
>
>
>
> This is a great topic and indeed important as Airflow’s popularity
> continues to grow.
>
>
>
> One thing that will help is to provide clear, unambiguous, community
> guidelines for providers--both existing and new.  It should provide such
> items as:
>
>
>
>    - What qualifies as a “new provider” vs extending an existing provider
>    or releasing as an independent project.
>    - Rules about Python dependencies and other install actions that
>    providers can take, and how it interacts with Airflow core code (for
>    example, providers or their dependencies should not be allowed to
>    monkey-patch core code, or force an Airflow/DB upgrade).
>    - The minimum standards for unit tests, system tests, examples, and
>    documentation with consistent naming conventions (I’m looking at you
>    “examples”) and technology stacks (i.e. “mock” usage) for each.
>    - Clear direction as to when to create a hook vs operator vs sensor,
>    and minimum required functionality for each.
>    - A deprecation plan, for example that a provider is guaranteed to be
>    supported for x releases, however if it goes through n releases without
>    update it goes into a “quarantined” state, and if not verified it moves to
>    “retired” (or “moved to the attic” as Jarek stated).
>    - A bar-raising plan, to get all existing providers either up to the
>    current feature bar by a certain date, or retired.
>
>
>
> This should make accepting new providers/operators/etc PRs easier, as
> there will be an unambiguous checklist that needs to be met before it’s
> even reviewed (which could maybe even be automated).  It will also ensure
> user confidence in Airflow providers as a whole, as there will be a
> consistent level of features, functionality, and quality regardless of
> which provider the user chooses to deploy.
>
>
>
> John
>
>
>
> On 2022/04/05 08:58:17 Jarek Potiuk wrote:
>
> > One more Provider in progress I forgot :). Cloudera:
>
> > https://github.com/apache/airflow/pull/22659
>
> >
>
> > Just wanted to stress how important the result of this discussion is.
>
> > The number of PR for new providers we get is kind of unprecedented.
>
> > This month we have started discussions (and actual PRs) about adding
>
> > at least 4 biggish providers.
>
> >
>
> > It's either a coincidence, or we simply reached the status that a
>
> > "lot" of 3rd parties want to integrate with Airflow as Airflow is
>
> > really a de-facto Platform for Orchestration for "Everyone" :D :D.
>
> >
>
> > This is a great thing if it's the latter.
>
> >
>
> > I just want to make sure we get it right when it comes to "embracing"
>
> > then as a community. It's not really about gatekeeping but more about
>
> > "taking responsibility" for the code. If we accept code to the
>
> > community we take responsibility for maintaining it too. Of course
>
> > there are various stakeholders there and I am sure "Cloudera" people
>
> > will maintain their provider and provide bug fixes - but the issues
>
> > will also come our way if the Cloudera provider does not work (and
>
> > with the ASF "stamp of approval" we give our users some kind of
>
> > expectations that we have to fulfill).
>
> >
>
> > Unlike in many technical decisions :) I have no very strong opinion
>
> > about this and I am really interested to hear what the community
>
> > members think.
>
> >
>
> > We are prepared to handle literally hundreds of providers if need be
>
> > (with some small automation improvements) - so there are no technical
>
> > reasons to limit the number of providers.
>
> > In the (near) future we might even decide to split them into separate
>
> > repositories (there are some discussions about that and it's likely to
>
> > happen) to make some housekeeping easier and to make sure it does not
>
> > hold us back when we develop some core features.
>
> >
>
> > I am however leaning towards what both Elad and Denis wrote: accepting
>
> > new providers should be easy and it should only be gated by the
>
> > technical code quality bar, but there should also be some expectations
>
> > for the provider being maintained.
>
> > And as Dennis wrote - rather than "voting" for approval, there should
>
> > be rather a clear road (and voting possibly) to "retire" provider if
>
> > it is not maintained any more (This is called "Moving to attic" in the
>
> > ASF terminology).
>
> >
>
> > But maybe there are others who think differently. Would love to hear it.
>
> >
>
> > J
>
> >
>
> >
>
> > On Mon, Apr 4, 2022 at 9:58 PM Ferruzzi, Dennis
>
> > <fe...@amazon.com.invalid> wrote:
>
> > >
>
> > > I think I'd just +1 Elad's comments.  I don't know if we (the
> community) really need to be gatekeeping which providers get first class
> status like that.  In the end, the users of any given provider become
> responsible for maintaining it, so I feel it sorts itself out without added
> bureaucracy.  Perhaps some form of formalized decision tree on when to drop
> a provider package as "no longer maintained/supported", but I don't feel
> there should be a high barrier to entry on adding a new one provided the
> code doesn't break any existing packages and meets community quality
> standards.
>
> > >
>
> > >
>
> > > ________________________________
>
> > > From: Elad Kalif <el...@apache.org>
>
> > > Sent: Monday, April 4, 2022 7:24 AM
>
> > > To: dev@airflow.apache.org
>
> > > Subject: RE: [EXTERNAL] [DISCUSS] Approach for new providers of the
> community
>
> > >
>
> > >
>
>
> > >
>
> > >
>
> > > Interesting topic!
>
> > >
>
> > > I think the most important thing for us is that we are able to
> maintain the provider (in terms of not causing problems for Airflow core or
> other providers).
>
> > > Some of the maintained providers (Google for example) have open bugs
> for 2 years. So even if we have many provider mantiners it doesn't
> guarantee fixing problems.
>
> > > I am not worried about provider internal issues (operator not working
> properly, etc..)  - it affects only the users of the provider itself and
> the users of the provider are always welcome to submit PRs with fixes.
>
> > >
>
> > > I don't feel comfortable blocking a new provider just because it has a
> small market / competitors' tools also don't support it etc...
>
> > >
>
> > > I guess my take is:
>
> > >
>
> > > Accept any new provider that meets quality/requirements (just as we
> did so far)
>
> > > Since providers are independent packages - In the rare case (I say
> rare as it never happened till now) where the provider causes problems with
> core/other providers and no one is willing to address it.
>
> > >  if we can terminate the provider/mark it as not matinined in PyPi -
> it should be enough I think.
>
> > >
>
> > >
>
> > >
>
> > >
>
> > >
>
> > >
>
> > >
>
> > > On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > >>
>
> > >> Hey all,
>
> > >>
>
> > >> We seem to have an influx of new providers coming our way:
>
> > >>
>
> > >> * Delta Sharing:
>
> > >> https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
>
> > >> * Flyte:
> https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
>
> > >> * Versatile Data Kit:
>
> > >> https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
>
> > >>
>
> > >> I think it might be a good idea to bring the discussion in one place
>
> > >> (here) and decide on what our approach is for accepting new providers
>
> > >> (the original discussion from Andon was focused mostly about VDK's
>
> > >> case, but maybe we could work out a general approach and "guidelines"
>
> > >> - what approach is best so that we do not have to discuss it
>
> > >> separately for each proposal, but we have some more (or less) clear
>
> > >> rules on when we think it's good to accept providers as community.
>
> > >>
>
> > >> Generally speaking we have two approaches:
>
> > >> * providers managed by the Apache Airflow community
>
> > >> * providers managed by 3rd-parties
>
> > >>
>
> > >> I think my email here, nicely summarizes what is in
>
> > >> https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
>
> > >>
>
> > >> I tried to look for earlier devlist discussions about the subject
>
> > >> (maybe someone can find it :), I think we have never formalized nor
>
> > >> written down but I do recall some (slack??) discussions about it from
>
> > >> the past.
>
> > >>
>
> > >> While we have no control/influence (and we do not want to have) for
>
> > >> 3rd-party providers, we definitely have both for the community-managed
>
> > >> ones - and there should be some rules defined to decide when we are
>
> > >> "ok" to accept a provider. Not always having "more" providers in the
>
> > >> "community" area is better. More often than not, code is a liability
>
> > >> more often than an asset.
>
> > >>
>
> > >> From those discussions I had I recall points such us:
>
> > >>
>
> > >> * likelihood of the provider being used by many users
>
> > >> * possibility to test/support the providers by maintainers or
>
> > >> dedicated "stakeholders"
>
> > >> * quality of the code and following our expectations (docs/how to
>
> > >> guides, unit/system test)
>
> > >> * competing (?) with Airflow - there could be some providers of
>
> > >> "competing" products maybe (I am not sure if this is a concern of
>
> > >> ours) which we simply might decide to not maintain in the community
>
> > >>
>
> > >> I am happy to write it down and propose such rules revolving around
>
> > >> those - but I would like to hear what people think first.
>
> > >>
>
> > >> What are your thoughts here?
>
> > >>
>
> > >> J
>
> >
>

Re: Re: [DISCUSS] Approach for new providers of the community

Posted by "Mehta, Shubham" <sh...@amazon.com.INVALID>.
Hi all,
I’m Shubham, Sr. Product Manager at AWS, working closely with John and the MWAA team. Glad to see the Airflow community openly discussing this topic which will likely shape Airflow’s growth in the future.

Firstly, I am with Elad and Dennis that we shouldn’t be gatekeeping the new providers. At the same time, I empathize with Jarek’s concern about taking responsibility for maintaining the new providers. It is important to set the right expectation for our Airflow users when they try to use any Airflow provider to meet their development needs.

Borrowing the “verified” feature from Twitter, I believe Airflow can provide a list of providers that meet our community guidelines, are well maintained, and are healthy. We can leverage AIP-47 Airflow System Test (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests) to build a public-facing dashboard (something that Niko has been a big proponent of internally for AWS provider) that shows the status of system tests for all providers. It will improve the experience of Airflow users when they start using any provider package and reduce the issues we get.
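
For illustration only - in the spirit of AIP-47, a system test is essentially a
self-contained example DAG that exercises a provider against the real service,
which is what such a dashboard would run and report on. A minimal sketch (the
DAG id is made up, and a BashOperator stands in for a real provider operator):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_acme_system_test",   # made-up id
        start_date=datetime(2022, 1, 1),
        schedule_interval=None,              # triggered on demand by the test harness
        catchup=False,
        tags=["example", "system-test"],
    ) as dag:
        run_check = BashOperator(
            task_id="run_check",
            bash_command="echo 'exercise the provider operator against the live service here'",
        )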

Deprecation will be difficult once a provider is added as there might be some users who depend on it. A list of "verified" Airflow providers and a dashboard with system tests will reduce the need for deprecation.

Shubham

From: "Jackson, John" <ja...@amazon.com.INVALID>
Reply-To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Date: Tuesday, April 5, 2022 at 10:56 AM
To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Subject: RE: [EXTERNAL] Re: [DISCUSS] Approach for new providers of the community




Hi Folks,

This is a great topic and indeed important as Airflow’s popularity continues to grow.

One thing that will help is to provide clear, unambiguous, community guidelines for providers--both existing and new.  It should provide such items as:


  *   What qualifies as a “new provider” vs extending an existing provider or releasing as an independent project.
  *   Rules about Python dependencies and other install actions that providers can take, and how it interacts with Airflow core code (for example, providers or their dependencies should not be allowed to monkey-patch core code, or force an Airflow/DB upgrade).
  *   The minimum standards for unit tests, system tests, examples, and documentation with consistent naming conventions (I’m looking at you “examples”) and technology stacks (i.e. “mock” usage) for each.
  *   Clear direction as to when to create a hook vs operator vs sensor, and minimum required functionality for each.
  *   A deprecation plan, for example that a provider is guaranteed to be supported for x releases, however if it goes through n releases without update it goes into a “quarantined” state, and if not verified it moves to “retired” (or “moved to the attic” as Jarek stated).
  *   A bar-raising plan, to get all existing providers either up to the current feature bar by a certain date, or retired.

This should make accepting new providers/operators/etc PRs easier, as there will be an unambiguous checklist that needs to be met before it’s even reviewed (which could maybe even be automated).  It will also ensure user confidence in Airflow providers as a whole, as there will be a consistent level of features, functionality, and quality regardless of which provider the user chooses to deploy.

John

On 2022/04/05 08:58:17 Jarek Potiuk wrote:
> One more Provider in progress I forgot :). Cloudera:
> https://github.com/apache/airflow/pull/22659
>
> Just wanted to stress how important the result of this discussion is.
> The number of PR for new providers we get is kind of unprecedented.
> This month we have started discussions (and actual PRs) about adding
> at least 4 biggish providers.
>
> It's either a coincidence, or we simply reached the status that a
> "lot" of 3rd parties want to integrate with Airflow as Airflow is
> really a de-facto Platform for Orchestration for "Everyone" :D :D.
>
> This is a great thing if it's the latter.
>
> I just want to make sure we get it right when it comes to "embracing"
> then as a community. It's not really about gatekeeping but more about
> "taking responsibility" for the code. If we accept code to the
> community we take responsibility for maintaining it too. Of course
> there are various stakeholders there and I am sure "Cloudera" people
> will maintain their provider and provide bug fixes - but the issues
> will also come our way if the Cloudera provider does not work (and
> with the ASF "stamp of approval" we give our users some kind of
> expectations that we have to fulfill).
>
> Unlike in many technical decisions :) I have no very strong opinion
> about this and I am really interested to hear what the community
> members think.
>
> We are prepared to handle literally hundreds of providers if need be
> (with some small automation improvements) - so there are no technical
> reasons to limit the number of providers.
> In the (near) future we might even decide to split them into separate
> repositories (there are some discussions about that and it's likely to
> happen) to make some housekeeping easier and to make sure it does not
> hold us back when we develop some core features.
>
> I am however leaning towards what both Elad and Denis wrote: accepting
> new providers should be easy and it should only be gated by the
> technical code quality bar, but there should also be some expectations
> for the provider being maintained.
> And as Dennis wrote - rather than "voting" for approval, there should
> be rather a clear road (and voting possibly) to "retire" provider if
> it is not maintained any more (This is called "Moving to attic" in the
> ASF terminology).
>
> But maybe there are others who think differently. Would love to hear it.
>
> J
>
>
> On Mon, Apr 4, 2022 at 9:58 PM Ferruzzi, Dennis
> <fe...@amazon.com.invalid>> wrote:
> >
> > I think I'd just +1 Elad's comments.  I don't know if we (the community) really need to be gatekeeping which providers get first class status like that.  In the end, the users of any given provider become responsible for maintaining it, so I feel it sorts itself out without added bureaucracy.  Perhaps some form of formalized decision tree on when to drop a provider package as "no longer maintained/supported", but I don't feel there should be a high barrier to entry on adding a new one provided the code doesn't break any existing packages and meets community quality standards.
> >
> >
> > ________________________________
> > From: Elad Kalif <el...@apache.org>>
> > Sent: Monday, April 4, 2022 7:24 AM
> > To: dev@airflow.apache.org<ma...@airflow.apache.org>
> > Subject: RE: [EXTERNAL] [DISCUSS] Approach for new providers of the community
> >
> >
> >
> >
> > Interesting topic!
> >
> > I think the most important thing for us is that we are able to maintain the provider (in terms of not causing problems for Airflow core or other providers).
> > Some of the maintained providers (Google for example) have open bugs for 2 years. So even if we have many provider mantiners it doesn't guarantee fixing problems.
> > I am not worried about provider internal issues (operator not working properly, etc..)  - it affects only the users of the provider itself and the users of the provider are always welcome to submit PRs with fixes.
> >
> > I don't feel comfortable blocking a new provider just because it has a small market / competitors' tools also don't support it etc...
> >
> > I guess my take is:
> >
> > Accept any new provider that meets quality/requirements (just as we did so far)
> > Since providers are independent packages - In the rare case (I say rare as it never happened till now) where the provider causes problems with core/other providers and no one is willing to address it.
> >  if we can terminate the provider/mark it as not matinined in PyPi - it should be enough I think.
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk <ja...@potiuk.com>> wrote:
> >>
> >> Hey all,
> >>
> >> We seem to have an influx of new providers coming our way:
> >>
> >> * Delta Sharing:
> >> https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
> >> * Flyte:  https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
> >> * Versatile Data Kit:
> >> https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
> >>
> >> I think it might be a good idea to bring the discussion in one place
> >> (here) and decide on what our approach is for accepting new providers
> >> (the original discussion from Andon was focused mostly about VDK's
> >> case, but maybe we could work out a general approach and "guidelines"
> >> - what approach is best so that we do not have to discuss it
> >> separately for each proposal, but we have some more (or less) clear
> >> rules on when we think it's good to accept providers as community.
> >>
> >> Generally speaking we have two approaches:
> >> * providers managed by the Apache Airflow community
> >> * providers managed by 3rd-parties
> >>
> >> I think my email here, nicely summarizes what is in
> >> https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
> >>
> >> I tried to look for earlier devlist discussions about the subject
> >> (maybe someone can find it :), I think we have never formalized nor
> >> written down but I do recall some (slack??) discussions about it from
> >> the past.
> >>
> >> While we have no control/influence (and we do not want to have) for
> >> 3rd-party providers, we definitely have both for the community-managed
> >> ones - and there should be some rules defined to decide when we are
> >> "ok" to accept a provider. Not always having "more" providers in the
> >> "community" area is better. More often than not, code is a liability
> >> more often than an asset.
> >>
> >> From those discussions I had I recall points such us:
> >>
> >> * likelihood of the provider being used by many users
> >> * possibility to test/support the providers by maintainers or
> >> dedicated "stakeholders"
> >> * quality of the code and following our expectations (docs/how to
> >> guides, unit/system test)
> >> * competing (?) with Airflow - there could be some providers of
> >> "competing" products maybe (I am not sure if this is a concern of
> >> ours) which we simply might decide to not maintain in the community
> >>
> >> I am happy to write it down and propose such rules revolving around
> >> those - but I would like to hear what people think first.
> >>
> >> What are your thoughts here?
> >>
> >> J
>

Re: [DISCUSS] Approach for new providers of the community

Posted by Jarek Potiuk <ja...@potiuk.com>.
One more Provider in progress I forgot :). Cloudera:
https://github.com/apache/airflow/pull/22659

Just wanted to stress how important the result of this discussion is.
The number of PR for new providers we get is kind of unprecedented.
This month we have started discussions (and actual PRs) about adding
at least 4 biggish providers.

It's either a coincidence, or we have simply reached the point where a
"lot" of 3rd parties want to integrate with Airflow, as Airflow is
really a de-facto Platform for Orchestration for "Everyone" :D :D.

This is a great thing if it's the latter.

I just want to make sure we get it right when it comes to "embracing"
them as a community. It's not really about gatekeeping but more about
"taking responsibility" for the code. If we accept code to the
community we take responsibility for maintaining it too. Of course
there are various stakeholders there and I am sure "Cloudera" people
will maintain their provider and provide bug fixes - but the issues
will also come our way if the Cloudera provider does not work (and
with the ASF "stamp of approval" we give our users some kind of
expectations that we have to fulfill).

Unlike in many technical decisions :) I have no very strong opinion
about this and I am really interested to hear what the community
members think.

We are prepared to handle literally hundreds of providers if need be
(with some small automation improvements) - so there are no technical
reasons to limit the number of providers.
In the (near) future we might even decide to split them into separate
repositories (there are some discussions about that and it's likely to
happen) to make some housekeeping easier and to make sure it does not
hold us back when we develop some core features.

I am however leaning towards what both Elad and Dennis wrote: accepting
new providers should be easy and it should only be gated by the
technical code quality bar, but there should also be some expectations
for the provider being maintained.
And as Dennis wrote - rather than "voting" for approval, there should
rather be a clear road (and possibly voting) to "retire" a provider if
it is not maintained any more (this is called "Moving to the Attic" in the
ASF terminology).

But maybe there are others who think differently. Would love to hear it.

J


On Mon, Apr 4, 2022 at 9:58 PM Ferruzzi, Dennis
<fe...@amazon.com.invalid> wrote:
>
> I think I'd just +1 Elad's comments.  I don't know if we (the community) really need to be gatekeeping which providers get first class status like that.  In the end, the users of any given provider become responsible for maintaining it, so I feel it sorts itself out without added bureaucracy.  Perhaps some form of formalized decision tree on when to drop a provider package as "no longer maintained/supported", but I don't feel there should be a high barrier to entry on adding a new one provided the code doesn't break any existing packages and meets community quality standards.
>
>
> ________________________________
> From: Elad Kalif <el...@apache.org>
> Sent: Monday, April 4, 2022 7:24 AM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [DISCUSS] Approach for new providers of the community
>
>
>
>
> Interesting topic!
>
> I think the most important thing for us is that we are able to maintain the provider (in terms of not causing problems for Airflow core or other providers).
> Some of the maintained providers (Google for example) have open bugs for 2 years. So even if we have many provider mantiners it doesn't guarantee fixing problems.
> I am not worried about provider internal issues (operator not working properly, etc..)  - it affects only the users of the provider itself and the users of the provider are always welcome to submit PRs with fixes.
>
> I don't feel comfortable blocking a new provider just because it has a small market / competitors' tools also don't support it etc...
>
> I guess my take is:
>
> Accept any new provider that meets quality/requirements (just as we did so far)
> Since providers are independent packages - In the rare case (I say rare as it never happened till now) where the provider causes problems with core/other providers and no one is willing to address it.
>  if we can terminate the provider/mark it as not matinined in PyPi - it should be enough I think.
>
>
>
>
>
>
>
> On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> Hey all,
>>
>> We seem to have an influx of new providers coming our way:
>>
>> * Delta Sharing:
>> https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
>> * Flyte:  https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
>> * Versatile Data Kit:
>> https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
>>
>> I think it might be a good idea to bring the discussion in one place
>> (here) and decide on what our approach is for accepting new providers
>> (the original discussion from Andon was focused mostly about VDK's
>> case, but maybe we could work out a general approach and "guidelines"
>> - what approach is best so that we do not have to discuss it
>> separately for each proposal, but we have some more (or less) clear
>> rules on when we think it's good to accept providers as community.
>>
>> Generally speaking we have two approaches:
>> * providers managed by the Apache Airflow community
>> * providers managed by 3rd-parties
>>
>> I think my email here, nicely summarizes what is in
>> https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
>>
>> I tried to look for earlier devlist discussions about the subject
>> (maybe someone can find it :), I think we have never formalized nor
>> written down but I do recall some (slack??) discussions about it from
>> the past.
>>
>> While we have no control/influence (and we do not want to have) for
>> 3rd-party providers, we definitely have both for the community-managed
>> ones - and there should be some rules defined to decide when we are
>> "ok" to accept a provider. Not always having "more" providers in the
>> "community" area is better. More often than not, code is a liability
>> more often than an asset.
>>
>> From those discussions I had I recall points such us:
>>
>> * likelihood of the provider being used by many users
>> * possibility to test/support the providers by maintainers or
>> dedicated "stakeholders"
>> * quality of the code and following our expectations (docs/how to
>> guides, unit/system test)
>> * competing (?) with Airflow - there could be some providers of
>> "competing" products maybe (I am not sure if this is a concern of
>> ours) which we simply might decide to not maintain in the community
>>
>> I am happy to write it down and propose such rules revolving around
>> those - but I would like to hear what people think first.
>>
>> What are your thoughts here?
>>
>> J

Re: [DISCUSS] Approach for new providers of the community

Posted by "Ferruzzi, Dennis" <fe...@amazon.com.INVALID>.
I think I'd just +1 Elad's comments.  I don't know if we (the community) really need to be gatekeeping which providers get first class status like that.  In the end, the users of any given provider become responsible for maintaining it, so I feel it sorts itself out without added bureaucracy.  Perhaps some form of formalized decision tree on when to drop a provider package as "no longer maintained/supported", but I don't feel there should be a high barrier to entry on adding a new one provided the code doesn't break any existing packages and meets community quality standards.


________________________________
From: Elad Kalif <el...@apache.org>
Sent: Monday, April 4, 2022 7:24 AM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL] [DISCUSS] Approach for new providers of the community




Interesting topic!

I think the most important thing for us is that we are able to maintain the provider (in terms of not causing problems for Airflow core or other providers).
Some of the maintained providers (Google for example) have open bugs for 2 years. So even if we have many provider mantiners it doesn't guarantee fixing problems.
I am not worried about provider internal issues (operator not working properly, etc..)  - it affects only the users of the provider itself and the users of the provider are always welcome to submit PRs with fixes.

I don't feel comfortable blocking a new provider just because it has a small market / competitors' tools also don't support it etc...

I guess my take is:

  1.  Accept any new provider that meets quality/requirements (just as we did so far)
  2.  Since providers are independent packages - In the rare case (I say rare as it never happened till now) where the provider causes problems with core/other providers and no one is willing to address it.
 if we can terminate the provider/mark it as not matinined in PyPi - it should be enough I think.







On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk <ja...@potiuk.com>> wrote:
Hey all,

We seem to have an influx of new providers coming our way:

* Delta Sharing:
https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
* Flyte:  https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
* Versatile Data Kit:
https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0

I think it might be a good idea to bring the discussion in one place
(here) and decide on what our approach is for accepting new providers
(the original discussion from Andon was focused mostly about VDK's
case, but maybe we could work out a general approach and "guidelines"
- what approach is best so that we do not have to discuss it
separately for each proposal, but we have some more (or less) clear
rules on when we think it's good to accept providers as community.

Generally speaking we have two approaches:
* providers managed by the Apache Airflow community
* providers managed by 3rd-parties

I think my email here, nicely summarizes what is in
https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n

I tried to look for earlier devlist discussions about the subject
(maybe someone can find it :), I think we have never formalized nor
written down but I do recall some (slack??) discussions about it from
the past.

While we have no control/influence (and we do not want to have) for
3rd-party providers, we definitely have both for the community-managed
ones - and there should be some rules defined to decide when we are
"ok" to accept a provider. Not always having "more" providers in the
"community" area is better. More often than not, code is a liability
more often than an asset.

From those discussions I had I recall points such us:

* likelihood of the provider being used by many users
* possibility to test/support the providers by maintainers or
dedicated "stakeholders"
* quality of the code and following our expectations (docs/how to
guides, unit/system test)
* competing (?) with Airflow - there could be some providers of
"competing" products maybe (I am not sure if this is a concern of
ours) which we simply might decide to not maintain in the community

I am happy to write it down and propose such rules revolving around
those - but I would like to hear what people think first.

What are your thoughts here?

J

Re: [DISCUSS] Approach for new providers of the community

Posted by Elad Kalif <el...@apache.org>.
Interesting topic!

I think the most important thing for us is that we are able to maintain the
provider (in terms of not causing problems for Airflow core or other
providers).
Some of the maintained providers (Google for example) have open bugs for 2
years. So even if we have many provider maintainers it doesn't
guarantee fixing problems.
I am not worried about provider-internal issues (an operator not working
properly, etc.) - it affects only the users of the provider itself, and
the users of the provider are always welcome to submit PRs with fixes.

I don't feel comfortable blocking a new provider just because it has a
small market / competitors' tools also don't support it etc...

I guess my take is:

   1. Accept any new provider that meets quality/requirements (just as we
   did so far)
   2. Since providers are independent packages - in the rare case (I say
   rare as it has never happened till now) where a provider causes problems with
   core/other providers and no one is willing to address it,
   if we can terminate the provider/mark it as not maintained on PyPI - it
   should be enough I think.








On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Hey all,
>
> We seem to have an influx of new providers coming our way:
>
> * Delta Sharing:
> https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
> * Flyte:  https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
> * Versatile Data Kit:
> https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
>
> I think it might be a good idea to bring the discussion in one place
> (here) and decide on what our approach is for accepting new providers
> (the original discussion from Andon was focused mostly about VDK's
> case, but maybe we could work out a general approach and "guidelines"
> - what approach is best so that we do not have to discuss it
> separately for each proposal, but we have some more (or less) clear
> rules on when we think it's good to accept providers as community.
>
> Generally speaking we have two approaches:
> * providers managed by the Apache Airflow community
> * providers managed by 3rd-parties
>
> I think my email here, nicely summarizes what is in
> https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
>
> I tried to look for earlier devlist discussions about the subject
> (maybe someone can find it :), I think we have never formalized nor
> written down but I do recall some (slack??) discussions about it from
> the past.
>
> While we have no control/influence (and we do not want to have) for
> 3rd-party providers, we definitely have both for the community-managed
> ones - and there should be some rules defined to decide when we are
> "ok" to accept a provider. Not always having "more" providers in the
> "community" area is better. More often than not, code is a liability
> more often than an asset.
>
> From those discussions I had I recall points such us:
>
> * likelihood of the provider being used by many users
> * possibility to test/support the providers by maintainers or
> dedicated "stakeholders"
> * quality of the code and following our expectations (docs/how to
> guides, unit/system test)
> * competing (?) with Airflow - there could be some providers of
> "competing" products maybe (I am not sure if this is a concern of
> ours) which we simply might decide to not maintain in the community
>
> I am happy to write it down and propose such rules revolving around
> those - but I would like to hear what people think first.
>
> What are your thoughts here?
>
> J
>