Posted to dev@airflow.apache.org by Marcus Eagan <me...@marcuseagan.com> on 2023/01/29 19:48:33 UTC

Thoughts on Adding Weaviate Provider?

Hi Devs,

In keeping with the open source ethos and the need for DAG workflows in
Neural Search pipelines, I welcome feedback on the idea of adding a Weaviate
<https://weaviate.io/developers/weaviate> provider to Airflow. It's the
best open source neural search engine.

I see the need for a test and would be willing to invest in a mock if
necessary, but I'm curious about the appetite for such work in general.

I'm open to feedback. I've contributed a lot to various open
source projects and made one very small contribution to Airflow back in
the day to help with enterprise adoption.

Best,

Marcus

Re: Thoughts on Adding Weaviate Provider?

Posted by Jarek Potiuk <ja...@potiuk.com>.
Cool. Glad I could help.


Re: Thoughts on Adding Weaviate Provider?

Posted by Marcus Eagan <me...@marcuseagan.com>.
J,

Not discouraging at all. I reached out in search of clarity that I now
have.

Thanks for the detailed response. I think all of this makes a ton of sense.
Upon first sight of how the provider ecosystem had grown, I was shocked. I
will talk with the Weaviate team and community about Airflow's position. My
expectation is that everyone would be excited for the outcomes you outlined
in the email, specifically regarding the registry.

Best,

Marcus

Re: Thoughts on Adding Weaviate Provider?

Posted by Jarek Potiuk <ja...@potiuk.com>.
TL;DR; Generally we are very cautious about adding new "external
service" providers to the community and it's highly unlikely we would
want Weaviate in, but you are absolutely free (and encouraged) to
release the provider on your own. Even if there is an open-source
engine (like yours), when there is a cloud service behind it, those
who build and run the service should - in general - take
responsibility for releasing their integration with Airflow.

In the vast majority of cases (and very likely in your case) it is far
better for services like yours to build and release the provider on
your own. There is absolutely nothing a community provider can do that
a provider released by you cannot. It's much simpler for you to
support Airflow and release your own provider for Airflow than for the
Airflow community to maintain an external service provider (adding to
the 70+ providers we already manage in the community). We are happy to
merge information about your provider into our ecosystem page
https://airflow.apache.org/ecosystem/#third-party-airflow-plugins-and-providers
and you can also add it to the Astronomer Registry - you will find the
links to that registry (and any other registries that might be there)
on the ecosystem page.

We would only consider a new provider to be donated to us if a number
of conditions are fulfilled - not only mocking: we also expect those
who wish to donate such a service provider to run system tests with
the real services (using their own resources) and to be dedicated to
keeping the system tests running (otherwise we will stop releasing
such a provider). At the same time we impose a lot of limitations on
such a provider, including the minimum supported Airflow version (in
April we will bump all providers to only support Airflow 2.4+, for
example), and it is bound to our very strict release process
https://github.com/apache/airflow#release-process-for-providers.
Recently this requirement caused Cloudera, for example, to release
their provider on their own (see the ecosystem page for the link).
Other popular providers that we already have are catching up with
this requirement. For example, AWS recently released (and maintains)
a dashboard of system tests
https://aws-mwaa.github.io/open-source/system-tests/dashboard.html
that they run, and we are going to use it in our release process
when releasing the AWS provider. You would have to do something
similar as a very basic requirement to get the Weaviate provider
adopted in the Airflow community.

You can take a look at the recent discussions we had about it to get
more context. Please read all of them before responding. Those threads
likely have answers to all the questions you might have:

* https://lists.apache.org/thread/hvl2sg7mc6gwxs1h5kzhrcdtt8cc36dd
* https://lists.apache.org/thread/1gtw5vyypxh0p72wh4dss7cllcvhgh01
* https://lists.apache.org/thread/qk2co6trd7gm57744shprw2fhgmjr637
* https://lists.apache.org/thread/8b1jvld3npgzz2z0o3gv14lvtornbdrm

A bit discouraging, I understand, but we debated and discussed it a
lot, and this is the approach we apply to all similar requests.

J.
