Posted to user@flink.apache.org by Martijn Visser <ma...@ververica.com> on 2022/01/14 09:25:36 UTC

Fwd: [DISCUSS] Moving connectors from Flink to external connector repositories

Hi User mailing list,

I'm also forwarding this thread to you. Please let me know if you have any
comments or feedback!

Best regards,

Martijn

---------- Forwarded message ---------
From: Martijn Visser <ma...@ververica.com>
Date: Fri, 14 Jan 2022 at 06:28
Subject: Re: [DISCUSS] Moving connectors from Flink to external connector
repositories
To: Qingsheng Ren <re...@gmail.com>
Cc: dev <de...@flink.apache.org>


Hi everyone,

If you have any more comments or questions, please let me know. Else I
would open up a vote on this thread in the next couple of days.

Best regards,

Martijn

On Thu, 6 Jan 2022 at 09:45, Qingsheng Ren <re...@gmail.com> wrote:

> Thanks Martijn for driving this!
>
> I’m +1 for Martijn’s proposal. It’s important to avoid making some
> connectors above others, and all connectors should share the same quality
> standard. Keeping some basic connectors like FileSystem is reasonable since
> it’s crucial for new users to try and explore Flink quickly.
>
> Another point I’d like to mention is that we need to add more E2E cases
> using basic connectors in the Flink main repo after we move the connectors
> out. Currently E2E tests are heavily dependent on connectors. It’s
> essential to keep the coverage and quality of the Flink main repo even
> without these connectors’ E2E cases.
>
> Best regards,
>
> Qingsheng Ren
>
>
> > On Jan 5, 2022, at 9:59 PM, Martijn Visser <ma...@ververica.com>
> wrote:
> >
> > Hi everyone,
> >
> > As already mentioned in the previous discussion thread [1] I'm opening
> up a
> > parallel discussion thread on moving connectors from Flink to external
> > connector repositories. If you haven't read up on this discussion
> before, I
> > recommend reading that one first.
> >
> > The goal with the external connector repositories is to make it easier to
> > develop and release connectors by not being bound to the release cycle of
> > Flink itself. It should result in faster connector releases, a more
> active
> > connector community and a reduced build time for Flink.
> >
> > We currently have the following connectors available in Flink itself:
> >
> > * Kafka -> For DataStream & Table/SQL users
> > * Upsert-Kafka -> For Table/SQL users
> > * Cassandra -> For DataStream users
> > * Elasticsearch -> For DataStream & Table/SQL users
> > * Kinesis -> For DataStream users & Table/SQL users
> > * RabbitMQ -> For DataStream users
> > * Google Cloud PubSub -> For DataStream users
> > * Hybrid Source -> For DataStream users
> > * NiFi -> For DataStream users
> > * Pulsar -> For DataStream users
> > * Twitter -> For DataStream users
> > * JDBC -> For DataStream & Table/SQL users
> > * FileSystem -> For DataStream & Table/SQL users
> > * HBase -> For DataStream & Table/SQL users
> > * DataGen -> For Table/SQL users
> > * Print -> For Table/SQL users
> > * BlackHole -> For Table/SQL users
> > * Hive -> For Table/SQL users
> >
> > I would propose to move out all connectors except Hybrid Source,
> > FileSystem, DataGen, Print and BlackHole because:
> >
> > * We should avoid at all costs that certain connectors are considered
> > 'Core' connectors. If that happens, it creates a perception that there
> > are first-grade/high-quality connectors because they are in 'Core' Flink
> > and second-grade/lesser-quality connectors because they are outside of
> > the Flink codebase. It also directly hurts the goal of faster releases,
> > because those connectors would still be bound to the release cycle of
> > Flink. Last but not least, it risks the success of the external
> > connector repositories, since every connector contributor would still
> > want to be in 'Core' Flink.
> > * To continue on the quality of connectors, we should aim for all
> > connectors to be of high quality. That means we shouldn't have a
> > connector that's only available for either DataStream or Table/SQL
> > users, but for both. It also means that (if applicable) a connector
> > should support all options, like bounded and unbounded scan, lookup,
> > and batch and streaming sink capabilities. In the end the quality should
> > depend on the maintainers of the connector, not on where the code is
> > maintained.
> > * The Hybrid Source connector is a special case: it combines multiple
> > existing sources rather than connecting to an external system itself.
> > * The FileSystem, DataGen, Print and BlackHole connectors are important
> > for first-time Flink users and testers. If you want to experiment with
> > Flink, you will most likely start with a local file before moving to one
> > of the other sources or sinks. These 4 connectors can help with either
> > reading/writing local files or generating/displaying/ignoring data.
> > * Some of the connectors haven't been maintained in a long time (for
> > example, NiFi and Google Cloud PubSub). An argument could be made that
> > we should check whether we actually want to move such a connector, or
> > decide to drop it entirely.
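> >
> > As an illustration, first experiments with these connectors need
> > nothing but Flink SQL (a sketch; the table schema and option values
> > are illustrative):
> >
> > ```sql
> > -- DataGen source: continuously generates random rows
> > CREATE TABLE orders (
> >   order_id BIGINT,
> >   price    DOUBLE
> > ) WITH (
> >   'connector' = 'datagen',
> >   'rows-per-second' = '10'
> > );
> >
> > -- Print sink: writes every incoming row to the TaskManager log
> > CREATE TABLE debug_out (
> >   order_id BIGINT,
> >   price    DOUBLE
> > ) WITH (
> >   'connector' = 'print'
> > );
> >
> > INSERT INTO debug_out SELECT * FROM orders;
> > ```
> >
> > Swapping in 'connector' = 'blackhole' for the sink discards the rows,
> > and 'connector' = 'filesystem' reads/writes local files.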
> >
> > I'm looking forward to your thoughts!
> >
> > Best regards,
> >
> > Martijn Visser | Product Manager
> >
> > martijn@ververica.com
> >
> > [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
> >
> >
> >
> > Follow us @VervericaData
> >
> > --
> >
> > Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
>
>

Re: [DISCUSS] Moving connectors from Flink to external connector repositories

Posted by Konstantin Knauf <kn...@apache.org>.
Hi Martijn,

makes sense to me. For dropping a connector, I think we need a separate
discussion for each of them, and I would not block this effort on those
discussions.

Cheers,

Konstantin

On Fri, Jan 14, 2022 at 10:26 AM Martijn Visser <ma...@ververica.com>
wrote:

> Hi User mailing list,
>
> I'm also forwarding this thread to you. Please let me know if you have any
> comments or feedback!
>
> Best regards,
>
> Martijn

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk