Posted to dev@kafka.apache.org by Konstantine Karantasis <ko...@confluent.io> on 2018/09/26 17:42:37 UTC

Re: [EXTERNAL] [DISCUSS] KIP-310: Add a Kafka Source Connector to Kafka Connect

Hi Rhys,

thanks for the proposal and apologies for the late feedback. Utilizing
Connect to mirror Kafka topics is definitely a plausible proposal for a
very useful use case.

However, I don't think the apache/kafka repository is the right place to
host such a Connector. Currently, no full-featured, production-ready
connectors are hosted in AK. The only two connectors shipped with AK
(FileStreamSourceConnector and FileStreamSinkConnector) are there only as
examples that demonstrate how to implement a connector.

I find this approach very appealing. AK focuses on providing the core
infrastructure for Connect, which is required in every Kafka Connect
deployment, as well as offering the means to generically install, deploy
and operate connectors. But all the connectors reside outside AK and
comprise a vibrant ecosystem of open source and proprietary components
that, essentially - even for the most useful and ubiquitous of the
connectors - are optional for users to install and use. This seems simple
and flexible, both in terms of releasing and using/deploying software
related to Kafka Connect. I might even say that I'd be in favor of
extending this approach to all the Connect components, including
Transformations and Converters.

I'm aware that MirrorMaker is part of AK, but to me this dates back to the
early days of Apache Kafka, when the project and the ecosystem were
smaller, Connect and Streams had not been implemented yet, and
mirroring topics between Kafka clusters was already a basic need. With a
much richer ecosystem now and more sizable and well-defined packages in
AK, I think the approach that decouples connectors from the Connect
framework itself is a good one.

In my opinion, the fact that this connector targets Kafka itself as a
source is not an adequate reason to include it in apache/kafka within the
Connect framework. It seems it can evolve naturally, as every other
connector, in its own repository.

Regards,
Konstantine


On Sat, Aug 4, 2018 at 7:20 PM McCaig, Rhys <Rh...@comcast.com> wrote:

> Hi All,
>
> If there are no further comments on this KIP I’ll start a vote early this
> week.
>
> Rhys
>
> On Aug 1, 2018, at 12:32 AM, McCaig, Rhys <Rhys_McCaig@cable.comcast.com>
> wrote:
>
> Hi All,
>
> I’ve updated the proposal to include the improvements suggested by
> Stephane.
>
> I have also submitted a PR to implement this functionality into Kafka.
> https://github.com/apache/kafka/pull/5438
>
> I don’t have a benchmark against MirrorMaker yet, as I only currently have
> a local docker stack available to me, though I have seen very good
> performance in that test stack (200k messages/sec@100bytes on limited
> compute resource containers). Further benchmarking might take a few days.
>
> Review and comments would be appreciated.
>
> Cheers,
> Rhys
>
>
> On Jun 18, 2018, at 9:00 AM, McCaig, Rhys <Rhys_McCaig@cable.comcast.com>
> wrote:
>
> Hi Stephane,
>
> Thanks for your feedback and apologies for the delay in my response.
>
> Are there any performance benchmarks against Mirror Maker available? I'm
> interested to know if this is more performant / scalable.
> Regarding the implementation, here's some feedback:
>
>
> Currently I don’t have any performance benchmarks, but I think this is a
> great idea. I’ll see if I can set up something in the next week or so.
>
> - I think it's worth mentioning that this solution does not rely on
> consumer groups, and therefore tracking progress may be tricky. Can you
> think of a way to expose that?
>
> This is a reasonable concern. I’m not sure how to track this other than
> looking at the Kafka Connect offsets. Once a message is passed to the
> framework, I'm unaware of a way to get at the commit offsets on the
> producer side. Any thoughts?
>
> - Some code can be in config Validator I believe:
>
> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
>
> - I think your kip mentions `source.admin.` and `source.consumer.` but I
> don't see it reflected yet in the code
>
> - Is there a way to be flexible and merge list and regex, or offer the two
> simultaneously ? source_topics=my_static_topic,prefix.* ?
>
> Agree on all of the above. I will incorporate these into the code later this
> week, as I’ll get some time back to work on this.
>
> Cheers,
> Rhys
>
>
>
> On Jun 6, 2018, at 7:16 PM, Stephane Maarek
> <stephane@simplemachines.com.au> wrote:
>
> Hi Rhys,
>
> I think this will be a great addition.
>
> Are there any performance benchmarks against Mirror Maker available? I'm
> interested to know if this is more performant / scalable.
> Regarding the implementation, here's some feedback:
>
> - I think it's worth mentioning that this solution does not rely on
> consumer groups, and therefore tracking progress may be tricky. Can you
> think of a way to expose that?
>
>
> - Some code can be in config Validator I believe:
>
> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
>
> - I think your kip mentions `source.admin.` and `source.consumer.` but I
> don't see it reflected yet in the code
>
> - Is there a way to be flexible and merge list and regex, or offer the two
> simultaneously ? source_topics=my_static_topic,prefix.* ?
>
> Hope that helps
> Stephane
>
> Kind regards,
> Stephane
>
> Stephane Maarek | Developer
>
> +61 416 575 980
> stephane@simplemachines.com.au
> simplemachines.com.au
> Level 2, 145 William Street, Sydney NSW 2010
>
> On 5 June 2018 at 09:04, McCaig, Rhys <Rhys_McCaig@comcast.com> wrote:
>
> Hi All,
>
> As I didn’t get any comments on this KIP, and two additional KIPs numbered
> 308 have been created since, I'm bumping this and renaming the KIP to 310
> to remove the duplication:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-310%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>
> Let me know if you have any comments or feedback; I would love to hear them.
>
> Cheers,
> Rhys
>
> On May 28, 2018, at 10:23 PM, McCaig, Rhys <rhys_mccaig@comcast.com>
> wrote:
>
> Sorry for the bad link to the KIP, here it is:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>
> On May 28, 2018, at 10:19 PM, McCaig, Rhys <Rhys_McCaig@comcast.com>
> wrote:
>
> Hi All,
>
> I added a KIP to include a Kafka Source Connector with Kafka Connect.
> Here is the KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>
> Looking forward to your feedback and suggestions.
>
> Cheers,
> Rhys
>
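Stephane's question above about accepting a static list and a regex in the
same setting (source_topics=my_static_topic,prefix.*) can be sketched roughly
as follows. This is only an illustration under stated assumptions, not code
from the KIP or from the MirrorTool repository: the class name TopicSelector
is hypothetical, and so is the rule that an entry made up only of legal topic
name characters is treated as a literal name while anything else is compiled
as a regex.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper: not part of Kafka Connect or of the proposed connector. */
final class TopicSelector {
    // Kafka topic names may contain ASCII letters, digits, '.', '_' and '-'.
    private static final Pattern LITERAL = Pattern.compile("[a-zA-Z0-9._-]+");

    private final Set<String> literals = new LinkedHashSet<>();
    private final List<Pattern> patterns = new ArrayList<>();

    /** Parses a comma-separated mix of topic names and regular expressions. */
    TopicSelector(String spec) {
        for (String raw : spec.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            if (LITERAL.matcher(entry).matches()) {
                literals.add(entry);                 // assumed: plain topic name
            } else {
                patterns.add(Pattern.compile(entry)); // assumed: regex entry
            }
        }
    }

    /** Returns true if the topic is listed by name or fully matches a regex entry. */
    boolean matches(String topic) {
        if (literals.contains(topic)) {
            return true;
        }
        for (Pattern p : patterns) {
            if (p.matcher(topic).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TopicSelector selector = new TopicSelector("my_static_topic,prefix.*");
        System.out.println(selector.matches("my_static_topic"));
        System.out.println(selector.matches("prefix.orders"));
        System.out.println(selector.matches("other"));
    }
}
```

One limitation of splitting on commas is that a regex containing a comma
(for example a quantifier such as {1,2}) cannot be expressed, which may be a
reason to keep the list and the regex as two separate settings instead.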
>
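The `source.admin.` and `source.consumer.` prefixes mentioned in the thread
suggest passing prefixed connector properties through to the underlying admin
client and consumer. A minimal sketch of that idea, assuming plain string
properties, is shown below. The class PrefixedConfig and its method withPrefix
are hypothetical names used for illustration. Kafka's own AbstractConfig
offers originalsWithPrefix for a similar purpose, but it is not used here so
that the example has no dependencies.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: illustrates prefix-based pass-through of client settings. */
final class PrefixedConfig {
    private PrefixedConfig() {
    }

    /** Returns the entries whose key starts with prefix, with the prefix stripped. */
    static Map<String, String> withPrefix(Map<String, String> props, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            String key = entry.getKey();
            // Skip keys that are exactly the prefix, which would yield an empty name.
            if (key.startsWith(prefix) && key.length() > prefix.length()) {
                result.put(key.substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("source.consumer.max.poll.records", "500");
        props.put("source.admin.request.timeout.ms", "30000");
        props.put("tasks.max", "1");
        // Only the consumer-prefixed entry is returned, under its unprefixed name.
        System.out.println(withPrefix(props, "source.consumer."));
        System.out.println(withPrefix(props, "source.admin."));
    }
}
```

With this kind of pass-through, the connector itself only needs to know the
two prefixes; any consumer or admin client setting can then be supplied by the
user without the connector declaring it explicitly.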

Re: [EXTERNAL] Apache Kafka project charter

Posted by "McCaig, Rhys" <Rh...@comcast.com>.
Ewen, 

Thanks for your comments on this - It’s really useful context and it would be great to include a note summarizing this in the contributor guidelines, especially for connectors.

Unfortunately I missed (sorry) the “rejected alternatives” discussed in KIP-26 when I wrote KIP-310. It probably wouldn’t have stopped me from submitting the KIP in the first place, but I would have spent more time discussing the motivation behind adding a KIP for a Kafka specific connector. 

> 2. There's clear, committed maintainership to keep the component healthy;
> this includes committer throughput for the feature itself, subsequent
> bugfixes, KIPs, etc. (and tbh, this is thin currently on Connect features,
> something I'm very aware of)

You’re right - this is IMO the most important consideration when accepting “new feature” contributions to an open source project. My current day job has me working with Kafka, and my employer is happy to have me work on the connector as required, but this could of course change in the future. My hope with KIP-310 was to get enough community support for the connector that folks could be comfortable that, if I have to step away, others would see enough value in it to continue updating it as Kafka evolves. At this stage I think it’s fair to say that this bar hasn’t been met.

Cheers,
Rhys

> On Sep 30, 2018, at 9:49 PM, Ewen Cheslack-Postava <ew...@confluent.io> wrote:
> 
> Hey all,
> 
> Sorry I haven't been closely following the threads on this, but I think I
> can provide a bit more color.
> 
> Jakub, re: general policy, I'll take the blame that the relevant "rejected
> alternatives" section in the KIP
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767#KIP-26-AddKafkaConnectframeworkfordataimport/export-Maintainconnectorsintheprojectalongwithframework
> never made it into documentation. That means everything related to that
> decision is currently locked up in the KIP or, possibly worse, the
> difficult-to-search mailing list archives.
> 
> The reasoning for that decision was to control the scope of what gets added
> to AK itself, scope expectations re: maintenance, and frankly make cases
> like this more clear cut, rather than subjective case-by-case decisions.
> There are plenty of other examples of other projects of similar pluggable
> structure where ownership, maintenance, responsibility for quality, and
> more become *really* hard to reason about because *some* things are
> upstreamed into the main project, others aren't, maintainers come and go,
> etc. (see logstash, Flume, etc) Pushing basically everything except for a
> simple example out to be community maintained helps make many of these
> characteristics clear: look to the maintainer of the connector for guidance
> on support level, compatibility, commitment of the maintainer, etc. Apache
> Kafka will maintain the core framework.
> 
> KIP-310 is, admittedly, a bit of an odd case as it is all about Kafka. It's
> much harder for the Apache Kafka project to commit to maintaining, for
> example, a MongoDB connector than one that deals specifically with Kafka.
> The fact that MM has been included and maintained certainly muddies the
> waters here.
> 
> On the flip side, consider that although it's generally implicit in KIP
> discussions, all KIPs come with the consideration of whether something
> makes sense to support by the Apache Kafka project vs supported by the
> broader community. e.g. KIP-10 (Mesos integration), KIP-69 (schema
> registry), KIP-80 (REST server), KIP-127 (Pluggable JAAS LoginModule, which
> was better implemented by a more general KIP), etc. There's a legitimate
> concern wrt whether the Apache project can provide the throughput to
> maintain substantial new chunks of code, in addition to the large surface
> area we already have, maintain quality, provide the same level of
> compatibility and testing we provide today, etc. Sometimes it makes sense
> for a community member or team to maintain these projects; note that
> connectors are far from the first example of this -- non-Java clients have
> never been maintained by the Apache project for this same reason -- there
> wasn't sufficient expertise or throughput to do so.
> 
> wrt a single connector, I actually don't care much where it is maintained
> as long as:
> 
> 1. Policy for in/out for other connectors in the project is clear. To date,
> we haven't really had a question about this until now.
> 2. There's clear, committed maintainership to keep the component healthy;
> this includes committer throughput for the feature itself, subsequent
> bugfixes, KIPs, etc. (and tbh, this is thin currently on Connect features,
> something I'm very aware of)
> 3. If it lands in Apache Kafka, there's an immediate compatibility
> commitment (compatibility across versions of AK broker & Connect, as well
> as upgrade & API & config compatibility for the connector itself).
> 
> -Ewen


Re: Apache Kafka project charter

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Hey all,

Sorry I haven't been closely following the threads on this, but I think I
can provide a bit more color.

Jakub, re: general policy, I'll take the blame that the relevant "rejected
alternatives" section in the KIP
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767#KIP-26-AddKafkaConnectframeworkfordataimport/export-Maintainconnectorsintheprojectalongwithframework
never made it into documentation. That means everything related to that
decision is currently locked up in the KIP or, possibly worse, the
difficult-to-search mailing list archives.

The reasoning for that decision was to control the scope of what gets added
to AK itself, scope expectations re: maintenance, and frankly make cases
like this more clear cut, rather than subjective case-by-case decisions.
There are plenty of other examples of other projects of similar pluggable
structure where ownership, maintenance, responsibility for quality, and
more become *really* hard to reason about because *some* things are
upstreamed into the main project, others aren't, maintainers come and go,
etc. (see logstash, Flume, etc) Pushing basically everything except for a
simple example out to be community maintained helps make many of these
characteristics clear: look to the maintainer of the connector for guidance
on support level, compatibility, commitment of the maintainer, etc. Apache
Kafka will maintain the core framework.

KIP-310 is, admittedly, a bit of an odd case as it is all about Kafka. It's
much harder for the Apache Kafka project to commit to maintaining, for
example, a MongoDB connector than one that deals specifically with Kafka.
The fact that MM has been included and maintained certainly muddies the
waters here.

On the flip side, consider that although it's generally implicit in KIP
discussions, all KIPs come with the consideration of whether something
makes sense to support by the Apache Kafka project vs supported by the
broader community. e.g. KIP-10 (Mesos integration), KIP-69 (schema
registry), KIP-80 (REST server), KIP-127 (Pluggable JAAS LoginModule, which
was better implemented by a more general KIP), etc. There's a legitimate
concern wrt whether the Apache project can provide the throughput to
maintain substantial new chunks of code, in addition to the large surface
area we already have, maintain quality, provide the same level of
compatibility and testing we provide today, etc. Sometimes it makes sense
for a community member or team to maintain these projects; note that
connectors are far from the first example of this -- non-Java clients have
never been maintained by the Apache project for this same reason -- there
wasn't sufficient expertise or throughput to do so.

wrt a single connector, I actually don't care much where it is maintained
as long as:

1. Policy for in/out for other connectors in the project is clear. To date,
we haven't really had a question about this until now.
2. There's clear, committed maintainership to keep the component healthy;
this includes committer throughput for the feature itself, subsequent
bugfixes, KIPs, etc. (and tbh, this is thin currently on Connect features,
something I'm very aware of)
3. If it lands in Apache Kafka, there's an immediate compatibility
commitment (compatibility across versions of AK broker & Connect, as well
as upgrade & API & config compatibility for the connector itself).

-Ewen

On Sun, Sep 30, 2018 at 1:44 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> I am not aware of anything like this. And I also think it's difficult
> to generalize. So far, each feature is discussed on a per-case basis.
>
> Because it's hard to draw the border line, we might be too restrictive
> or too loose in a "project charter", thus scaring people away from starting
> KIPs, which would be bad for the community and the project IMHO.
>
> I also think that the overhead of writing a KIP is not too large, and
> thus the risk (and "wasted time") that a KIP is rejected because "not
> part of the project" is rather small. Also, anybody could suggest a
> feature and collect feedback on the mailing list even before a concrete
> KIP is proposed.
>
> Just my 2 cents.
>
>
> -Matthias
>
>
>
> On 9/29/18 4:31 AM, Jakub Scholz wrote:
> > Hi community,
> >
> > I noticed the following argument in the discussion about KIP-310.
> >
> >> However, I don't think the apache/kafka repository is the right place to
> >> host such a Connector.
> >
> > I was wondering whether there is some project charter describing what does
> > and what does not belong to the Apache Kafka project. I tried to search for
> > it, but I haven't found anything.
> >
> > If nothing like that exists, I wonder if we should write something. I think
> > it's not very community friendly to let people write the KIP just to get
> > feedback like this. By that I do not mean that the point raised by
> > Konstantine is necessarily wrong. All I'm trying to say is that I think
> > there should be some project charter which would describe what does and
> > doesn't belong in Apache Kafka, to make it clear to everyone before
> > someone starts writing a KIP.
> >
> > WDYT? Does something like that already exist?
> >
> > Thanks & Regards
> > Jakub
> >
> > On Wed, Sep 26, 2018 at 7:43 PM Konstantine Karantasis <
> > konstantine@confluent.io> wrote:
> >
> >> Hi Rhys,
> >>
> >> thanks for the proposal and apologies for the late feedback. Utilizing
> >> Connect to mirror Kafka topics is definitely a plausible proposal for a
> >> very useful use case.
> >>
> >> However, I don't think the apache/kafka repository is the right place to
> >> host such a Connector. Currently, no full-featured, production-ready
> >> connectors are hosted in AK. The only two connectors shipped with AK
> >> (FileStreamSourceConnector and FileStreamSinkConnector) are there to
> >> demonstrate implementations only as examples.
> >>
> >> I find this approach very appealing. AK focuses on providing the core
> >> infrastructure for Connect, that is required in every Kafka Connect
> >> deployment, as well as offering the means to generically install, deploy
> >> and operate connectors. But all the connectors reside outside AK and
> >> comprise a vibrant ecosystem of open source and proprietary components
> >> that, essentially - even for the most useful and ubiquitous of the
> >> connectors - are optional for users to install and use. This seems
> simple
> >> and flexible, both in terms of releasing and using/deploying software
> >> related to Kafka Connect. I might even say that I'd be in favor of
> >> extending this approach to all the Connect components, including
> >> Transformations and Converters.
> >>
> >> I'm aware that MirrorMaker is part of AK, but to me this refers to the
> >> early days of Apache Kafka, when the size of the project and the
> ecosystem
> >> was smaller, Connect and Streams had not been implemented yet, and
> >> mirroring topics between Kafka clusters was already a basic need. With a
> >> much more rich ecosystem now and more sizable and well defined packages
> in
> >> AK, I think the approach that decouples connectors from the Connect
> >> framework itself is a good one.
> >>
> >> In my opinion, the fact that this connector targets Kafka itself as a
> >> source is not an adequate reason to include it in apache/kafka within
> the
> >> Connect framework. It seems it can evolve naturally, as every other
> >> connector, in its own repository.
> >>
> >> Regards,
> >> Konstantine
> >>

Re: Apache Kafka project charter

Posted by "Matthias J. Sax" <ma...@confluent.io>.
I am not aware of anything like this, and I also think it's difficult
to generalize. So far, each feature is discussed on a per-case basis.

Because it's hard to draw the border line, we might be too restrictive
or too loose in a "project charter", thus scaring people away from starting
KIPs, which would be bad for the community and the project IMHO.

I also think that the overhead of writing a KIP is not too large, and
thus the risk (and "wasted time") that a KIP is rejected because "not
part of the project" is rather small. Also, anybody could suggest a
feature and collect feedback on the mailing list even before a concrete
KIP is proposed.

Just my 2 cents.


-Matthias



On 9/29/18 4:31 AM, Jakub Scholz wrote:
> Hi community,
> 
> I noticed the following argument in the discussion about KIP-310.
> 
>> However, I don't think the apache/kafka repository is the right place to
>> host such a Connector.
> 
> I was wondering whether there is some project charter describing what does
> and what does not belong to the Apache Kafka project. I tried to search for
> it, but I haven't found anything.
> 
> If nothing like that exists, I wonder if we should write something. I think
> it's not very community friendly to let people write the KIP just to get
> feedback like this. By that I do not mean that the point raised by
> Konstantine is necessarily wrong. All I'm trying to say is that I think
> there should be some project charter which would describe what does and
> doesn't belong in Apache Kafka to make it clear to everyone before
> someone starts writing a KIP.
> 
> WDYT? Does something like that already exist?
> 
> Thanks & Regards
> Jakub
> 
> On Wed, Sep 26, 2018 at 7:43 PM Konstantine Karantasis <
> konstantine@confluent.io> wrote:
> 
>> Hi Rhys,
>>
>> thanks for the proposal and apologies for the late feedback. Utilizing
>> Connect to mirror Kafka topics is definitely a plausible proposal for a
>> very useful use case.
>>
>> However, I don't think the apache/kafka repository is the right place to
>> host such a Connector. Currently, no full-featured, production-ready
>> connectors are hosted in AK. The only two connectors shipped with AK
>> (FileStreamSourceConnector and FileStreamSinkConnector) are there to
>> demonstrate implementations only as examples.
>>
>> I find this approach very appealing. AK focuses on providing the core
>> infrastructure for Connect, that is required in every Kafka Connect
>> deployment, as well as offering the means to generically install, deploy
>> and operate connectors. But all the connectors reside outside AK and
>> comprise a vibrant ecosystem of open source and proprietary components
>> that, essentially - even for the most useful and ubiquitous of the
>> connectors - are optional for users to install and use. This seems simple
>> and flexible, both in terms of releasing and using/deploying software
>> related to Kafka Connect. I might even say that I'd be in favor of
>> extending this approach to all the Connect components, including
>> Transformations and Converters.
>>
>> I'm aware that MirrorMaker is part of AK, but to me this refers to the
>> early days of Apache Kafka, when the size of the project and the ecosystem
>> was smaller, Connect and Streams had not been implemented yet, and
>> mirroring topics between Kafka clusters was already a basic need. With a
>> much more rich ecosystem now and more sizable and well defined packages in
>> AK, I think the approach that decouples connectors from the Connect
>> framework itself is a good one.
>>
>> In my opinion, the fact that this connector targets Kafka itself as a
>> source is not an adequate reason to include it in apache/kafka within the
>> Connect framework. It seems it can evolve naturally, as every other
>> connector, in its own repository.
>>
>> Regards,
>> Konstantine
>>
> 


Apache Kafka project charter

Posted by Jakub Scholz <ja...@scholz.cz>.
Hi community,

I noticed the following argument in the discussion about KIP-310.

> However, I don't think the apache/kafka repository is the right place to
> host such a Connector.

I was wondering whether there is some project charter describing what does
and what does not belong to the Apache Kafka project. I tried to search for
it, but I haven't found anything.

If nothing like that exists, I wonder if we should write something. I think
it's not very community friendly to let people write the KIP just to get
feedback like this. By that I do not mean that the point raised by
Konstantine is necessarily wrong. All I'm trying to say is that I think
there should be some project charter which would describe what does and
doesn't belong in Apache Kafka to make it clear to everyone before
someone starts writing a KIP.

WDYT? Does something like that already exist?

Thanks & Regards
Jakub

On Wed, Sep 26, 2018 at 7:43 PM Konstantine Karantasis <
konstantine@confluent.io> wrote:

> Hi Rhys,
>
> thanks for the proposal and apologies for the late feedback. Utilizing
> Connect to mirror Kafka topics is definitely a plausible proposal for a
> very useful use case.
>
> However, I don't think the apache/kafka repository is the right place to
> host such a Connector. Currently, no full-featured, production-ready
> connectors are hosted in AK. The only two connectors shipped with AK
> (FileStreamSourceConnector and FileStreamSinkConnector) are there to
> demonstrate implementations only as examples.
>
> I find this approach very appealing. AK focuses on providing the core
> infrastructure for Connect, that is required in every Kafka Connect
> deployment, as well as offering the means to generically install, deploy
> and operate connectors. But all the connectors reside outside AK and
> comprise a vibrant ecosystem of open source and proprietary components
> that, essentially - even for the most useful and ubiquitous of the
> connectors - are optional for users to install and use. This seems simple
> and flexible, both in terms of releasing and using/deploying software
> related to Kafka Connect. I might even say that I'd be in favor of
> extending this approach to all the Connect components, including
> Transformations and Converters.
>
> I'm aware that MirrorMaker is part of AK, but to me this refers to the
> early days of Apache Kafka, when the size of the project and the ecosystem
> was smaller, Connect and Streams had not been implemented yet, and
> mirroring topics between Kafka clusters was already a basic need. With a
> much more rich ecosystem now and more sizable and well defined packages in
> AK, I think the approach that decouples connectors from the Connect
> framework itself is a good one.
>
> In my opinion, the fact that this connector targets Kafka itself as a
> source is not an adequate reason to include it in apache/kafka within the
> Connect framework. It seems it can evolve naturally, as every other
> connector, in its own repository.
>
> Regards,
> Konstantine
>

Re: [EXTERNAL] [DISCUSS] KIP-310: Add a Kafka Source Connector to Kafka Connect

Posted by Konstantine Karantasis <ko...@confluent.io>.
I agree with you, Rhys, that Kafka Connect is an integral part of Apache
Kafka and that it makes perfect sense, in many cases, not to overload the
core or the clients with responsibilities that are related to data export to
and data import from specific systems. That can be true even when Kafka itself
is such a system. Connect already provides a scalable and fault-tolerant
infrastructure for such "connectors" (in the broader sense of the word).

But I see the arguments for including connectors in AK and for achieving
better mirroring between Kafka clusters as orthogonal. My reference to
MirrorMaker was meant to say that MirrorMaker was released in AK when Connect
and Streams were not in place yet, not that it's an outdated tool. It
frequently receives updates (copying headers, handling messages without
timestamps, etc). Also, independently, I'm not dogmatic about not including
Connectors as part of AK. It might happen in the future. But for this
particular connector suggested by this KIP, I don't see a compelling reason
to do so. MirrorMaker is still maintained, and if this KIP intended to
suggest its replacement, then the "Compatibility, Deprecation, and
Migration Plan" section should have been much more extensive. Having said
that, I would still be in favor of such a connector living in a separate
repo right now, given that several other connectors or tools are available
to copy data between Kafka clusters and, again, that MirrorMaker is not
deprecated.

In any case, thanks for bringing this subject up. I also welcome this
discussion.

Konstantine



On Wed, Sep 26, 2018 at 2:02 PM McCaig, Rhys <Rh...@comcast.com>
wrote:

> Hi Konstantine,
>
> Thank you for your thoughtful comments!
>
> > However, I don't think the apache/kafka repository is the right place to
> > host such a Connector.
>
> <snip>
> > I find this approach very appealing. AK focuses on providing the core
> > infrastructure for Connect, that is required in every Kafka Connect
> > deployment, as well as offering the means to generically install, deploy
> > and operate connectors.
>
> I personally flip-flopped on this, with similar thoughts, when I
> initially considered raising a KIP for this functionality.
>
> When I initially developed a Kafka source connector, this was out of
> necessity - MirrorMaker requires zkconnect strings, which I didn't have
> access to for the source cluster, and Confluent’s proprietary connector
> also required zk connections - though it has now been updated to remove
> this limitation.
>
> While I understand the point of view that MirrorMaker refers to the early
> days of Apache Kafka, it has become a critical tool for replicating data
> across Kafka clusters for a large portion of the community who are
> managing Kafka at scale. As such, I suspect that there is a lot of interest
> in the Kafka project supporting topic replication across clusters. While
> one approach (which I don’t have the knowledge or time to address) could be
> to include it as a core component of Kafka itself (such as Apache Pulsar’s
> global topics), my view is that at this point in time, Kafka Connect is
> considered *the* way to ship data in and out of a specific Kafka cluster,
> regardless of the external system.
>
> I’d welcome further discussion on whether the community thinks this is the
> right approach for the Kafka project to take with regard to handling Kafka
> topic mirroring. I *think* that it's important and common enough that there
> should be support in the project - and MirrorMaker is, as you mention,
> showing its age.
>
> Cheers,
> Rhys
>
>
>
>
> > On Sep 26, 2018, at 10:42 AM, Konstantine Karantasis <
> > konstantine@confluent.io> wrote:
> >
> > Hi Rhys,
> >
> > thanks for the proposal and apologies for the late feedback. Utilizing
> > Connect to mirror Kafka topics is definitely a plausible proposal for a
> > very useful use case.
> >
> > However, I don't think the apache/kafka repository is the right place to
> > host such a Connector. Currently, no full-featured, production-ready
> > connectors are hosted in AK. The only two connectors shipped with AK
> > (FileStreamSourceConnector and FileStreamSinkConnector) are there to
> > demonstrate implementations only as examples.
> >
> > I find this approach very appealing. AK focuses on providing the core
> > infrastructure for Connect, that is required in every Kafka Connect
> > deployment, as well as offering the means to generically install, deploy
> > and operate connectors. But all the connectors reside outside AK and
> > comprise a vibrant ecosystem of open source and proprietary components
> > that, essentially - even for the most useful and ubiquitous of the
> > connectors - are optional for users to install and use. This seems simple
> > and flexible, both in terms of releasing and using/deploying software
> > related to Kafka Connect. I might even say that I'd be in favor of
> > extending this approach to all the Connect components, including
> > Transformations and Converters.
> >
> > I'm aware that MirrorMaker is part of AK, but to me this refers to the
> > early days of Apache Kafka, when the size of the project and the ecosystem
> > was smaller, Connect and Streams had not been implemented yet, and
> > mirroring topics between Kafka clusters was already a basic need. With a
> > much more rich ecosystem now and more sizable and well defined packages in
> > AK, I think the approach that decouples connectors from the Connect
> > framework itself is a good one.
> >
> > In my opinion, the fact that this connector targets Kafka itself as a
> > source is not an adequate reason to include it in apache/kafka within the
> > Connect framework. It seems it can evolve naturally, as every other
> > connector, in its own repository.
> >
> > Regards,
> > Konstantine
> >
> >
> > On Sat, Aug 4, 2018 at 7:20 PM McCaig, Rhys <Rh...@comcast.com>
> > wrote:
> >
> >> Hi All,
> >>
> >> If there are no further comments on this KIP I’ll start a vote early this
> >> week.
> >>
> >> Rhys
> >>
> >> On Aug 1, 2018, at 12:32 AM, McCaig, Rhys <Rhys_McCaig@cable.comcast.com
> >> <ma...@cable.comcast.com>> wrote:
> >>
> >> Hi All,
> >>
> >> I’ve updated the proposal to include the improvements suggested by
> >> Stephane.
> >>
> >> I have also submitted a PR to implement this functionality into Kafka.
> >> https://github.com/apache/kafka/pull/5438
> >>
> >> I don’t have a benchmark against MirrorMaker yet, as I only currently have
> >> a local docker stack available to me, though I have seen very good
> >> performance in that test stack (200k messages/sec@100bytes on limited
> >> compute resource containers). Further benchmarking might take a few days.
> >>
> >> Review and comments would be appreciated.
> >>
> >> Cheers,
> >> Rhys
> >>
> >>
> >> On Jun 18, 2018, at 9:00 AM, McCaig, Rhys <Rhys_McCaig@cable.comcast.com
> >> <ma...@cable.comcast.com>> wrote:
> >>
> >> Hi Stephane,
> >>
> >> Thanks for your feedback and apologies for the delay in my response.
> >>
> >> Are there any performance benchmarks against Mirror Maker available? I'm
> >> interested to know if this is more performant / scalable.
> >> Regarding the implementation, here's some feedback:
> >>
> >>
> >> Currently I don’t have any performance benchmarks, but I think this is a
> >> great idea; I'll see if I can set something up in the next week or so.
> >>
> >> - I think it's worth mentioning that this solution does not rely on
> >> consumer groups, and therefore tracking progress may be tricky. Can you
> >> think of a way to expose that?
> >>
> >> This is a reasonable concern. I’m not sure how to track this other than
> >> looking at the Kafka Connect offsets. Once a message is passed to the
> >> framework, I'm unaware of a way to get at the commit offsets on the
> >> producer side. Any thoughts?
> >>
> >> - Some code can be in config Validator I believe:
> >>
> >> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
> >>
> >> - I think your kip mentions `source.admin.` and `source.consumer.` but I
> >> don't see it reflected yet in the code
> >>
> >> - Is there a way to be flexible and merge list and regex, or offer the two
> >> simultaneously? source_topics=my_static_topic,prefix.* ?
> >>
> >> Agree on all of the above - I will incorporate into the code later this
> >> week as I'll get some time back to work on this.
> >>
> >> Cheers,
> >> Rhys
> >>
> >>
> >>
> >> On Jun 6, 2018, at 7:16 PM, Stephane Maarek <
> >> stephane@simplemachines.com.au<ma...@simplemachines.com.au>>
> >> wrote:
> >>
> >> Hi Rhys,
> >>
> >> I think this will be a great addition.
> >>
> >> Are there any performance benchmarks against Mirror Maker available? I'm
> >> interested to know if this is more performant / scalable.
> >> Regarding the implementation, here's some feedback:
> >>
> >> - I think it's worth mentioning that this solution does not rely on
> >> consumer groups, and therefore tracking progress may be tricky. Can you
> >> think of a way to expose that?
> >>
> >>
> >> - Some code can be in config Validator I believe:
> >>
> >>
> >> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
> >>
> >> - I think your kip mentions `source.admin.` and `source.consumer.` but I
> >> don't see it reflected yet in the code
> >>
> >> - Is there a way to be flexible and merge list and regex, or offer the two
> >> simultaneously? For example: source_topics=my_static_topic,prefix.*
> >>
> >> Hope that helps
> >> Stephane
> >>
> >> Kind regards,
> >> Stephane
> >>
> >> Stephane Maarek | Developer
> >>
> >> +61 416 575 980
> >> stephane@simplemachines.com.au<ma...@simplemachines.com.au>
> >> simplemachines.com.au<http://simplemachines.com.au>
> >> Level 2, 145 William Street, Sydney NSW 2010
> >>
> >> On 5 June 2018 at 09:04, McCaig, Rhys <Rhys_McCaig@comcast.com<mailto:
> >> Rhys_McCaig@comcast.com>> wrote:
> >>
> >> Hi All,
> >>
> >> As I didn’t get any comments on this KIP, and two additional KIPs numbered
> >> 308 have been created since, I'm bumping this and renaming the KIP to 310
> >> to remove the duplication:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-310%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> >>
> >> Let me know if you have any comments or feedback; I would love to hear
> >> them.
> >>
> >> Cheers,
> >> Rhys
> >>
> >> On May 28, 2018, at 10:23 PM, McCaig, Rhys <rhys_mccaig@comcast.com
> >> <ma...@comcast.com>>
> >> wrote:
> >>
> >> Sorry for the bad link to the KIP, here it is:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> >>
> >> On May 28, 2018, at 10:19 PM, McCaig, Rhys <Rhys_McCaig@comcast.com
> >> <ma...@comcast.com>>
> >> wrote:
> >>
> >> Hi All,
> >>
> >> I added a KIP to include a Kafka Source Connector with Kafka Connect.
> >> Here is the KIP:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> >>
> >> Looking forward to your feedback and suggestions.
> >>
> >> Cheers,
> >> Rhys
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
>
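
One of the review comments quoted above notes that the KIP mentions `source.admin.` and `source.consumer.` prefixes. A minimal sketch of how such prefixed settings might be separated out is shown below. It assumes the prefixes namespace properties intended for the source cluster's admin client and consumer, which the thread does not spell out. The class `PrefixedConfigs`, the method `withPrefix`, and the sample property values are hypothetical names for illustration only, not taken from the KIP or the linked repository.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not taken from the KIP or its PR.
final class PrefixedConfigs {
    private PrefixedConfigs() {}

    // Returns the entries whose key starts with `prefix`, with the prefix stripped.
    static Map<String, String> withPrefix(Map<String, String> props, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            String key = entry.getKey();
            // Skip keys that are exactly the prefix, since nothing would remain.
            if (key.startsWith(prefix) && key.length() > prefix.length()) {
                result.put(key.substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> connectorProps = new HashMap<>();
        connectorProps.put("name", "kafka-source"); // assumed connector name
        connectorProps.put("source.consumer.bootstrap.servers", "source-kafka:9092"); // assumed address
        connectorProps.put("source.consumer.max.poll.records", "500");
        connectorProps.put("source.admin.request.timeout.ms", "30000");

        // Each client would receive only its own settings, without the prefix.
        System.out.println("consumer: " + withPrefix(connectorProps, "source.consumer."));
        System.out.println("admin:    " + withPrefix(connectorProps, "source.admin."));
    }
}
```

For comparison, Kafka's own `AbstractConfig` provides `originalsWithPrefix(String)`, which a connector could use instead of a hand-rolled helper like this one.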

Re: [EXTERNAL] [DISCUSS] KIP-310: Add a Kafka Source Connector to Kafka Connect

Posted by "McCaig, Rhys" <Rh...@comcast.com>.
Hi Konstantine,

Thank you for your thoughtful comments!

> However, I don't think the apache/kafka repository is the right place to
> host such a Connector. 

<snip>
> I find this approach very appealing. AK focuses on providing the core
> infrastructure for Connect, that is required in every Kafka Connect
> deployment, as well as offering the means to generically install, deploy
> and operate connectors.

I personally went back and forth on this, with similar thoughts, when I first considered raising a KIP for this functionality.

I initially developed a Kafka source connector out of necessity: MirrorMaker requires zkconnect strings, which I didn't have access to for the source cluster, and Confluent’s proprietary connector also required ZooKeeper connections, though it has since been updated to remove this limitation.

While I understand the point of view that MirrorMaker refers to the early days of Apache Kafka, it has become a critical tool for replicating data across Kafka clusters for a large portion of the community who are managing Kafka at scale. As such, I suspect that there is a lot of interest in the Kafka project supporting topic replication across clusters. While one approach (which I don’t have the knowledge or time to address) could be to include it as a core component of Kafka itself (such as Apache Pulsar’s global topics), my view is that at this point in time, Kafka Connect is considered *the* way to ship data in and out of a specific Kafka cluster, regardless of the external system.

I’d welcome further discussion on whether the community thinks this is the right approach for the Kafka project to take with regard to handling Kafka topic mirroring. I *think* that it's important and common enough that there should be support in the project, and MirrorMaker is, as you mention, showing its age.

Cheers,
Rhys




> On Sep 26, 2018, at 10:42 AM, Konstantine Karantasis <ko...@confluent.io> wrote:
> 
> Hi Rhys,
> 
> thanks for the proposal and apologies for the late feedback. Utilizing
> Connect to mirror Kafka topics is definitely a plausible proposal for a
> very useful use case.
> 
> However, I don't think the apache/kafka repository is the right place to
> host such a Connector. Currently, no full-featured, production-ready
> connectors are hosted in AK. The only two connectors shipped with AK
> (FileStreamSourceConnector and FileStreamSinkConnector) are there to
> demonstrate implementations only as examples.
> 
> I find this approach very appealing. AK focuses on providing the core
> infrastructure for Connect, that is required in every Kafka Connect
> deployment, as well as offering the means to generically install, deploy
> and operate connectors. But all the connectors reside outside AK and
> comprise a vibrant ecosystem of open source and proprietary components
> that, essentially - even for the most useful and ubiquitous of the
> connectors - are optional for users to install and use. This seems simple
> and flexible, both in terms of releasing and using/deploying software
> related to Kafka Connect. I might even say that I'd be in favor of
> extending this approach to all the Connect components, including
> Transformations and Converters.
> 
> I'm aware that MirrorMaker is part of AK, but to me this refers to the
> early days of Apache Kafka, when the size of the project and the ecosystem
> was smaller, Connect and Streams had not been implemented yet, and
> mirroring topics between Kafka clusters was already a basic need. With a
> much more rich ecosystem now and more sizable and well defined packages in
> AK, I think the approach that decouples connectors from the Connect
> framework itself is a good one.
> 
> In my opinion, the fact that this connector targets Kafka itself as a
> source is not an adequate reason to include it in apache/kafka within the
> Connect framework. It seems it can evolve naturally, as every other
> connector, in its own repository.
> 
> Regards,
> Konstantine
> 
> 
> On Sat, Aug 4, 2018 at 7:20 PM McCaig, Rhys <Rh...@comcast.com> wrote:
> 
>> Hi All,
>> 
>> If there are no further comments on this KIP I’ll start a vote early this
>> week.
>> 
>> Rhys
>> 
>> On Aug 1, 2018, at 12:32 AM, McCaig, Rhys <Rhys_McCaig@cable.comcast.com
>> <ma...@cable.comcast.com>> wrote:
>> 
>> Hi All,
>> 
>> I’ve updated the proposal to include the improvements suggested by
>> Stephane.
>> 
>> I have also submitted a PR to implement this functionality into Kafka.
>> https://github.com/apache/kafka/pull/5438
>> 
>> I don’t have a benchmark against MirrorMaker yet, as I currently only have
>> a local Docker stack available to me, though I have seen very good
>> performance in that test stack (200k messages/sec at 100 bytes per message,
>> on containers with limited compute resources). Further benchmarking might
>> take a few days.
>> 
>> Review and comments would be appreciated.
>> 
>> Cheers,
>> Rhys
>> 
>> 
>> On Jun 18, 2018, at 9:00 AM, McCaig, Rhys <Rhys_McCaig@cable.comcast.com
>> <ma...@cable.comcast.com>> wrote:
>> 
>> Hi Stephane,
>> 
>> Thanks for your feedback and apologies for the delay in my response.
>> 
>> Are there any performance benchmarks against Mirror Maker available? I'm
>> interested to know if this is more performant / scalable.
>> Regarding the implementation, here's some feedback:
>> 
>> 
>> Currently I don’t have any performance benchmarks, but I think this is a
>> great idea; I'll see if I can set something up in the next week or so.
>> 
>> - I think it's worth mentioning that this solution does not rely on
>> consumer groups, and therefore tracking progress may be tricky. Can you
>> think of a way to expose that?
>> 
>> This is a reasonable concern. I’m not sure how to track this other than
>> looking at the Kafka Connect offsets. Once a message is passed to the
>> framework, I'm unaware of a way to get at the commit offsets on the
>> producer side. Any thoughts?
>> 
>> - Some code can be in config Validator I believe:
>> 
>> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
>> 
>> - I think your kip mentions `source.admin.` and `source.consumer.` but I
>> don't see it reflected yet in the code
>> 
>> - Is there a way to be flexible and merge list and regex, or offer the two
>> simultaneously? For example: source_topics=my_static_topic,prefix.*
>> 
>> Agree on all of the above - I will incorporate them into the code later
>> this week, as I'll get some time back to work on this.
>> 
>> Cheers,
>> Rhys
>> 
>> 
>> 
>> On Jun 6, 2018, at 7:16 PM, Stephane Maarek <
>> stephane@simplemachines.com.au<ma...@simplemachines.com.au>>
>> wrote:
>> 
>> Hi Rhys,
>> 
>> I think this will be a great addition.
>> 
>> Are there any performance benchmarks against Mirror Maker available? I'm
>> interested to know if this is more performant / scalable.
>> Regarding the implementation, here's some feedback:
>> 
>> - I think it's worth mentioning that this solution does not rely on
>> consumer groups, and therefore tracking progress may be tricky. Can you
>> think of a way to expose that?
>> 
>> 
>> - Some code can be in config Validator I believe:
>> 
>> https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47
>> 
>> - I think your kip mentions `source.admin.` and `source.consumer.` but I
>> don't see it reflected yet in the code
>> 
>> - Is there a way to be flexible and merge list and regex, or offer the two
>> simultaneously? For example: source_topics=my_static_topic,prefix.*
>> 
>> Hope that helps
>> Stephane
>> 
>> Kind regards,
>> Stephane
>> 
>> Stephane Maarek | Developer
>> 
>> +61 416 575 980
>> stephane@simplemachines.com.au<ma...@simplemachines.com.au>
>> simplemachines.com.au<http://simplemachines.com.au>
>> Level 2, 145 William Street, Sydney NSW 2010
>> 
>> On 5 June 2018 at 09:04, McCaig, Rhys <Rhys_McCaig@comcast.com<mailto:
>> Rhys_McCaig@comcast.com>> wrote:
>> 
>> Hi All,
>> 
>> As I didn’t get any comments on this KIP, and two additional KIPs numbered
>> 308 have been created since, I'm bumping this and renaming the KIP to 310
>> to remove the duplication:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-310%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>> 
>> Let me know if you have any comments or feedback; I would love to hear them.
>> 
>> Cheers,
>> Rhys
>> 
>> On May 28, 2018, at 10:23 PM, McCaig, Rhys <rhys_mccaig@comcast.com
>> <ma...@comcast.com>>
>> wrote:
>> 
>> Sorry for the bad link to the KIP, here it is:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>> 
>> On May 28, 2018, at 10:19 PM, McCaig, Rhys <Rhys_McCaig@comcast.com
>> <ma...@comcast.com>>
>> wrote:
>> 
>> Hi All,
>> 
>> I added a KIP to include a Kafka Source Connector with Kafka Connect.
>> Here is the KIP:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>> 
>> Looking forward to your feedback and suggestions.
>> 
>> Cheers,
>> Rhys
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
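
The question quoted above, whether a static topic list and a regex could be merged into one setting such as source_topics=my_static_topic,prefix.*, can be sketched roughly as follows. This is only one possible reading, under the assumption that every comma-separated entry is compiled as a regular expression and a topic is selected if any entry matches its full name. The class `TopicSelector` and its methods are hypothetical and are not part of Kafka Connect or of the KIP-310 pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Kafka Connect or KIP-310.
final class TopicSelector {
    private final List<Pattern> patterns = new ArrayList<>();

    private TopicSelector() {}

    // Parses a comma-separated value such as "my_static_topic,prefix.*".
    // Assumption: each entry is treated as a regex, so a plain topic name
    // matches itself, but a '.' in a literal name also matches any character.
    static TopicSelector parse(String sourceTopics) {
        TopicSelector selector = new TopicSelector();
        for (String entry : sourceTopics.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                selector.patterns.add(Pattern.compile(trimmed));
            }
        }
        return selector;
    }

    // True if the whole topic name matches at least one configured entry.
    boolean matches(String topic) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(topic).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TopicSelector selector = TopicSelector.parse("my_static_topic,prefix.*");
        for (String topic : new String[] {"my_static_topic", "prefix.orders", "unrelated"}) {
            System.out.println(topic + " selected: " + selector.matches(topic));
        }
    }
}
```

Splitting on commas keeps the setting compact, but it means an entry cannot itself contain a comma (for example a quantifier like `{1,3}`), so a real implementation might prefer two separate settings or a different delimiter.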


Re: [DISCUSS] KIP-310: Add a Kafka Source Connector to Kafka Connect

Posted by "McCaig, Rhys" <Rh...@comcast.com>.
Hi All,

Based on the feedback in this thread, and in light of Ryanne’s excellent proposal (KIP-382: MirrorMaker 2.0), which incorporates and extends the goals of KIP-310, I have updated the status of KIP-310 to "Discarded" and added a comment that KIP-382 supersedes it.

Thank you all for the discussion and feedback - this is my first KIP and I appreciate the community providing feedback on my contributions!

Rhys
