Posted to dev@kafka.apache.org by Randall Hauch <rh...@gmail.com> on 2021/11/07 18:19:24 UTC

Do we want to add more SMTs to Apache Kafka?

We have had several requests to add more Connect Single Message
Transforms (SMTs) to the project. When SMTs were first introduced with
KIP-66 (ref 1) in Jun 2017, the KIP mentioned the following:

> Criteria: SMTs that are shipped with Kafka Connect should be general enough to apply to many data sources & serialization formats. They should also be simple enough to not cause any additional library dependency to be introduced.
> Beyond those being initially included with this KIP, transformations can be adopted for inclusion in future with JIRA/ML discussion to weigh the tradeoffs.

In the 4+ years that we've had SMTs in the project, we've only
enhanced the framework with KIP-585 (ref 2), and fixed the initial
SMTs (including KIP-437, ref 3). We recently have had quite a few
requests to add new SMTs; a few samples of these include:
* https://issues.apache.org/jira/browse/KAFKA-10299
* https://issues.apache.org/jira/browse/KAFKA-9436
* https://issues.apache.org/jira/browse/KAFKA-9318
* https://issues.apache.org/jira/browse/KAFKA-12443

Adding new SMTs to the Apache Kafka project, or changing existing ones,
comes with requirements. First, AK releases are infrequent and necessarily
involve the entire project. Second, adding an SMT is an API change and
therefore requires a KIP. Third, all changes in behavior to SMTs
included in a prior AK release must be backward compatible, and
adding or changing an SMT's configuration requires a KIP. This last
one is also challenging if we're limiting ourselves to truly general
SMTs, since these are notoriously difficult to get right the first
time. All of these aspects mean that it's difficult to add, maintain,
and evolve/improve SMTs in AK. And unless a bug fix is critical, we're
likely not to create a patch release for AK just to fix a bug in an
SMT, simply because of the effort involved.

On the other hand, anyone can easily implement their own SMT and
deploy it as a Connect plugin, whether that's part of a connector
plugin or a separate plugin dedicated to one or more SMTs. Interestingly,
it's far simpler to implement and maintain custom SMTs outside of AK,
especially since those plugins can be released and deployed in any
Connect runtime version since at least 0.11.0. And if custom SMTs are
maintained in a relatively small project, they can be released often.

Finally, KIP-26 (ref 4) specifically rejected maintaining connector
implementations in the AK project. So we have precedent for choosing
not to accept implementations.

Given the above, I wonder if the time has come for us to prefer only
maintaining the SMT framework and existing SMTs, and to decline adding
new SMTs.

Thoughts?

Best regards,

Randall Hauch

(1) https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
(2) https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Filter+and+Conditional+SMTs
(3) https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT
(4) https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767

Re: Do we want to add more SMTs to Apache Kafka?

Posted by Chris Egerton <ch...@confluent.io.INVALID>.
Hi all,

I think restricting the set of out-of-the-box SMTs that we provide with
Connect is reasonable. I do think Joshua raises a valuable point, though.
At the risk of reiterating his ideas, we can gain a few things from
improving the existing SMTs provided with Connect: first, we can establish
precedents for how SMTs are configured and implemented in more complex
scenarios (such as handling explicitly-specified nested fields or
traversing an entire key/value recursively), which can save time for both
developers and users if we do a good enough job for others to start
following the examples we set. Second, we decrease the likelihood that
someone forks, e.g., the InsertField SMT just to add their own small tweak
on top, which both adds unnecessary work for that developer and complicates
the experience for users of Connect ("which SMT do I use now?").

Additionally, I like Gunnar and Brandon's suggestion of a way to discover
SMTs. There's precedent for this with the "Kafka Connector Hub" link on the
https://cwiki.apache.org/confluence/display/KAFKA/ page, which currently
leads to a page on Confluent's website containing a fairly large list of
connectors from a variety of sources
(https://www.confluent.io/product/connectors/). In practice I'm not sure how
many new Kafka users end up visiting the wiki as their first stop, though.
Perhaps we could add a section to the docs page at
https://kafka.apache.org/documentation.html, for connectors,
transformations, and maybe even other pluggable components (converters,
config providers, etc.)?

Cheers,

Chris

On Sun, Nov 21, 2021 at 12:05 PM Joshua Grisham <gr...@gmail.com>
wrote:

> Hi all,
>
> From my perspective, the types of transformations already covered by the
> existing SMTs are quite good (but anyone else please say if you feel like
> you are missing something that feels "standard"). The biggest issue is the
> limitations many of them have, which severely restrict their use in a real
> production scenario.
>
> In my mind, the single biggest gap is the inability to handle nested fields
> or anything more than records that essentially look like simple key-value
> pairs. (One exception: if you chain the Flatten transform first, you can
> apply others to the flattened result, but this assumes that the Flatten
> transform can actually handle the message in the first place! If you have
> nested arrays then you are toast ;) And wait, maybe you didn't actually
> want to flatten anyway?)
>
> I am not sure of the best way to approach this (e.g. allow some kind of
> path notation so users can address nested fields directly, vs. allow
> recursion to match a field name at any level, or both, or something else?),
> but I would say that some kind of standardized approach implemented in all
> of the SMTs (where it makes sense) would certainly be best! (At least from
> a user perspective, so that the configuration to address nested fields is
> consistent across each transform that allows it.) I did this one way in a
> proposed change for KIP-683, but this is only one of the possible ways:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-683%3A+Add+recursive+support+to+Connect+Cast+and+ReplaceField+transforms%2C+and+support+for+casting+complex+types+to+either+a+native+or+JSON+string
>
> Past that, there are a few tweaks or enhancements which could be made to
> some of the existing SMTs which would help prevent them from blocking or
> failing in most general scenarios (some of the changes I proposed in the
> past, but haven't since had the time to follow up on, fit in this category
> I think). For example, the ability to "cast" a more complicated structure
> (such as an array) to a string (Connect API or JSON), so that the record can
> then be flattened and inserted into a database table or something similar,
> would remove a lot of what are IMO currently roadblocks that users often
> hit in sink scenarios.
>
> Then there are some small tweaks which maybe can be made for specific
> cases, some of which Randall already mentioned, such as:
>
> * The Filter implementation is of limited use, mostly due to the lack of
> some "standard-feeling" predicates (field value filtering is very often
> what I think people are looking for), so the Confluent one or another is
> often used instead.
> * A bit more can be done with InsertField IMO (e.g. inserting a wallclock
> timestamp instead of the record's produced timestamp is one example that
> often seems to pop up).
> * Some standardized way to "move" one field to another place, e.g. to move
> it out of or into a nested record.
> * Limitations on only processing one field per transformation, e.g. with
> the TimestampConverter, which I proposed to address with KIP-682
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats),
> feel a little annoying and can add to processing time in high
> volume scenarios.
>
> (By the way apologies to Randall that I have not had a chance to get back
> yet on KIP-682 but will try to do so in the discussion thread in the coming
> days if I can!)
>
> And finally, I also feel like some of the SMTs are a bit disjointed from
> each other when it comes to how the classes are actually designed and how
> the configuration works when using them (from the perspective of both a
> user applying the transform and a transform developer). Some of the class
> design differences might be necessary due to the nature of the
> transformation itself, but I wonder if in the future some kind of
> standardization could be built into a base class of some type, or into
> enhancements to the requirements specified by the interface, which would
> help to drive a more standardized approach? Or maybe at least a
> once-through on the code for all of them to align things like how Config
> string constants/enums etc. are handled, method names and their position
> within the code, so that they are all refactored in a similar way, etc.
>
> In the end, I do feel it makes sense to try and sort of aim for the 80/20
> rule with the standard SMTs to be able to support "real world" scenarios,
> but some of these limitations cause them to fall a bit short today.
>
> Hope this is helpful at least to spark other ideas anyway!
>
> Have a nice (rest of the) weekend!
> Joshua Grisham

Re: Do we want to add more SMTs to Apache Kafka?

Posted by Joshua Grisham <gr...@gmail.com>.
Hi all,

From my perspective, the types of transformations already covered by the
existing SMTs are quite good (but anyone else please say if you feel like
you are missing something that feels "standard"). The biggest issue is the
limitations many of them have, which severely restrict their use in a real
production scenario.

In my mind, the single biggest gap is the inability to handle nested fields
or anything more than records that essentially look like simple key-value
pairs. (One exception: if you chain the Flatten transform first, you can
apply others to the flattened result, but this assumes that the Flatten
transform can actually handle the message in the first place! If you have
nested arrays then you are toast ;) And wait, maybe you didn't actually
want to flatten anyway?)

I am not sure of the best way to approach this (e.g. allow some kind of
path notation so users can address nested fields directly, vs. allow
recursion to match a field name at any level, or both, or something else?),
but I would say that some kind of standardized approach implemented in all
of the SMTs (where it makes sense) would certainly be best! (At least from
a user perspective, so that the configuration to address nested fields is
consistent across each transform that allows it.) I did this one way in a
proposed change for KIP-683, but this is only one of the possible ways:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-683%3A+Add+recursive+support+to+Connect+Cast+and+ReplaceField+transforms%2C+and+support+for+casting+complex+types+to+either+a+native+or+JSON+string
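
To make the two options concrete, here is a minimal sketch, assuming a
schemaless record value modeled as nested java.util.Map instances. The class
and method names (NestedFieldPaths, getByPath, findByName) are hypothetical
and are not part of Kafka Connect or of KIP-683; field names that themselves
contain dots are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Kafka Connect: two ways to address nested
// fields in a schemaless record value, modeled here as nested Maps.
final class NestedFieldPaths {

    // Option 1: path notation. "address.city" walks one map level per segment.
    // Returns null if a segment is missing or an intermediate value is not a map.
    static Object getByPath(Map<String, Object> value, String path) {
        Object current = value;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    // Option 2: recursion. Collects the dotted path of every field whose name
    // matches, at any depth. Maps nested inside lists are not visited here.
    static List<String> findByName(Map<String, Object> value, String fieldName) {
        List<String> matches = new ArrayList<>();
        collect(value, fieldName, "", matches);
        return matches;
    }

    private static void collect(Map<?, ?> node, String fieldName, String prefix, List<String> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String path = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            if (fieldName.equals(entry.getKey())) {
                out.add(path);
            }
            if (entry.getValue() instanceof Map) {
                collect((Map<?, ?>) entry.getValue(), fieldName, path, out);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Oslo");
        address.put("id", 7);
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 42);
        record.put("address", address);

        System.out.println(getByPath(record, "address.city")); // Oslo
        System.out.println(findByName(record, "id"));          // [id, address.id]
    }
}
```

The path form touches exactly one field, while the recursive form can match
several, which is one reason a transform's configuration would probably need
to say which behavior is intended.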

Past that, there are a few tweaks or enhancements which could be made to
some of the existing SMTs which would help prevent them from blocking or
failing in most general scenarios (some of the changes I proposed in the
past, but haven't since had the time to follow up on, fit in this category
I think). For example, the ability to "cast" a more complicated structure
(such as an array) to a string (Connect API or JSON), so that the record can
then be flattened and inserted into a database table or something similar,
would remove a lot of what are IMO currently roadblocks that users often
hit in sink scenarios.
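
As a rough illustration of that idea (not Connect's actual Flatten or Cast
code), the sketch below flattens nested maps into dotted keys and renders any
list as a JSON-like string so the result is a flat set of columns. The class
FlattenWithArrays and its methods are hypothetical, the record is again
assumed to be schemaless (nested Maps and Lists), and the string rendering
quotes strings without escaping them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not Kafka Connect's Flatten or Cast implementation.
final class FlattenWithArrays {

    // Flattens nested maps into "parent.child" keys. Lists have no natural
    // flat form, so they are stored as a JSON-like string instead.
    static Map<String, Object> flatten(Map<String, Object> value) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto(value, "", out);
        return out;
    }

    private static void flattenInto(Map<?, ?> node, String prefix, Map<String, Object> out) {
        for (Map.Entry<?, ?> e : node.entrySet()) {
            String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "." + e.getKey();
            Object v = e.getValue();
            if (v instanceof Map) {
                flattenInto((Map<?, ?>) v, key, out);
            } else if (v instanceof List) {
                out.put(key, toJsonLike(v));
            } else {
                out.put(key, v);
            }
        }
    }

    // Minimal rendering for illustration only: strings are quoted, not escaped.
    static String toJsonLike(Object v) {
        if (v instanceof List) {
            return ((List<?>) v).stream()
                    .map(FlattenWithArrays::toJsonLike)
                    .collect(Collectors.joining(",", "[", "]"));
        }
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\":" + toJsonLike(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
        }
        if (v instanceof CharSequence) {
            return "\"" + v + "\"";
        }
        return String.valueOf(v);
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "ann");
        user.put("tags", List.of("a", "b"));
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("user", user);

        System.out.println(flatten(record)); // {id=1, user.name=ann, user.tags=["a","b"]}
    }
}
```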

Then there are some small tweaks which maybe can be made for specific
cases, some of which Randall already mentioned, such as:

* The Filter implementation is of limited use, mostly due to the lack of
some "standard-feeling" predicates (field value filtering is very often
what I think people are looking for), so the Confluent one or another is
often used instead.
* A bit more can be done with InsertField IMO (e.g. inserting a wallclock
timestamp instead of the record's produced timestamp is one example that
often seems to pop up).
* Some standardized way to "move" one field to another place, e.g. to move
it out of or into a nested record.
* Limitations on only processing one field per transformation, e.g. with
the TimestampConverter, which I proposed to address with KIP-682
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats),
feel a little annoying and can add to processing time in high
volume scenarios.
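
For the "move" item in the list above, a minimal sketch of what such an
operation could look like on a schemaless record, assuming dot-separated
paths and mutable nested Maps. MoveField and its move method are hypothetical
names for illustration; no such transform exists in Apache Kafka today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an existing Kafka Connect transform.
final class MoveField {

    // Moves the value at fromPath to toPath (both dot-separated), creating
    // intermediate maps on the destination side as needed. Returns false when
    // the source field is missing or the destination path runs through a
    // value that is not a map.
    @SuppressWarnings("unchecked")
    static boolean move(Map<String, Object> record, String fromPath, String toPath) {
        String[] from = fromPath.split("\\.");
        Map<String, Object> source = record;
        for (int i = 0; i < from.length - 1; i++) {
            Object next = source.get(from[i]);
            if (!(next instanceof Map)) {
                return false;
            }
            source = (Map<String, Object>) next;
        }
        String sourceLeaf = from[from.length - 1];
        if (!source.containsKey(sourceLeaf)) {
            return false;
        }

        String[] to = toPath.split("\\.");
        Map<String, Object> target = record;
        for (int i = 0; i < to.length - 1; i++) {
            Object next = target.computeIfAbsent(to[i], k -> new LinkedHashMap<String, Object>());
            if (!(next instanceof Map)) {
                return false;
            }
            target = (Map<String, Object>) next;
        }
        target.put(to[to.length - 1], source.remove(sourceLeaf));
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("ts", 1636309164000L);
        payload.put("name", "ann");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("payload", payload);

        move(record, "payload.ts", "ts");          // out of a nested record
        move(record, "payload.name", "user.name"); // into a new nested record
        System.out.println(record);
    }
}
```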

(By the way apologies to Randall that I have not had a chance to get back
yet on KIP-682 but will try to do so in the discussion thread in the coming
days if I can!)

And finally, I also feel like some of the SMTs are a bit disjointed from
each other when it comes to how the classes are actually designed and how
the configuration works when using them (from the perspective of both a
user applying the transform and a transform developer). Some of the class
design differences might be necessary due to the nature of the
transformation itself, but I wonder if in the future some kind of
standardization could be built into a base class of some type, or into
enhancements to the requirements specified by the interface, which would
help to drive a more standardized approach? Or maybe at least a
once-through on the code for all of them to align things like how Config
string constants/enums etc. are handled, method names and their position
within the code, so that they are all refactored in a similar way, etc.

In the end, I do feel it makes sense to try and sort of aim for the 80/20
rule with the standard SMTs to be able to support "real world" scenarios,
but some of these limitations cause them to fall a bit short today.

Hope this is helpful at least to spark other ideas anyway!

Have a nice (rest of the) weekend!
Joshua Grisham


On Sat, Nov 20, 2021 at 01:16, Brandon Brown <br...@bbrownsound.com> wrote:

> I agree. If the desire is to keep the internal SMT collection small, then
> providing ease of discovery like Gunnar suggests would be extremely
> helpful.
>
> Brandon Brown

Re: Do we want to add more SMTs to Apache Kafka?

Posted by Brandon Brown <br...@bbrownsound.com>.
I agree. If the desire is to keep the internal SMT collection small, then providing ease of discovery like Gunnar suggests would be extremely helpful.

Brandon Brown

> On Nov 19, 2021, at 6:13 PM, Gunnar Morling <gu...@googlemail.com.invalid> wrote:
> 
> Hi all,
> 
> Just came across this thread, I hope the late reply is ok.
> 
> FWIW, we're in a similar situation in Debezium, where users often request
> new (Debezium-specific) SMTs, and we generally tend to recommend them to be
> maintained by users themselves, unless they are truly generic. This
> excludes a share of users though who aren't Java developers.
> 
> What might help is having a means of simple discoverability of externally
> hosted SMTs, e.g. via some kind of catalog hosted on kafka.apache.org. That
> way, it would be easier for people to find and obtain SMTs from other
> places, reducing the pressure to get them added to Apache Kafka proper.
> 
> Best,
> 
> --Gunnar

Re: Do we want to add more SMTs to Apache Kafka?

Posted by Gunnar Morling <gu...@googlemail.com.INVALID>.
Hi all,

Just came across this thread, I hope the late reply is ok.

FWIW, we're in a similar situation in Debezium, where users often request
new (Debezium-specific) SMTs, and we generally tend to recommend them to be
maintained by users themselves, unless they are truly generic. This
excludes a share of users though who aren't Java developers.

What might help is having a means of simple discoverability of externally
hosted SMTs, e.g. via some kind of catalog hosted on kafka.apache.org. That
way, it would be easier for people to find and obtain SMTs from other
places, reducing the pressure to get them added to Apache Kafka proper.

Best,

--Gunnar




On Sun, Nov 7, 2021 at 21:49, Brandon Brown <brandon@bbrownsound.com> wrote:

> I like the idea of a select number of SMTs being offered and supported out
> of the box. The addition of SMTs via this process is nice because it allows
> for a rich set to be supported out of the box and without the need for
> extra work to deploy.
>
> Perhaps this is a spot where the community could express interest in
> additional SMTs, which may be available via an open source library, and if
> enough usage occurs there could be a path to fold them into the Kafka
> project at large?
>
> Brandon Brown
>
>
> > On Nov 7, 2021, at 1:19 PM, Randall Hauch <rh...@gmail.com> wrote:
> >
> > We have had several requests to add more Connect Single Message
> > Transforms (SMTs) to the project. When SMTs were first introduced with
> > KIP-66 (ref 1) in Jun 2017, the KIP mentioned the following:
> >
> >> Criteria: SMTs that are shipped with Kafka Connect should be general
> >> enough to apply to many data sources & serialization formats. They should
> >> also be simple enough to not cause any additional library dependency to be
> >> introduced.
> >> Beyond those being initially included with this KIP, transformations
> >> can be adopted for inclusion in future with JIRA/ML discussion to weigh the
> >> tradeoffs.
> >
> > In the 4+ years that we've had SMTs in the project, we've only
> > enhanced the framework with KIP-585 (ref 2), and fixed the initial
> > SMTs (including KIP-437, ref 3). We recently have had quite a few
> > requests to add new SMTs; a few samples of these include:
> > * https://issues.apache.org/jira/browse/KAFKA-10299
> > * https://issues.apache.org/jira/browse/KAFKA-9436
> > * https://issues.apache.org/jira/browse/KAFKA-9318
> > * https://issues.apache.org/jira/browse/KAFKA-12443
> >
> > Adding new SMTs to the Apache Kafka project, or changing existing ones,
> > comes with requirements. First, AK releases are infrequent and necessarily
> > involve the entire project. Second, adding an SMT is an API change and
> > therefore requires a KIP. Third, all changes in behavior to SMTs
> > included in a prior AK release must be backward compatible, and
> > adding or changing an SMT's configuration requires a KIP. This last
> > one is also challenging if we're limiting ourselves to truly general
> > SMTs, since these are notoriously difficult to get right the first
> > time. All of these aspects mean that it's difficult to add, maintain,
> > and evolve/improve SMTs in AK. And unless a bug fix is critical, we're
> > likely not to create a patch release for AK just to fix a bug in an
> > SMT, simply because of the effort involved.
> >
> > On the other hand, anyone can easily implement their own SMT and
> > deploy it as a Connect plugin, whether that's part of a connector
> > plugin or a separate plugin dedicated to one or more SMTs. Interestingly,
> > it's far simpler to implement and maintain custom SMTs outside of AK,
> > especially since those plugins can be released and deployed in any
> > Connect runtime version since at least 0.11.0. And if custom SMTs are
> > maintained in a relatively small project, they can be released often.
> >
> > Finally, KIP-26 (ref 4) specifically rejected maintaining connector
> > implementations in the AK project. So we have precedent for choosing
> > not to accept implementations.
> >
> > Given the above, I wonder if the time has come for us to prefer only
> > maintaining the SMT framework and existing SMTs, and to decline adding
> > new SMTs.
> >
> > Thoughts?
> >
> > Best regards,
> >
> > Randall Hauch
> >
> > (1) https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
> > (2) https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Filter+and+Conditional+SMTs
> > (3) https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT
> > (4) https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
>

Re: Do we want to add more SMTs to Apache Kafka?

Posted by Brandon Brown <br...@bbrownsound.com>.
I like the idea of a select number of SMTs being offered and supported out of the box. The addition of SMTs via this process is nice because it allows for a rich set to be supported out of the box and without the need for extra work to deploy. 

Perhaps this is a spot where the community could express interest in additional SMTs, which might be available via an open source library, and if they see enough usage there could be a path to fold them into the Kafka project at large?

Brandon Brown


> On Nov 7, 2021, at 1:19 PM, Randall Hauch <rh...@gmail.com> wrote:
> 
> We have had several requests to add more Connect Single Message
> Transforms (SMTs) to the project. When SMTs were first introduced with
> KIP-66 (ref 1) in Jun 2017, the KIP mentioned the following:
> 
>> Criteria: SMTs that are shipped with Kafka Connect should be general enough to apply to many data sources & serialization formats. They should also be simple enough to not cause any additional library dependency to be introduced.
>> Beyond those being initially included with this KIP, transformations can be adopted for inclusion in future with JIRA/ML discussion to weigh the tradeoffs.
> 
> In the 4+ years that we've had SMTs in the project, we've only
> enhanced the framework with KIP-585 (ref 2), and fixed the initial
> SMTs (including KIP-437, ref 3). We recently have had quite a few
> requests to add new SMTs; a few samples of these include:
> * https://issues.apache.org/jira/browse/KAFKA-10299
> * https://issues.apache.org/jira/browse/KAFKA-9436
> * https://issues.apache.org/jira/browse/KAFKA-9318
> * https://issues.apache.org/jira/browse/KAFKA-12443
> 
> Adding new SMTs to the Apache Kafka project, or changing existing ones,
> comes with requirements. First, AK releases are infrequent and necessarily
> involve the entire project. Second, adding an SMT is an API change and
> therefore requires a KIP. Third, all changes in behavior to SMTs
> included in a prior AK release must be backward compatible, and
> adding or changing an SMT's configuration requires a KIP. This last
> one is also challenging if we're limiting ourselves to truly general
> SMTs, since these are notoriously difficult to get right the first
> time. All of these aspects mean that it's difficult to add, maintain,
> and evolve/improve SMTs in AK. And unless a bug fix is critical, we're
> likely not to create a patch release for AK just to fix a bug in an
> SMT, simply because of the effort involved.
> 
> On the other hand, anyone can easily implement their own SMT and
> deploy it as a Connect plugin, whether that's part of a connector
> plugin or a separate plugin dedicated to one or more SMTs. Interestingly,
> it's far simpler to implement and maintain custom SMTs outside of AK,
> especially since those plugins can be released and deployed in any
> Connect runtime version since at least 0.11.0. And if custom SMTs are
> maintained in a relatively small project, they can be released often.
> 
> Finally, KIP-26 (ref 4) specifically rejected maintaining connector
> implementations in the AK project. So we have precedent for choosing
> not to accept implementations.
> 
> Given the above, I wonder if the time has come for us to prefer only
> maintaining the SMT framework and existing SMTs, and to decline adding
> new SMTs.
> 
> Thoughts?
> 
> Best regards,
> 
> Randall Hauch
> 
> (1) https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
> (2) https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Filter+and+Conditional+SMTs
> (3) https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT
> (4) https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767