You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Tom Bentley <tb...@redhat.com> on 2020/03/24 11:56:04 UTC

[DISCUSS] KIP-585: Conditional SMT

Hi,

I've opened KIP-585 which is intended to provide a mechanism to
conditionally apply SMTs in Kafka Connect. I'd be grateful for any
feedback:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT

Many thanks,

Tom

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Sönke Liebau <so...@opencore.com.INVALID>.
Hi Tom, hi Chris,

in principle an expression language sounds good, as it would allow us to
cover pretty much any use-case that we could possibly come up with. And
bearing in mind that I was the one who originally started talking about
more complex use-cases, I should be happy about that :)

That being said, I have my doubts, whether we'd be doing the world a favor
by heading down this road. There are or have been quite a few data
transformation DSLs out there, there is Camel's simple [1], Cloudera had a
try with Morphlines (or later KiteSDK) [2], Heka [3] and Logstash [4] both
allow defining transformation pipelines  - and those are just the ones I
came up with of the top of my head.

Instead of trying to evolve SMTs into the direction these tools have taken,
maybe there is worth in keeping these transforms simple and limited and if
more complex processing is needed pick something that is better suited -
Kafka Streams would definitely be an option as well for example.

I hope that makes sense? Happy to elaborate more if this just reads like
gibberish :)

Best regards,
Sönke

[1]
https://livebook.manning.com/book/camel-in-action-second-edition/appendix-a/1
[2]
https://blog.cloudera.com/introducing-morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-hadoop/
[3] https://hekad.readthedocs.io/en/v0.10.0/
[4] https://www.elastic.co/guide/en/logstash/current/pipeline.html

On Thu, 2 Apr 2020 at 18:48, Tom Bentley <tb...@redhat.com> wrote:

> Hi Chris and Sönke,
>
> Using the numbering from Chris's email...
>
> 1. That's a good point, I'll see what is needed to make that work.
>
> 2. I'm happy enough to add support for "and" and "or" as part of this KIP
> if people can see a need for it.
>
> In a similar vein, I was wondering about whether it would be worthwhile
> having the equivalent of an "else" clause (what's in the KIP is basically
> an "if" statement). Without support for "else" I think people would often
> need two conditionals, with the condition of one being the negation of the
> condition of another.
>
> 3. I can see the attraction of an expression language. The pros include
> being terse and familiar to programmers and potentially very flexible if
> that's needed in the future. I had a play and implemented it using ANTLR
> and it's not difficult to write a grammar and implement the functions we've
> already discussed and get decent error messages when the expression is
> malformed. So on the one hand I quite like the idea. On the other hand it
> feels like overkill for the use cases that have actually been identified so
> far.
>
> @Sönke what do you make of the expression language idea?
>
> Kind regards,
>
> Tom
>
> On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <ch...@confluent.io>
> wrote:
>
> > Hi Tom,
> >
> > This looks great and I'd love to see the out-of-the-box SMTs become even
> > more powerful with the improvements you've proposed! Got a few remarks
> and
> > would be interested in your thoughts:
> >
> > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
> method,
> > what would you think about adopting a similar approach as the framework
> > uses with connectors, by adding a "Config validate(Map<String, String>
> > props)" method that can perform custom validation outside of what can be
> > performed by the ConfigDef's single-property-at-a-time validation? It may
> > be a little heavyweight for use with this particular SMT, but it'd
> provide
> > more flexibility for other SMT implementations and would mirror an API
> that
> > developers targeting the framework are likely already familiar with.
> > 2. The possibility for adding the logical operators "and" and "or" is
> > mentioned, but only as a potential future change and not one proposed by
> > this KIP. Why not include those operators sooner rather than later?
> > 3. The syntax for named conditions that are then referenced in logical
> > operators is... tricky. It took me a few attempts to grok the example
> > provided in the KIP after reading Sönke's question about the example for
> > negation. What about a more sophisticated but less verbose syntax that
> > supports a single configuration for the condition, even with logical
> > operators? I'm thinking something like
> > "transforms.conditionalExtract.condition: not(has-header:my-header)"
> > instead of the "transforms.conditionalExtract.condition: not:hasMyHeader"
> > and "transforms.conditionalExtract.condition.hasMyHeader:
> > has-header:my-header" properties. If support for a logical "and" is
> added,
> > this could then be expanded to something like
> > "transforms.conditionalExtract.condition: and(has-header(my-header),
> > not(topic-matches(my-prefix-.*)))". There would be additional complexity
> > here with the need to escape parentheses and commas that are intended to
> be
> > treated literally (as part of a header name, for example) instead of as
> > part of the syntax for the condition itself, but a little additional
> > complexity for edge cases like that may be warranted if it heavily
> reduces
> > complexity for the common cases. The rationale for the proposed
> > parentheses-based syntax here instead of what's mentioned in the KIP
> > (something like "and: <condition1>, <condition2>") is to help with
> > readability; we probably wouldn't need that with the approach of naming
> > conditions via separate properties, but things may get a little nasty
> with
> > literal conditions included there, especially with the possibility of
> > nesting. Finally, the shift in syntax here should make it easier to
> handle
> > cases like the header value matching brought up by Sönke; it might look
> > something like "header-matches(header-name, header-value-pattern)".
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com> wrote:
> >
> > > Hi Sönke,
> > >
> > > Thanks for taking a look.
> > >
> > > Let me answer in reverse order, since I think it might make more sense
> > that
> > > way...
> > >
> > > Also, while writing that, I noticed that you have a version with and
> > > > without "name" for your transformation in the KIP:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > has-header:my-header
> > > > and
> > > > transforms.conditionalExtract.condition: has-header:my-header
> > > >
> > >
> > > The example
> > >     transforms.conditionalExtract.condition: has-header:my-header
> > > is a "has-header" condition (the prefix of the config value), which
> will
> > > match records with a "my-header" header (given in the suffix).
> > >
> > > The other example given is:
> > >     transforms.conditionalExtract.condition: not:hasMyHeader
> > >     transforms.conditionalExtract.condition.hasMyHeader:
> > > has-header:my-header
> > > The root of the condition is a "not" condition (the prefix of the value
> > for
> > > the transforms.conditionalExtract.condition key) of another named
> > condition
> > > called "hasMyHeader" (the suffix). Any name could be used for the other
> > > condition. That other condition is configured at
> > > "transforms.conditionalExtract.condition.<conditionName>". That
> condition
> > > is a "has-header" condition (the prefix), which will match records
> with a
> > > "my-header" header (given in the suffix). So the "has-header:"
> condition
> > > type would always require a suffix, as would the "not:" condition type.
> > > Hypothetically you could have a "true" condition type (which would not
> > > require a suffix), and the hypothetical binary conditions "and:" and
> > "or:"
> > > would require a pair of other condition names.
> > >
> > > So what's proposed is a scheme for encoding conditions where the
> > condition
> > > type is the prefix of the value of some "....condition" config key, and
> > the
> > > optional suffix provides parameters for the condition. This makes those
> > > parameters a bit inflexible, but is relatively succinct.
> > >
> > > This leads on to your first point. You're right that use cases might
> > appear
> > > which need other conditions, and we should make it flexible enough to
> be
> > > able to cope with future use cases. On the other hand, I was concerned
> > that
> > > we end up with something which is quite complicated to configure.
> (There
> > > comes a point where it might makes more sense for the user to write
> their
> > > own SMT).
> > >
> > > Just of the top of my head it might look like:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > header-name:my-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > field-value:my-value
> > > >
> > >
> > > That won't work because the format is basically a
> Properties/Map<String,
> > > String> and what you've suggested has duplicate keys.
> > >
> > > One thing I did briefly consider what the ability to treat conditions
> as
> > > Configurable objects in their own right (opening up the possibility of
> > > people supplying their own Conditions, just like they can supply their
> > own
> > > SMTs). That might be configured something like this:
> > >
> > >     transforms.conditionalExtract.condition: not
> > >     transforms.conditionalExtract.condition.not.type: Not
> > >     transforms.conditionalExtract.condition.not.negated: foo
> > >     transforms.conditionalExtract.condition.foo.type:
> HasHeaderWithValue
> > >     transforms.conditionalExtract.condition.foo.header: my-header
> > >     transforms.conditionalExtract.condition.foo.value: my-value
> > >
> > > I didn't propose that I couldn't see the use cases to justify this kind
> > of
> > > complexity, especially as the common case would surely be matching
> > against
> > > topic name (to be honest I wasn't completely convinced by the need for
> > > "has-header"). In the current version of the KIP that's just
> > >
> > >     transforms.conditionalExtract.condition: topic-matches:
> my-prefix-.*
> > >
> > > but using the more flexible scheme that would require something more
> like
> > > this:
> > >
> > >     transforms.conditionalExtract.condition: bar
> > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
> > >     transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
> > >
> > > If people know of use cases which would justify more sophistication,
> I'm
> > > happy to reconsider.
> > >
> > > Thanks again for taking a look!
> > >
> > > Tom
> > >
> > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> > > <so...@opencore.com.invalid> wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > sounds useful to me, thanks for the KIP!
> > > > The only thought that I had while reading was that this will probably
> > > raise
> > > > questions about more involved conditions fairly quickly. For example
> > the
> > > > "has-header" will cause an appetite for conditions like
> > > > "this-header-has-that-value".
> > > > This would necessitate two parameters to be passed into the
> condition,
> > > > which I think is not currently included in the KIP. I am not saying
> add
> > > > this now, but might it make sense to discuss a concept of how that
> > might
> > > > look now, to avoid potential changes later on.
> > > >
> > > > Just of the top of my head it might look like:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > header-name:my-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > field-value:my-value
> > > >
> > > >
> > > > Also, while writing that, I noticed that you have a version with and
> > > > without "name" for your transformation in the KIP:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > has-header:my-header
> > > > and
> > > > transforms.conditionalExtract.condition: has-header:my-header
> > > >
> > > >
> > > > Is this intentional and the name is optional?
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Sönke
> > > >
> > > >
> > > >
> > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com>
> wrote:
> > > > >
> > > > > Does anyone have any comments, feedback or thoughts about this?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Tom
> > > > >
> > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tb...@redhat.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for
> any
> > > > > > feedback:
> > > > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > > > >
> > > > > > Many thanks,
> > > > > >
> > > > > > Tom
> > > > > >
> > > >
> > >
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi Konstantine,

Thanks very much for your input. It's helpful to have a steer from a
committer on the syntax point especially. I've made all those changes and
added a note about compatibility.

Cheers,

Tom

On Thu, May 14, 2020 at 6:00 AM Konstantine Karantasis <
konstantine@confluent.io> wrote:

> Hi Tom.
>
> Thanks for the KIP. I like how the proposal has ended up to be and I think
> it describes a practical approach.
>
> I have to say that, for a moment, earlier in the discussion I thought we
> were leaning a bit towards an unconventional mini assembly language based
> on java properties.
> The reliance on java properties for SMT definition and configuration has
> drawn some criticism in the past, so keeping the model lean and appealing
> as well as expressive is worth the effort in my opinion.
>
> With that in mind, I feel that we could move forward without the question
> mark notation. That approach would be similar to what was introduced when
> SMTs were implemented by adding the `transforms` prefix. We'll have a top
> level property called `predicates` and a namespace starting with the
> `predicates.` prefix followed by aliases. Avoiding the '?' in the
> transforms seems also ok to me in order to keep things simple and
> intuitive.
>
> Besides these suggestions, I have a few additional minor comments:
>
> 1. Should we extend the `AutoCloseable` interface instead of `Closeable`
> since we ignore the Exception in the signature anyways? I know this will
> make Predicate different from Transformation but it also corresponds better
> to the intended use.
> 2. As Andrew mentioned, Predicate needs to be declared an interface in the
> KIP.
> 3. Currently examples are neither java properties nor json. Should we
> comply with one of the two formats to avoid confusing users that will read
> the KIP before trying this feature?
>
> Thanks again for driving this proposal.
>
> Konstantine
>
> On Mon, May 11, 2020 at 7:57 AM Tom Bentley <tb...@redhat.com> wrote:
>
> > Hi Andrew,
> >
> > That works nicely enough for the proposal where the predicate is
> configured
> > directly on the transformation. But I thought there was more consensus
> > around the proposal to have the transformation configuration refer to a
> > predicate indirectly, defined with the ?predicates key. I guess an
> example
> > would look like this:
> >
> > transforms: t2
> > transforms.t2?predicate: has-prefix
> > transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
> > transforms.t2.field: c1
> > ?predicates: has-prefix
> > ?predicates.has-prefix.type:
> > org.apache.kafka.connect.transforms.predicates.TopicNameMatch
> > ?predicates.has-prefix.negate: true
> > ?predicates.has-prefix.pattern: my-prefix-.*
> >
> > I agree that this seems to reduce the chance of conflict to practically
> > nothing. I'm happy enough to go with this and I've updated the KIP
> > accordingly.
> >
> > Assuming no one raises more concerns I'll start a vote soon.
> >
> > Kind regards,
> >
> > Tom
> >
> > On Mon, May 11, 2020 at 10:54 AM Andrew Schofield <
> > andrew_schofield@live.com>
> > wrote:
> >
> > > Hi,
> > > I have implemented some of this and configured some predicates and I
> > > prefer this slight modification of the ? syntax:
> > >
> > > transforms: t2
> > > transforms.t2?type:
> > > org.apache.kafka.connect.transforms.predicates.TopicNameMatch
> > > transforms.t2?negate: true
> > > transforms.t2?pattern: my-prefix-.*
> > > transforms.t2.type:
> org.apache.kafka.connect.transforms.ExtractField$Key
> > > transforms.t2.field: c1
> > >
> > > So, "transforms.<alias>." introduces the config for the transformation,
> > > while "transforms.<alias>?" introduces the
> > > config for the predicate. The risk of a clash now is that someone has
> > used
> > > a ? in the alias name, which is unlikely.
> > >
> > > Also, there's a slight typo in the Java definition of the Predicate
> > > interface. It should be:
> > >
> > > public interface Predicate<R extends ConnectRecord<R>> extends
> > > Configurable, Closeable
> > >
> > > Looks like discussion has died down so I wonder whether it's time to
> > begin
> > > a vote.
> > >
> > > Cheers,
> > > Andrew
> > >
> > > On 06/05/2020, 08:57, "Andrew Schofield" <an...@live.com>
> > > wrote:
> > >
> > > >    Hi Tom,
> > > >    I think that either proposal is sufficiently readable and they
> both
> > > have elements of
> > > >    cryptic syntax. I remember talking to an engineering director once
> > > who said he didn't hire
> > > >    "curly braces people" any more. This is an interface for curly
> > braces
> > > people whichever
> > > >    way we go here.
> > >
> > > >    I slightly prefer the predicates as a top-level concept rather
> than
> > > as a feature
> > > >    of the If transform. I also don't like the namespacing behaviour
> of
> > > the If transform which
> > > >    I guess works like variable declaration so each scope has its own
> > > namespace of transform
> > > >    aliases. I expect the following is valid but ill-advised.
> > >
> > > >        transforms: t1
> > > >        transforms.t1.type: If
> > > >        transforms.t1.!test.type: TopicNameMatch
> > > >        transforms.t1.!test.pattern: my-prefix-.*
> > > >        transforms.t1.then: t1
> > > >        transforms.t1.then.t1.type: ExtractField$Key
> > > >        transforms.t1.then.t1.field: c1
> > >
> > > >    I would love to see conditional application of transforms in SMT
> in
> > > Kafka 2.6.
> > > >    Let's see what other people think and get a consensus view on how
> to
> > > spell it.
> > >
> > > >    Cheers,
> > > >    Andrew
> > >
> > > >    On 05/05/2020, 15:20, "Tom Bentley" <tb...@redhat.com> wrote:
> > >
> > > >        Hi,
> > >
> > > >        Again, reusing the existing numbering...
> > >
> > > >        3) I had exactly the same thought. The reason I didn't already
> > > rename it
> > > >        was because :
> > >
> > > >        * Filter is not exactly wrong (by default it just filters out
> > all
> > > the
> > > >        messages)
> > > >        * Filter is a useful hint about how it's intended to be used.
> > >
> > > >        But I don't have a strong opinion, so I'm happy to rename it
> if
> > > people
> > > >        think that Drop is better.
> > >
> > > >        4) Putting the negation on the use site–rather than the
> > > declaration
> > > >        site–seems like a good idea, since it makes it easier to get
> > > 'else'-like
> > > >        behaviour.
> > >
> > > >        But, playing devil's advocate, I wondered how all this
> compares
> > > with the
> > > >        original proposal, so I went to the trouble of writing out a
> few
> > > examples
> > > >        to compare the two
> > >
> > > >        Example of a basic 'if'
> > > >        Current proposal:
> > > >        transforms: t2
> > > >        transforms.t2.?predicate: !has-my-prefix
> > > >        transforms.t2.type: ExtractField$Key
> > > >        transforms.t2.field: c1
> > > >        ?predicates: has-my-prefix
> > > >        ?predicates.has-my-prefix.type: TopicNameMatch
> > > >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> > > >        Original (or originalish, see notes below):
> > > >        transforms: t1
> > > >        transforms.t1.type: If
> > > >        transforms.t1.!test.type: TopicNameMatch
> > > >        transforms.t1.!test.pattern: my-prefix-.*
> > > >        transforms.t1.then: t2
> > > >        transforms.t1.then.t2.type: ExtractField$Key
> > > >        transforms.t1.then.t2.field: c1
> > >
> > > >        Example of 'if/else':
> > > >        Current:
> > > >        transforms: t2,t3
> > > >        transforms.t2.?predicate: !has-my-prefix
> > > >        transforms.t2.type: ExtractField$Key
> > > >        transforms.t2.field: c1
> > > >        transforms.t3.?predicate: has-my-prefix
> > > >        transforms.t3.type: ExtractField$Key
> > > >        transforms.t3.field: c2
> > > >        ?predicates: has-my-prefix
> > > >        ?predicates.has-my-prefix.type: TopicNameMatch
> > > >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> > > >        Original:
> > > >        transforms: t1
> > > >        transforms.t1.type: If
> > > >        transforms.t1.!test.type: TopicNameMatch
> > > >        transforms.t1.!test.pattern: my-prefix-.*
> > > >        transforms.t1.then: t2
> > > >        transforms.t1.then.t2.type: ExtractField$Key
> > > >        transforms.t1.then.t2.field: c1
> > > >        transforms.t1.else: t3
> > > >        transforms.t1.else.t3.type: ExtractField$Key
> > > >        transforms.t1.else.t3.field: c2
> > >
> > > >        Example of guarding a >1 SMT
> > > >        Current:
> > > >        transforms: t2,t3
> > > >        transforms.t2.?predicate: !has-my-prefix
> > > >        transforms.t2.type: ExtractField$Key
> > > >        transforms.t2.field: c1
> > > >        transforms.t3.?predicate: !has-my-prefix
> > > >        transforms.t3.type: ExtractField$Value
> > > >        transforms.t3.field: c2
> > > >        ?predicates: has-my-prefix
> > > >        ?predicates.has-my-prefix.type: TopicNameMatch
> > > >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> > > >        Original:
> > > >        transforms: t1
> > > >        transforms.t1.type: If
> > > >        transforms.t1.!test.type: TopicNameMatch
> > > >        transforms.t1.!test.pattern: my-prefix-.*
> > > >        transforms.t1.then: t2,t3
> > > >        transforms.t1.then.t2.type: ExtractField$Key
> > > >        transforms.t1.then.t2.field: c1
> > > >        transforms.t1.then.t3.type: ExtractField$Value
> > > >        transforms.t1.then.t3.field: c2
> > >
> > > >        Notes:
> > > >        * I'm assuming we would negate predicates in the Current
> > proposal
> > > by
> > > >        prefixing the predicate name with !
> > > >        * I'm using "test" rather than "predicate" in the Original
> > > because it looks
> > > >        a bit nicer having the same number of letters as "then" and
> > > "else".
> > > >        * I'm negating the test in the Original by using "!test"
> rather
> > > than "test".
> > >
> > > >        Those notes aside, I'd make the following assertions:
> > >
> > > >        1. The Original examples require the same number of keys (or
> one
> > > less line
> > > >        in the last example).
> > > >        2. The Original examples are more guessable if you'd never
> seen
> > > this KIP
> > > >        before. The SMT name (If) and its config parameter names
> ('test'
> > > and 'then'
> > > >        and maybe also 'else') all give quite strong clues about
> what's
> > > happening.
> > > >        I'd also argue that using !test is less cryptic than
> > > '?predicate', since
> > > >        the purpose of the '!' fits with its use in most programming
> > > languages. The
> > > >        purpose of the '?' in '?predicate' is to reduce the chance of
> a
> > > naming
> > > >        collision, which is not something which people could guess.
> > > >        3. The Original examples cannot have config parameters naming
> > > conflicts
> > > >        with either Connectors or Transformations.
> > >
> > > >        Although I don't think it's a strong argument I would also
> add:
> > >
> > > >        4. Because the original syntax supports nesting of 'If' SMTs
> > it's
> > > actually
> > > >        more powerful.
> > >
> > > >        So, I know people really didn't like the original suggestion,
> > but
> > > if you
> > > >        accept these assertions then the current proposal is worse.
> Can
> > > anyone
> > > >        point to some objective ways in which the current proposal is
> > > better than
> > > >        the original one?
> > >
> > > >        If people really think the current proposal is better then I'm
> > > happy to go
> > > >        with that, call a vote and see if it has any traction with
> > people
> > > outside
> > > >        this discussion. Ultimately I just want to get support for
> this
> > > feature
> > > >        into Kafka, however it's configured.
> > >
> > > >        Kind regards,
> > >
> > > >        Tom
> > >
> > > >        On Mon, May 4, 2020 at 12:35 PM Andrew Schofield <
> > > andrew_schofield@live.com>
> > > >        wrote:
> > >
> > > >        > Hi Tom,
> > > >        > Referring back to Chris's points.
> > > >        >
> > > >        > 1) Exception handling - yes, agreed. This KIP can just say
> the
> > > exception
> > > >        > handling for
> > > >        > predicates is the same as transformations.
> > > >        >
> > > >        > 2) Worker plugins - agree too.
> > > >        >
> > > >        > 3) Filter SMT - If the Filter SMT is just becoming a
> > > transformation that
> > > >        > drops any records that
> > > >        > make it through the predicate, it's probably actually a Drop
> > > >        > transformation. If it's
> > > >        > not guarded with a predicate, it drops every record, but
> with
> > a
> > > predicate
> > > >        > it just
> > > >        > drops records that match.
> > > >        >
> > > >        > 4) Syntax - I quite like the idea of a separate namespace
> for
> > > predicates,
> > > >        > just like for transformations.
> > > >        > I particularly like how you could have 2 transformations
> > > referencing the
> > > >        > same predicate and config.
> > > >        >
> > > >        > There is the scope for collision with connector
> configuration,
> > > but my
> > > >        > hunch is that
> > > >        > "predicates" is less risky that simply "predicate" since it
> > > would only
> > > >        > clash with a connector
> > > >        > that happened to have multiple predicates.
> > > >        >
> > > >        > But I think there is a non-zero chance that an existing
> > > transformation
> > > >        > might have used
> > > >        > "predicate" as a config. I have actually written a filter
> > > transformation
> > > >        > myself. It didn't use
> > > >        > "predicate" but it almost did. That's why I suggested
> > > "?predicate" which I
> > > >        > know is a bit
> > > >        > cryptic, but it's also not going to clash.
> > > >        >
> > > >        > Here are two serving suggestions:
> > > >        >
> > > >        > A) "?predicate" to name the predicate alias to use
> > > >        > transforms: t2
> > > >        > transforms.t2.?predicate: has-my-prefix
> > > >        > transforms.t2.type:ExtractField$Key
> > > >        > transforms.t2.field: c1
> > > >        >
> > > >        > B) "?" to name the predicate alias to use
> > > >        > transforms: t2
> > > >        > transforms.t2.?: has-my-prefix
> > > >        > transforms.t2.type:ExtractField$Key
> > > >        > transforms.t2.field: c1
> > > >        >
> > > >        >
> > > >        > One more question.
> > > >        >
> > > >        > Do people think the negation ought to be in the predicate
> > > config or in the
> > > >        > referring transformation
> > > >        > config? We might want to make:
> > > >        >
> > > >        >   predicate(a)=>transform(T1)
> > > >        >   predicate(negate(a))=>transform(T2)
> > > >        >
> > > >        > easier than having to use two separate predicate aliases
> with
> > > identical
> > > >        > configuration, except for
> > > >        > the negation.
> > > >        >
> > > >        > Thanks,
> > > >        > Andrew
> > > >        >
> > > >        > On 04/05/2020, 10:22, "Tom Bentley" <tb...@redhat.com>
> > > wrote:
> > > >        >
> > > >        >     Hi Chris,
> > > >        >
> > > >        >     Thanks for the feedback. On points 1 and 2 I agree.
> > > >        >
> > > >        >     3. This is a great point. I'm happy to add this to the
> KIP
> > > too, but
> > > >        > note it
> > > >        >     removes the need to add validate(Map<String, String>) to
> > > both
> > > >        >     Transformation and Predicate. That doesn't mean we
> > > shouldn't still do
> > > >        > it,
> > > >        >     just that the change is not motivated by the features
> > being
> > > added in
> > > >        > the
> > > >        >     KIP. So I've removed validate(Map<String, String>) for
> > now.
> > > >        >
> > > >        >     4. Your proposal would further reduce the chance of
> > > Transformation
> > > >        > config
> > > >        >     conflict (because just the 'predicate' parameter might
> > > conflict), but
> > > >        > it
> > > >        >     also adds the chance of conflict with Connector
> parameters
> > > (all the
> > > >        >     'predicates.*' parameters). Connector implementations
> are
> > > far more
> > > >        > numerous
> > > >        >     than Transformation implementations, so the risk is
> > > proportionately
> > > >        >     greater. As you say we could try to reduce the chances
> of
> > > conflict
> > > >        > (e.g.
> > > >        >     '?predicate' parameter on transformations and
> > '?predicates'
> > > prefix on
> > > >        >     connectors), but I'm still left with uneasy feelings.
> I'd
> > > like to hear
> > > >        >     about what others think before I change this one.
> > > >        >
> > > >        >     Cheers,
> > > >        >
> > > >        >     Tom
> > > >        >
> > > >        >
> > > >        >     On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <
> > > >        > chrise@confluent.io>
> > > >        >     wrote:
> > > >        >
> > > >        >     > Hi all,
> > > >        >     >
> > > >        >     > This has certainly taken quite a turn! I wish we could
> > > get this done
> > > >        >     > without adding another pluggable interface but if the
> > > benefit is
> > > >        > that now
> > > >        >     > any SMT--new or pre-existing--can be conditionally
> > > applied, it seems
> > > >        > like
> > > >        >     > it might be worth it.
> > > >        >     >
> > > >        >     > As far as the proposed design goes, some thoughts:
> > > >        >     >
> > > >        >     > 1. There isn't a clear reason why exceptions thrown
> from
> > > predicates
> > > >        > should
> > > >        >     > be treated differently from exceptions thrown by
> > > transformations. We
> > > >        > have
> > > >        >     > reasonable error-tolerance mechanisms in the Connect
> > > framework; why
> > > >        >     > silently swallow errors, especially if the risk is
> > > publishing
> > > >        > potentially
> > > >        >     > corrupted data to Kafka or an external system because
> an
> > > SMT that
> > > >        > should
> > > >        >     > have been applied wasn't?
> > > >        >     >
> > > >        >     > 2. It might be worth noting that the interface is a
> > > worker plugin,
> > > >        > and that
> > > >        >     > presumably it would be loaded in the same way as other
> > > worker
> > > >        > plugins such
> > > >        >     > as converters, connectors, and REST extensions. This
> > > would include
> > > >        > aliasing
> > > >        >     > behavior that would allow users to specify predicates
> > > using their
> > > >        > simple
> > > >        >     > class names as long as no two predicate plugins with
> the
> > > same simple
> > > >        > name
> > > >        >     > were available on the worker.
> > > >        >     >
> > > >        >     > 3. Why would the Filter SMT explicitly take in a
> > > predicate in its
> > > >        >     > configuration if predicates can be applied to all
> SMTs?
> > > Just reading
> > > >        > the
> > > >        >     > idea for a filter SMT, it seemed like the behavior
> would
> > > be that the
> > > >        > SMT
> > > >        >     > would take in no configuration parameters and just
> > > blindly drop
> > > >        > everything
> > > >        >     > that it sees, but typical use would involve pairing it
> > > with a
> > > >        > predicate.
> > > >        >     >
> > > >        >     > 4. The question mark syntax seems a little cryptic,
> and
> > > combining the
> > > >        >     > configuration properties for the predicate and the
> > > transformation in
> > > >        > the
> > > >        >     > same namespace ("transforms.<name>.*) seems a little
> > > noisy. What do
> > > >        > you
> > > >        >     > think about allowing predicates to be configured in
> > their
> > > own
> > > >        > namespace,
> > > >        >     > and referenced by name in a single "predicate" (or
> maybe
> > > >        > "?predicate" if we
> > > >        >     > really want to avoid risking backwards compatibility
> > > concerns)
> > > >        > property?
> > > >        >     > This would also enable them to be more easily reused
> > > across several
> > > >        > SMTs.
> > > >        >     >
> > > >        >     > For example, you might configure a worker with these
> > > properties
> > > >        > (assuming
> > > >        >     > plugin aliasing is supported for predicates in the
> same
> > > way that it
> > > >        > is for
> > > >        >     > transformations):
> > > >        >     >
> > > >        >     > transforms: t2
> > > >        >     > transforms.t2.predicate: has-my-prefix
> > > >        >     > transforms.t2.type:ExtractField$Key
> > > >        >     > transforms.t2.field: c1
> > > >        >     >
> > > >        >     > predicates: has-my-prefix
> > > >        >     > predicates.has-my-prefix.type: TopicNameMatch
> > > >        >     > predicates.has-my-prefix.negate: true
> > > >        >     > predicates.has-my-prefix.pattern: my-prefix-.*
> > > >        >     >
> > > >        >     > Excited to see how this is evolving and I think we're
> > > heading towards
> > > >        >     > something really useful for the Connect framework.
> > > >        >     >
> > > >        >     > Cheers,
> > > >        >     >
> > > >        >     > Chris
> > > >        >     >
> > > >        >     >
> > > >        >     > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <
> > > >        > andrew_schofield@live.com
> > > >        >     > >
> > > >        >     > wrote:
> > > >        >     >
> > > >        >     > > Hi Tom,
> > > >        >     > > Looks good to me.
> > > >        >     > >
> > > >        >     > > Thanks,
> > > >        >     > > Andrew
> > > >        >     > >
> > > >        >     > > On 01/05/2020, 16:24, "Tom Bentley" <
> > > tbentley@redhat.com> wrote:
> > > >        >     > >
> > > >        >     > >     Hi,
> > > >        >     > >
> > > >        >     > >     I've updated the KIP to reflect recent
> > discussions.
> > > Please
> > > >        > note the
> > > >        >     > >     following:
> > > >        >     > >
> > > >        >     > >     1. I've described a HasHeaderKey predicate
> rather
> > > than
> > > >        >     > HeaderKeyMatches
> > > >        >     > >     because at the moment Headers doesn't expose the
> > > whole set of
> > > >        >     > headers,
> > > >        >     > > only
> > > >        >     > >     those with a specific name. Exposing this would
> > > significantly
> > > >        >     > increase
> > > >        >     > > the
> > > >        >     > >     scope of this KIP but relatively little extra
> > > benefit.
> > > >        >     > >     2. I used
> > > org.apache.kafka.connect.transformer.predicates
> > > >        > rather than
> > > >        >     > >     org.apache.kafka.connect.predicates, since I
> > > thought it better
> > > >        >     > > reflected
> > > >        >     > >     the use of predicates within transforms. I'm
> > > flexible on this
> > > >        > one if
> > > >        >     > >     someone has a good case for the original name.
> For
> > > example the
> > > >        >     > original
> > > >        >     > >     package name would be more appropriate if we
> were
> > > expecting
> > > >        >     > Connectors
> > > >        >     > > to
> > > >        >     > >     make use of Predicates somehow.
> > > >        >     > >     3. I've dropped the special case of not needing
> to
> > > provide a
> > > >        > FQN for
> > > >        >     > >     predicate classes in that package, since this
> > isn't
> > > something
> > > >        >     > > supported by
> > > >        >     > >     transformations themselves as far as I know. I
> > like
> > > the idea,
> > > >        > but it
> > > >        >     > > seemed
> > > >        >     > >     inconsistent to need it for transformations but
> > not
> > > for
> > > >        > predicates.
> > > >        >     > >     4. Chris, I've used your suggestions for a
> > > validate() method
> > > >        > for the
> > > >        >     > > extra
> > > >        >     > >     configurability needed. I've also added this to
> > > Predicate (even
> > > >        >     > though
> > > >        >     > > I
> > > >        >     > >     don't need it) for consistency with Connector
> and
> > > >        > Transformation.
> > > >        >     > >     5. For negating the result of a predicate I
> > decided
> > > it was
> > > >        > clearer to
> > > >        >     > > just
> > > >        >     > >     have a "negate" config parameter, supported by
> all
> > > predicates,
> > > >        > than
> > > >        >     > to
> > > >        >     > >     reach for more punctuation.
> > > >        >     > >
> > > >        >     > >     I'd be grateful for any more feedback people
> might
> > > have.
> > > >        >     > >
> > > >        >     > >     Kind regards,
> > > >        >     > >
> > > >        >     > >     Tom
> > > >        >     > >
> > > >        >     > >     On Tue, Apr 28, 2020 at 3:50 PM Andrew
> Schofield <
> > > >        >     > > andrew_schofield@live.com>
> > > >        >     > >     wrote:
> > > >        >     > >
> > > >        >     > >     > Hi Tom,
> > > >        >     > >     > I think we should go for a consistent naming
> > > convention for
> > > >        > the
> > > >        >     > >     > predicates, maybe so they
> > > >        >     > >     > read nicely if you say "IF" first. Here's a
> > > suggestion for
> > > >        >     > > predicates that
> > > >        >     > >     > the KIP could introduce:
> > > >        >     > >     >  - TopicNameMatches
> > > >        >     > >     >  - HeaderKeyMatches
> > > >        >     > >     >  - RecordIsTombstone
> > > >        >     > >     >
> > > >        >     > >     > On naming, I also suggest using
> > > >        >     > >     > org.apache.kafka.connect.predicates.Predicate
> > > >        >     > >     > as the name for the interface.
> > > >        >     > >     >
> > > >        >     > >     > I hadn't settled on a preference for
> negation. I
> > > suppose
> > > >        > there are
> > > >        >     > > three
> > > >        >     > >     > results
> > > >        >     > >     > from evaluating a predicate - true, false or
> an
> > > exception
> > > >        > caught by
> > > >        >     > > Kafka
> > > >        >     > >     > Connect.
> > > >        >     > >     > I suggest the exception case is always a
> > > predicate fail so
> > > >        > the
> > > >        >     > > guarded
> > > >        >     > >     > transformation
> > > >        >     > >     > is skipped. Initially I'd preferred having the
> > > predicate be
> > > >        >     > > configured to
> > > >        >     > >     > return a
> > > >        >     > >     > negated result to give a "not match"
> behaviour.
> > > Thinking
> > > >        > again
> > > >        >     > now, I
> > > >        >     > >     > prefer something like:
> > > >        >     > >     >
> > > >        >     > >     > transforms: t1
> > > >        >     > >     > transforms.t1.?type: RecordIsTombstone
> > > >        >     > >     > transforms.t1.type:
> > > >        >     > >
> org.apache.kafka.connect.transforms.InsertField$Value
> > > >        >     > >     > transforms.t1.static.field: deleted
> > > >        >     > >     > transforms.t1.static.value: true
> > > >        >     > >     >
> > > >        >     > >     > The "?" means the transform guarded by the
> > > predicate runs in
> > > >        > the
> > > >        >     > > Predicate
> > > >        >     > >     > returns true.
> > > >        >     > >     >
> > > >        >     > >     > transforms: t2
> > > >        >     > >     > transforms.t2.?!type:
> > > >        >     > > org.apache.kafka.connect.predicates.TopicNameMatch
> > > >        >     > >     > transforms.t2.?regex: my-prefix-.*
> > > >        >     > >     > transforms.t2.type:
> > > >        >     > > org.apache.kafka.connect.transforms.ExtractField$Key
> > > >        >     > >     > transforms.t2.field: c1
> > > >        >     > >     >
> > > >        >     > >     > The "?!" means the transform guarded by the
> > > predicate runs
> > > >        > if the
> > > >        >     > > Predicate
> > > >        >     > >     > returns false.
> > > >        >     > >     >
> > > >        >     > >     > I think adding a Filter SMT that uses a
> > Predicate
> > > is a nice
> > > >        > idea.
> > > >        >     > >     >
> > > >        >     > >     > Cheers,
> > > >        >     > >     > Andrew
> > > >        >     > >     >
> > > >        >     >
> > > >        >     >
> > > >        >
> > > >        >
> > >
> > >
> > >
> >
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Konstantine Karantasis <ko...@confluent.io>.
Hi Tom.

Thanks for the KIP. I like how the proposal has ended up to be and I think
it describes a practical approach.

I have to say that, for a moment, earlier in the discussion I thought we
were leaning a bit towards an unconventional mini assembly language based
on java properties.
The reliance on java properties for SMT definition and configuration has
drawn some criticism in the past, so keeping the model lean and appealing
as well as expressive is worth the effort in my opinion.

With that in mind, I feel that we could move forward without the question
mark notation. That approach would be similar to what was introduced when
SMTs were implemented by adding the `transforms` prefix. We'll have a top
level property called `predicates` and a namespace starting with the
`predicates.` prefix followed by aliases. Avoiding the '?' in the
transforms seems also ok to me in order to keep things simple and
intuitive.

Besides these suggestions, I have a few additional minor comments:

1. Should we extend the `AutoCloseable` interface instead of `Closeable`
since we ignore the Exception in the signature anyways? I know this will
make Predicate different from Transformation but it also corresponds better
to the intended use.
2. As Andrew mentioned, Predicate needs to be declared an interface in the
KIP.
3. Currently examples are neither java properties nor json. Should we
comply with one of the two formats to avoid confusing users that will read
the KIP before trying this feature?

Thanks again for driving this proposal.

Konstantine

On Mon, May 11, 2020 at 7:57 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi Andrew,
>
> That works nicely enough for the proposal where the predicate is configured
> directly on the transformation. But I thought there was more consensus
> around the proposal to have the transformation configuration refer to a
> predicate indirectly, defined with the ?predicates key. I guess an example
> would look like this:
>
> transforms: t2
> transforms.t2?predicate: has-prefix
> transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
> transforms.t2.field: c1
> ?predicates: has-prefix
> ?predicates.has-prefix.type:
> org.apache.kafka.connect.transforms.predicates.TopicNameMatch
> ?predicates.has-prefix.negate: true
> ?predicates.has-prefix.pattern: my-prefix-.*
>
> I agree that this seems to reduce the chance of conflict to practically
> nothing. I'm happy enough to go with this and I've updated the KIP
> accordingly.
>
> Assuming no one raises more concerns I'll start a vote soon.
>
> Kind regards,
>
> Tom
>
> On Mon, May 11, 2020 at 10:54 AM Andrew Schofield <
> andrew_schofield@live.com>
> wrote:
>
> > Hi,
> > I have implemented some of this and configured some predicates and I
> > prefer this slight modification of the ? syntax:
> >
> > transforms: t2
> > transforms.t2?type:
> > org.apache.kafka.connect.transforms.predicates.TopicNameMatch
> > transforms.t2?negate: true
> > transforms.t2?pattern: my-prefix-.*
> > transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
> > transforms.t2.field: c1
> >
> > So, "transforms.<alias>." introduces the config for the transformation,
> > while "transforms.<alias>?" introduces the
> > config for the predicate. The risk of a clash now is that someone has
> used
> > a ? in the alias name, which is unlikely.
> >
> > Also, there's a slight typo in the Java definition of the Predicate
> > interface. It should be:
> >
> > public interface Predicate<R extends ConnectRecord<R>> extends
> > Configurable, Closeable
> >
> > Looks like discussion has died down so I wonder whether it's time to
> begin
> > a vote.
> >
> > Cheers,
> > Andrew
> >
> > On 06/05/2020, 08:57, "Andrew Schofield" <an...@live.com>
> > wrote:
> >
> > >    Hi Tom,
> > >    I think that either proposal is sufficiently readable and they both
> > have elements of
> > >    cryptic syntax. I remember talking to an engineering director once
> > who said he didn't hire
> > >    "curly braces people" any more. This is an interface for curly
> braces
> > people whichever
> > >    way we go here.
> >
> > >    I slightly prefer the predicates as a top-level concept rather than
> > as a feature
> > >    of the If transform. I also don't like the namespacing behaviour of
> > the If transform which
> > >    I guess works like variable declaration so each scope has its own
> > namespace of transform
> > >    aliases. I expect the following is valid but ill-advised.
> >
> > >        transforms: t1
> > >        transforms.t1.type: If
> > >        transforms.t1.!test.type: TopicNameMatch
> > >        transforms.t1.!test.pattern: my-prefix-.*
> > >        transforms.t1.then: t1
> > >        transforms.t1.then.t1.type: ExtractField$Key
> > >        transforms.t1.then.t1.field: c1
> >
> > >    I would love to see conditional application of transforms in SMT in
> > Kafka 2.6.
> > >    Let's see what other people think and get a consensus view on how to
> > spell it.
> >
> > >    Cheers,
> > >    Andrew
> >
> > >    On 05/05/2020, 15:20, "Tom Bentley" <tb...@redhat.com> wrote:
> >
> > >        Hi,
> >
> > >        Again, reusing the existing numbering...
> >
> > >        3) I had exactly the same thought. The reason I didn't already
> > rename it
> > >        was because :
> >
> > >        * Filter is not exactly wrong (by default it just filters out
> all
> > the
> > >        messages)
> > >        * Filter is a useful hint about how it's intended to be used.
> >
> > >        But I don't have a strong opinion, so I'm happy to rename it if
> > people
> > >        think that Drop is better.
> >
> > >        4) Putting the negation on the use site–rather than the
> > declaration
> > >        site–seems like a good idea, since it makes it easier to get
> > 'else'-like
> > >        behaviour.
> >
> > >        But, playing devil's advocate, I wondered how all this compares
> > with the
> > >        original proposal, so I went to the trouble of writing out a few
> > examples
> > >        to compare the two
> >
> > >        Example of a basic 'if'
> > >        Current proposal:
> > >        transforms: t2
> > >        transforms.t2.?predicate: !has-my-prefix
> > >        transforms.t2.type: ExtractField$Key
> > >        transforms.t2.field: c1
> > >        ?predicates: has-my-prefix
> > >        ?predicates.has-my-prefix.type: TopicNameMatch
> > >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> > >        Original (or originalish, see notes below):
> > >        transforms: t1
> > >        transforms.t1.type: If
> > >        transforms.t1.!test.type: TopicNameMatch
> > >        transforms.t1.!test.pattern: my-prefix-.*
> > >        transforms.t1.then: t2
> > >        transforms.t1.then.t2.type: ExtractField$Key
> > >        transforms.t1.then.t2.field: c1
> >
> > >        Example of 'if/else':
> > >        Current:
> > >        transforms: t2,t3
> > >        transforms.t2.?predicate: !has-my-prefix
> > >        transforms.t2.type: ExtractField$Key
> > >        transforms.t2.field: c1
> > >        transforms.t3.?predicate: has-my-prefix
> > >        transforms.t3.type: ExtractField$Key
> > >        transforms.t3.field: c2
> > >        ?predicates: has-my-prefix
> > >        ?predicates.has-my-prefix.type: TopicNameMatch
> > >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> > >        Original:
> > >        transforms: t1
> > >        transforms.t1.type: If
> > >        transforms.t1.!test.type: TopicNameMatch
> > >        transforms.t1.!test.pattern: my-prefix-.*
> > >        transforms.t1.then: t2
> > >        transforms.t1.then.t2.type: ExtractField$Key
> > >        transforms.t1.then.t2.field: c1
> > >        transforms.t1.else: t3
> > >        transforms.t1.else.t3.type: ExtractField$Key
> > >        transforms.t1.else.t3.field: c2
> >
> > >        Example of guarding a >1 SMT
> > >        Current:
> > >        transforms: t2,t3
> > >        transforms.t2.?predicate: !has-my-prefix
> > >        transforms.t2.type: ExtractField$Key
> > >        transforms.t2.field: c1
> > >        transforms.t3.?predicate: !has-my-prefix
> > >        transforms.t3.type: ExtractField$Value
> > >        transforms.t3.field: c2
> > >        ?predicates: has-my-prefix
> > >        ?predicates.has-my-prefix.type: TopicNameMatch
> > >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> > >        Original:
> > >        transforms: t1
> > >        transforms.t1.type: If
> > >        transforms.t1.!test.type: TopicNameMatch
> > >        transforms.t1.!test.pattern: my-prefix-.*
> > >        transforms.t1.then: t2,t3
> > >        transforms.t1.then.t2.type: ExtractField$Key
> > >        transforms.t1.then.t2.field: c1
> > >        transforms.t1.then.t3.type: ExtractField$Value
> > >        transforms.t1.then.t3.field: c2
> >
> > >        Notes:
> > >        * I'm assuming we would negate predicates in the Current
> proposal
> > by
> > >        prefixing the predicate name with !
> > >        * I'm using "test" rather than "predicate" in the Original
> > because it looks
> > >        a bit nicer having the same number of letters as "then" and
> > "else".
> > >        * I'm negating the test in the Original by using "!test" rather
> > than "test".
> >
> > >        Those notes aside, I'd make the following assertions:
> >
> > >        1. The Original examples require the same number of keys (or one
> > less line
> > >        in the last example).
> > >        2. The Original examples are more guessable if you'd never seen
> > this KIP
> > >        before. The SMT name (If) and its config parameter names ('test'
> > and 'then'
> > >        and maybe also 'else') all give quite strong clues about what's
> > happening.
> > >        I'd also argue that using !test is less cryptic than
> > '?predicate', since
> > >        the purpose of the '!' fits with its use in most programming
> > languages. The
> > >        purpose of the '?' in '?predicate' is to reduce the chance of a
> > naming
> > >        collision, which is not something which people could guess.
> > >        3. The Original examples cannot have config parameters naming
> > conflicts
> > >        with either Connectors or Transformations.
> >
> > >        Although I don't think it's a strong argument I would also add:
> >
> > >        4. Because the original syntax supports nesting of 'If' SMTs
> it's
> > actually
> > >        more powerful.
> >
> > >        So, I know people really didn't like the original suggestion,
> but
> > if you
> > >        accept these assertions then the current proposal is worse. Can
> > anyone
> > >        point to some objective ways in which the current proposal is
> > better than
> > >        the original one?
> >
> > >        If people really think the current proposal is better then I'm
> > happy to go
> > >        with that, call a vote and see if it has any traction with
> people
> > outside
> > >        this discussion. Ultimately I just want to get support for this
> > feature
> > >        into Kafka, however it's configured.
> >
> > >        Kind regards,
> >
> > >        Tom
> >
> > >        On Mon, May 4, 2020 at 12:35 PM Andrew Schofield <
> > andrew_schofield@live.com>
> > >        wrote:
> >
> > >        > Hi Tom,
> > >        > Referring back to Chris's points.
> > >        >
> > >        > 1) Exception handling - yes, agreed. This KIP can just say the
> > exception
> > >        > handling for
> > >        > predicates is the same as transformations.
> > >        >
> > >        > 2) Worker plugins - agree too.
> > >        >
> > >        > 3) Filter SMT - If the Filter SMT is just becoming a
> > transformation that
> > >        > drops any records that
> > >        > make it through the predicate, it's probably actually a Drop
> > >        > transformation. If it's
> > >        > not guarded with a predicate, it drops every record, but with
> a
> > predicate
> > >        > it just
> > >        > drops records that match.
> > >        >
> > >        > 4) Syntax - I quite like the idea of a separate namespace for
> > predicates,
> > >        > just like for transformations.
> > >        > I particularly like how you could have 2 transformations
> > referencing the
> > >        > same predicate and config.
> > >        >
> > >        > There is the scope for collision with connector configuration,
> > but my
> > >        > hunch is that
> > >        > "predicates" is less risky that simply "predicate" since it
> > would only
> > >        > clash with a connector
> > >        > that happened to have multiple predicates.
> > >        >
> > >        > But I think there is a non-zero chance that an existing
> > transformation
> > >        > might have used
> > >        > "predicate" as a config. I have actually written a filter
> > transformation
> > >        > myself. It didn't use
> > >        > "predicate" but it almost did. That's why I suggested
> > "?predicate" which I
> > >        > know is a bit
> > >        > cryptic, but it's also not going to clash.
> > >        >
> > >        > Here are two serving suggestions:
> > >        >
> > >        > A) "?predicate" to name the predicate alias to use
> > >        > transforms: t2
> > >        > transforms.t2.?predicate: has-my-prefix
> > >        > transforms.t2.type:ExtractField$Key
> > >        > transforms.t2.field: c1
> > >        >
> > >        > B) "?" to name the predicate alias to use
> > >        > transforms: t2
> > >        > transforms.t2.?: has-my-prefix
> > >        > transforms.t2.type:ExtractField$Key
> > >        > transforms.t2.field: c1
> > >        >
> > >        >
> > >        > One more question.
> > >        >
> > >        > Do people think the negation ought to be in the predicate
> > config or in the
> > >        > referring transformation
> > >        > config? We might want to make:
> > >        >
> > >        >   predicate(a)=>transform(T1)
> > >        >   predicate(negate(a))=>transform(T2)
> > >        >
> > >        > easier than having to use two separate predicate aliases with
> > identical
> > >        > configuration, except for
> > >        > the negation.
> > >        >
> > >        > Thanks,
> > >        > Andrew
> > >        >
> > >        > On 04/05/2020, 10:22, "Tom Bentley" <tb...@redhat.com>
> > wrote:
> > >        >
> > >        >     Hi Chris,
> > >        >
> > >        >     Thanks for the feedback. On points 1 and 2 I agree.
> > >        >
> > >        >     3. This is a great point. I'm happy to add this to the KIP
> > too, but
> > >        > note it
> > >        >     removes the need to add validate(Map<String, String>) to
> > both
> > >        >     Transformation and Predicate. That doesn't mean we
> > shouldn't still do
> > >        > it,
> > >        >     just that the change is not motivated by the features
> being
> > added in
> > >        > the
> > >        >     KIP. So I've removed validate(Map<String, String>) for
> now.
> > >        >
> > >        >     4. Your proposal would further reduce the chance of
> > Transformation
> > >        > config
> > >        >     conflict (because just the 'predicate' parameter might
> > conflict), but
> > >        > it
> > >        >     also adds the chance of conflict with Connector parameters
> > (all the
> > >        >     'predicates.*' parameters). Connector implementations are
> > far more
> > >        > numerous
> > >        >     than Transformation implementations, so the risk is
> > proportionately
> > >        >     greater. As you say we could try to reduce the chances of
> > conflict
> > >        > (e.g.
> > >        >     '?predicate' parameter on transformations and
> '?predicates'
> > prefix on
> > >        >     connectors), but I'm still left with uneasy feelings. I'd
> > like to hear
> > >        >     about what others think before I change this one.
> > >        >
> > >        >     Cheers,
> > >        >
> > >        >     Tom
> > >        >
> > >        >
> > >        >     On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <
> > >        > chrise@confluent.io>
> > >        >     wrote:
> > >        >
> > >        >     > Hi all,
> > >        >     >
> > >        >     > This has certainly taken quite a turn! I wish we could
> > get this done
> > >        >     > without adding another pluggable interface but if the
> > benefit is
> > >        > that now
> > >        >     > any SMT--new or pre-existing--can be conditionally
> > applied, it seems
> > >        > like
> > >        >     > it might be worth it.
> > >        >     >
> > >        >     > As far as the proposed design goes, some thoughts:
> > >        >     >
> > >        >     > 1. There isn't a clear reason why exceptions thrown from
> > predicates
> > >        > should
> > >        >     > be treated differently from exceptions thrown by
> > transformations. We
> > >        > have
> > >        >     > reasonable error-tolerance mechanisms in the Connect
> > framework; why
> > >        >     > silently swallow errors, especially if the risk is
> > publishing
> > >        > potentially
> > >        >     > corrupted data to Kafka or an external system because an
> > SMT that
> > >        > should
> > >        >     > have been applied wasn't?
> > >        >     >
> > >        >     > 2. It might be worth noting that the interface is a
> > worker plugin,
> > >        > and that
> > >        >     > presumably it would be loaded in the same way as other
> > worker
> > >        > plugins such
> > >        >     > as converters, connectors, and REST extensions. This
> > would include
> > >        > aliasing
> > >        >     > behavior that would allow users to specify predicates
> > using their
> > >        > simple
> > >        >     > class names as long as no two predicate plugins with the
> > same simple
> > >        > name
> > >        >     > were available on the worker.
> > >        >     >
> > >        >     > 3. Why would the Filter SMT explicitly take in a
> > predicate in its
> > >        >     > configuration if predicates can be applied to all SMTs?
> > Just reading
> > >        > the
> > >        >     > idea for a filter SMT, it seemed like the behavior would
> > be that the
> > >        > SMT
> > >        >     > would take in no configuration parameters and just
> > blindly drop
> > >        > everything
> > >        >     > that it sees, but typical use would involve pairing it
> > with a
> > >        > predicate.
> > >        >     >
> > >        >     > 4. The question mark syntax seems a little cryptic, and
> > combining the
> > >        >     > configuration properties for the predicate and the
> > transformation in
> > >        > the
> > >        >     > same namespace ("transforms.<name>.*) seems a little
> > noisy. What do
> > >        > you
> > >        >     > think about allowing predicates to be configured in
> their
> > own
> > >        > namespace,
> > >        >     > and referenced by name in a single "predicate" (or maybe
> > >        > "?predicate" if we
> > >        >     > really want to avoid risking backwards compatibility
> > concerns)
> > >        > property?
> > >        >     > This would also enable them to be more easily reused
> > across several
> > >        > SMTs.
> > >        >     >
> > >        >     > For example, you might configure a worker with these
> > properties
> > >        > (assuming
> > >        >     > plugin aliasing is supported for predicates in the same
> > way that it
> > >        > is for
> > >        >     > transformations):
> > >        >     >
> > >        >     > transforms: t2
> > >        >     > transforms.t2.predicate: has-my-prefix
> > >        >     > transforms.t2.type:ExtractField$Key
> > >        >     > transforms.t2.field: c1
> > >        >     >
> > >        >     > predicates: has-my-prefix
> > >        >     > predicates.has-my-prefix.type: TopicNameMatch
> > >        >     > predicates.has-my-prefix.negate: true
> > >        >     > predicates.has-my-prefix.pattern: my-prefix-.*
> > >        >     >
> > >        >     > Excited to see how this is evolving and I think we're
> > heading towards
> > >        >     > something really useful for the Connect framework.
> > >        >     >
> > >        >     > Cheers,
> > >        >     >
> > >        >     > Chris
> > >        >     >
> > >        >     >
> > >        >     > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <
> > >        > andrew_schofield@live.com
> > >        >     > >
> > >        >     > wrote:
> > >        >     >
> > >        >     > > Hi Tom,
> > >        >     > > Looks good to me.
> > >        >     > >
> > >        >     > > Thanks,
> > >        >     > > Andrew
> > >        >     > >
> > >        >     > > On 01/05/2020, 16:24, "Tom Bentley" <
> > tbentley@redhat.com> wrote:
> > >        >     > >
> > >        >     > >     Hi,
> > >        >     > >
> > >        >     > >     I've updated the KIP to reflect recent
> discussions.
> > Please
> > >        > note the
> > >        >     > >     following:
> > >        >     > >
> > >        >     > >     1. I've described a HasHeaderKey predicate rather
> > than
> > >        >     > HeaderKeyMatches
> > >        >     > >     because at the moment Headers doesn't expose the
> > whole set of
> > >        >     > headers,
> > >        >     > > only
> > >        >     > >     those with a specific name. Exposing this would
> > significantly
> > >        >     > increase
> > >        >     > > the
> > >        >     > >     scope of this KIP but relatively little extra
> > benefit.
> > >        >     > >     2. I used
> > org.apache.kafka.connect.transformer.predicates
> > >        > rather than
> > >        >     > >     org.apache.kafka.connect.predicates, since I
> > thought it better
> > >        >     > > reflected
> > >        >     > >     the use of predicates within transforms. I'm
> > flexible on this
> > >        > one if
> > >        >     > >     someone has a good case for the original name. For
> > example the
> > >        >     > original
> > >        >     > >     package name would be more appropriate if we were
> > expecting
> > >        >     > Connectors
> > >        >     > > to
> > >        >     > >     make use of Predicates somehow.
> > >        >     > >     3. I've dropped the special case of not needing to
> > provide a
> > >        > FQN for
> > >        >     > >     predicate classes in that package, since this
> isn't
> > something
> > >        >     > > supported by
> > >        >     > >     transformations themselves as far as I know. I
> like
> > the idea,
> > >        > but it
> > >        >     > > seemed
> > >        >     > >     inconsistent to need it for transformations but
> not
> > for
> > >        > predicates.
> > >        >     > >     4. Chris, I've used your suggestions for a
> > validate() method
> > >        > for the
> > >        >     > > extra
> > >        >     > >     configurability needed. I've also added this to
> > Predicate (even
> > >        >     > though
> > >        >     > > I
> > >        >     > >     don't need it) for consistency with Connector and
> > >        > Transformation.
> > >        >     > >     5. For negating the result of a predicate I
> decided
> > it was
> > >        > clearer to
> > >        >     > > just
> > >        >     > >     have a "negate" config parameter, supported by all
> > predicates,
> > >        > than
> > >        >     > to
> > >        >     > >     reach for more punctuation.
> > >        >     > >
> > >        >     > >     I'd be grateful for any more feedback people might
> > have.
> > >        >     > >
> > >        >     > >     Kind regards,
> > >        >     > >
> > >        >     > >     Tom
> > >        >     > >
> > >        >     > >     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
> > >        >     > > andrew_schofield@live.com>
> > >        >     > >     wrote:
> > >        >     > >
> > >        >     > >     > Hi Tom,
> > >        >     > >     > I think we should go for a consistent naming
> > convention for
> > >        > the
> > >        >     > >     > predicates, maybe so they
> > >        >     > >     > read nicely if you say "IF" first. Here's a
> > suggestion for
> > >        >     > > predicates that
> > >        >     > >     > the KIP could introduce:
> > >        >     > >     >  - TopicNameMatches
> > >        >     > >     >  - HeaderKeyMatches
> > >        >     > >     >  - RecordIsTombstone
> > >        >     > >     >
> > >        >     > >     > On naming, I also suggest using
> > >        >     > >     > org.apache.kafka.connect.predicates.Predicate
> > >        >     > >     > as the name for the interface.
> > >        >     > >     >
> > >        >     > >     > I hadn't settled on a preference for negation. I
> > suppose
> > >        > there are
> > >        >     > > three
> > >        >     > >     > results
> > >        >     > >     > from evaluating a predicate - true, false or an
> > exception
> > >        > caught by
> > >        >     > > Kafka
> > >        >     > >     > Connect.
> > >        >     > >     > I suggest the exception case is always a
> > predicate fail so
> > >        > the
> > >        >     > > guarded
> > >        >     > >     > transformation
> > >        >     > >     > is skipped. Initially I'd preferred having the
> > predicate be
> > >        >     > > configured to
> > >        >     > >     > return a
> > >        >     > >     > negated result to give a "not match" behaviour.
> > Thinking
> > >        > again
> > >        >     > now, I
> > >        >     > >     > prefer something like:
> > >        >     > >     >
> > >        >     > >     > transforms: t1
> > >        >     > >     > transforms.t1.?type: RecordIsTombstone
> > >        >     > >     > transforms.t1.type:
> > >        >     > > org.apache.kafka.connect.transforms.InsertField$Value
> > >        >     > >     > transforms.t1.static.field: deleted
> > >        >     > >     > transforms.t1.static.value: true
> > >        >     > >     >
> > >        >     > >     > The "?" means the transform guarded by the
> > predicate runs in
> > >        > the
> > >        >     > > Predicate
> > >        >     > >     > returns true.
> > >        >     > >     >
> > >        >     > >     > transforms: t2
> > >        >     > >     > transforms.t2.?!type:
> > >        >     > > org.apache.kafka.connect.predicates.TopicNameMatch
> > >        >     > >     > transforms.t2.?regex: my-prefix-.*
> > >        >     > >     > transforms.t2.type:
> > >        >     > > org.apache.kafka.connect.transforms.ExtractField$Key
> > >        >     > >     > transforms.t2.field: c1
> > >        >     > >     >
> > >        >     > >     > The "?!" means the transform guarded by the
> > predicate runs
> > >        > if the
> > >        >     > > Predicate
> > >        >     > >     > returns false.
> > >        >     > >     >
> > >        >     > >     > I think adding a Filter SMT that uses a
> Predicate
> > is a nice
> > >        > idea.
> > >        >     > >     >
> > >        >     > >     > Cheers,
> > >        >     > >     > Andrew
> > >        >     > >     >
> > >        >     >
> > >        >     >
> > >        >
> > >        >
> >
> >
> >
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi Andrew,

That works nicely enough for the proposal where the predicate is configured
directly on the transformation. But I thought there was more consensus
around the proposal to have the transformation configuration refer to a
predicate indirectly, defined with the ?predicates key. I guess an example
would look like this:

transforms: t2
transforms.t2?predicate: has-prefix
transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
transforms.t2.field: c1
?predicates: has-prefix
?predicates.has-prefix.type:
org.apache.kafka.connect.transforms.predicates.TopicNameMatch
?predicates.has-prefix.negate: true
?predicates.has-prefix.pattern: my-prefix-.*

I agree that this seems to reduce the chance of conflict to practically
nothing. I'm happy enough to go with this and I've updated the KIP
accordingly.

Assuming no one raises more concerns I'll start a vote soon.

Kind regards,

Tom

On Mon, May 11, 2020 at 10:54 AM Andrew Schofield <an...@live.com>
wrote:

> Hi,
> I have implemented some of this and configured some predicates and I
> prefer this slight modification of the ? syntax:
>
> transforms: t2
> transforms.t2?type:
> org.apache.kafka.connect.transforms.predicates.TopicNameMatch
> transforms.t2?negate: true
> transforms.t2?pattern: my-prefix-.*
> transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
> transforms.t2.field: c1
>
> So, "transforms.<alias>." introduces the config for the transformation,
> while "transforms.<alias>?" introduces the
> config for the predicate. The risk of a clash now is that someone has used
> a ? in the alias name, which is unlikely.
>
> Also, there's a slight typo in the Java definition of the Predicate
> interface. It should be:
>
> public interface Predicate<R extends ConnectRecord<R>> extends
> Configurable, Closeable
>
> Looks like discussion has died down so I wonder whether it's time to begin
> a vote.
>
> Cheers,
> Andrew
>
> On 06/05/2020, 08:57, "Andrew Schofield" <an...@live.com>
> wrote:
>
> >    Hi Tom,
> >    I think that either proposal is sufficiently readable and they both
> have elements of
> >    cryptic syntax. I remember talking to an engineering director once
> who said he didn't hire
> >    "curly braces people" any more. This is an interface for curly braces
> people whichever
> >    way we go here.
>
> >    I slightly prefer the predicates as a top-level concept rather than
> as a feature
> >    of the If transform. I also don't like the namespacing behaviour of
> the If transform which
> >    I guess works like variable declaration so each scope has its own
> namespace of transform
> >    aliases. I expect the following is valid but ill-advised.
>
> >        transforms: t1
> >        transforms.t1.type: If
> >        transforms.t1.!test.type: TopicNameMatch
> >        transforms.t1.!test.pattern: my-prefix-.*
> >        transforms.t1.then: t1
> >        transforms.t1.then.t1.type: ExtractField$Key
> >        transforms.t1.then.t1.field: c1
>
> >    I would love to see conditional application of transforms in SMT in
> Kafka 2.6.
> >    Let's see what other people think and get a consensus view on how to
> spell it.
>
> >    Cheers,
> >    Andrew
>
> >    On 05/05/2020, 15:20, "Tom Bentley" <tb...@redhat.com> wrote:
>
> >        Hi,
>
> >        Again, reusing the existing numbering...
>
> >        3) I had exactly the same thought. The reason I didn't already
> rename it
> >        was because :
>
> >        * Filter is not exactly wrong (by default it just filters out all
> the
> >        messages)
> >        * Filter is a useful hint about how it's intended to be used.
>
> >        But I don't have a strong opinion, so I'm happy to rename it if
> people
> >        think that Drop is better.
>
> >        4) Putting the negation on the use site–rather than the
> declaration
> >        site–seems like a good idea, since it makes it easier to get
> 'else'-like
> >        behaviour.
>
> >        But, playing devil's advocate, I wondered how all this compares
> with the
> >        original proposal, so I went to the trouble of writing out a few
> examples
> >        to compare the two
>
> >        Example of a basic 'if'
> >        Current proposal:
> >        transforms: t2
> >        transforms.t2.?predicate: !has-my-prefix
> >        transforms.t2.type: ExtractField$Key
> >        transforms.t2.field: c1
> >        ?predicates: has-my-prefix
> >        ?predicates.has-my-prefix.type: TopicNameMatch
> >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> >        Original (or originalish, see notes below):
> >        transforms: t1
> >        transforms.t1.type: If
> >        transforms.t1.!test.type: TopicNameMatch
> >        transforms.t1.!test.pattern: my-prefix-.*
> >        transforms.t1.then: t2
> >        transforms.t1.then.t2.type: ExtractField$Key
> >        transforms.t1.then.t2.field: c1
>
> >        Example of 'if/else':
> >        Current:
> >        transforms: t2,t3
> >        transforms.t2.?predicate: !has-my-prefix
> >        transforms.t2.type: ExtractField$Key
> >        transforms.t2.field: c1
> >        transforms.t3.?predicate: has-my-prefix
> >        transforms.t3.type: ExtractField$Key
> >        transforms.t3.field: c2
> >        ?predicates: has-my-prefix
> >        ?predicates.has-my-prefix.type: TopicNameMatch
> >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> >        Original:
> >        transforms: t1
> >        transforms.t1.type: If
> >        transforms.t1.!test.type: TopicNameMatch
> >        transforms.t1.!test.pattern: my-prefix-.*
> >        transforms.t1.then: t2
> >        transforms.t1.then.t2.type: ExtractField$Key
> >        transforms.t1.then.t2.field: c1
> >        transforms.t1.else: t3
> >        transforms.t1.else.t3.type: ExtractField$Key
> >        transforms.t1.else.t3.field: c2
>
> >        Example of guarding a >1 SMT
> >        Current:
> >        transforms: t2,t3
> >        transforms.t2.?predicate: !has-my-prefix
> >        transforms.t2.type: ExtractField$Key
> >        transforms.t2.field: c1
> >        transforms.t3.?predicate: !has-my-prefix
> >        transforms.t3.type: ExtractField$Value
> >        transforms.t3.field: c2
> >        ?predicates: has-my-prefix
> >        ?predicates.has-my-prefix.type: TopicNameMatch
> >        ?predicates.has-my-prefix.pattern: my-prefix-.*
> >        Original:
> >        transforms: t1
> >        transforms.t1.type: If
> >        transforms.t1.!test.type: TopicNameMatch
> >        transforms.t1.!test.pattern: my-prefix-.*
> >        transforms.t1.then: t2,t3
> >        transforms.t1.then.t2.type: ExtractField$Key
> >        transforms.t1.then.t2.field: c1
> >        transforms.t1.then.t3.type: ExtractField$Value
> >        transforms.t1.then.t3.field: c2
>
> >        Notes:
> >        * I'm assuming we would negate predicates in the Current proposal
> by
> >        prefixing the predicate name with !
> >        * I'm using "test" rather than "predicate" in the Original
> because it looks
> >        a bit nicer having the same number of letters as "then" and
> "else".
> >        * I'm negating the test in the Original by using "!test" rather
> than "test".
>
> >        Those notes aside, I'd make the following assertions:
>
> >        1. The Original examples require the same number of keys (or one
> less line
> >        in the last example).
> >        2. The Original examples are more guessable if you'd never seen
> this KIP
> >        before. The SMT name (If) and its config parameter names ('test'
> and 'then'
> >        and maybe also 'else') all give quite strong clues about what's
> happening.
> >        I'd also argue that using !test is less cryptic than
> '?predicate', since
> >        the purpose of the '!' fits with its use in most programming
> languages. The
> >        purpose of the '?' in '?predicate' is to reduce the chance of a
> naming
> >        collision, which is not something which people could guess.
> >        3. The Original examples cannot have config parameters naming
> conflicts
> >        with either Connectors or Transformations.
>
> >        Although I don't think it's a strong argument I would also add:
>
> >        4. Because the original syntax supports nesting of 'If' SMTs it's
> actually
> >        more powerful.
>
> >        So, I know people really didn't like the original suggestion, but
> if you
> >        accept these assertions then the current proposal is worse. Can
> anyone
> >        point to some objective ways in which the current proposal is
> better than
> >        the original one?
>
> >        If people really think the current proposal is better then I'm
> happy to go
> >        with that, call a vote and see if it has any traction with people
> outside
> >        this discussion. Ultimately I just want to get support for this
> feature
> >        into Kafka, however it's configured.
>
> >        Kind regards,
>
> >        Tom
>
> >        On Mon, May 4, 2020 at 12:35 PM Andrew Schofield <
> andrew_schofield@live.com>
> >        wrote:
>
> >        > Hi Tom,
> >        > Referring back to Chris's points.
> >        >
> >        > 1) Exception handling - yes, agreed. This KIP can just say the
> exception
> >        > handling for
> >        > predicates is the same as transformations.
> >        >
> >        > 2) Worker plugins - agree too.
> >        >
> >        > 3) Filter SMT - If the Filter SMT is just becoming a
> transformation that
> >        > drops any records that
> >        > make it through the predicate, it's probably actually a Drop
> >        > transformation. If it's
> >        > not guarded with a predicate, it drops every record, but with a
> predicate
> >        > it just
> >        > drops records that match.
> >        >
> >        > 4) Syntax - I quite like the idea of a separate namespace for
> predicates,
> >        > just like for transformations.
> >        > I particularly like how you could have 2 transformations
> referencing the
> >        > same predicate and config.
> >        >
> >        > There is the scope for collision with connector configuration,
> but my
> >        > hunch is that
> >        > "predicates" is less risky that simply "predicate" since it
> would only
> >        > clash with a connector
> >        > that happened to have multiple predicates.
> >        >
> >        > But I think there is a non-zero chance that an existing
> transformation
> >        > might have used
> >        > "predicate" as a config. I have actually written a filter
> transformation
> >        > myself. It didn't use
> >        > "predicate" but it almost did. That's why I suggested
> "?predicate" which I
> >        > know is a bit
> >        > cryptic, but it's also not going to clash.
> >        >
> >        > Here are two serving suggestions:
> >        >
> >        > A) "?predicate" to name the predicate alias to use
> >        > transforms: t2
> >        > transforms.t2.?predicate: has-my-prefix
> >        > transforms.t2.type:ExtractField$Key
> >        > transforms.t2.field: c1
> >        >
> >        > B) "?" to name the predicate alias to use
> >        > transforms: t2
> >        > transforms.t2.?: has-my-prefix
> >        > transforms.t2.type:ExtractField$Key
> >        > transforms.t2.field: c1
> >        >
> >        >
> >        > One more question.
> >        >
> >        > Do people think the negation ought to be in the predicate
> config or in the
> >        > referring transformation
> >        > config? We might want to make:
> >        >
> >        >   predicate(a)=>transform(T1)
> >        >   predicate(negate(a))=>transform(T2)
> >        >
> >        > easier than having to use two separate predicate aliases with
> identical
> >        > configuration, except for
> >        > the negation.
> >        >
> >        > Thanks,
> >        > Andrew
> >        >
> >        > On 04/05/2020, 10:22, "Tom Bentley" <tb...@redhat.com>
> wrote:
> >        >
> >        >     Hi Chris,
> >        >
> >        >     Thanks for the feedback. On points 1 and 2 I agree.
> >        >
> >        >     3. This is a great point. I'm happy to add this to the KIP
> too, but
> >        > note it
> >        >     removes the need to add validate(Map<String, String>) to
> both
> >        >     Transformation and Predicate. That doesn't mean we
> shouldn't still do
> >        > it,
> >        >     just that the change is not motivated by the features being
> added in
> >        > the
> >        >     KIP. So I've removed validate(Map<String, String>) for now.
> >        >
> >        >     4. Your proposal would further reduce the chance of
> Transformation
> >        > config
> >        >     conflict (because just the 'predicate' parameter might
> conflict), but
> >        > it
> >        >     also adds the chance of conflict with Connector parameters
> (all the
> >        >     'predicates.*' parameters). Connector implementations are
> far more
> >        > numerous
> >        >     than Transformation implementations, so the risk is
> proportionately
> >        >     greater. As you say we could try to reduce the chances of
> conflict
> >        > (e.g.
> >        >     '?predicate' parameter on transformations and '?predicates'
> prefix on
> >        >     connectors), but I'm still left with uneasy feelings. I'd
> like to hear
> >        >     about what others think before I change this one.
> >        >
> >        >     Cheers,
> >        >
> >        >     Tom
> >        >
> >        >
> >        >     On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <
> >        > chrise@confluent.io>
> >        >     wrote:
> >        >
> >        >     > Hi all,
> >        >     >
> >        >     > This has certainly taken quite a turn! I wish we could
> get this done
> >        >     > without adding another pluggable interface but if the
> benefit is
> >        > that now
> >        >     > any SMT--new or pre-existing--can be conditionally
> applied, it seems
> >        > like
> >        >     > it might be worth it.
> >        >     >
> >        >     > As far as the proposed design goes, some thoughts:
> >        >     >
> >        >     > 1. There isn't a clear reason why exceptions thrown from
> predicates
> >        > should
> >        >     > be treated differently from exceptions thrown by
> transformations. We
> >        > have
> >        >     > reasonable error-tolerance mechanisms in the Connect
> framework; why
> >        >     > silently swallow errors, especially if the risk is
> publishing
> >        > potentially
> >        >     > corrupted data to Kafka or an external system because an
> SMT that
> >        > should
> >        >     > have been applied wasn't?
> >        >     >
> >        >     > 2. It might be worth noting that the interface is a
> worker plugin,
> >        > and that
> >        >     > presumably it would be loaded in the same way as other
> worker
> >        > plugins such
> >        >     > as converters, connectors, and REST extensions. This
> would include
> >        > aliasing
> >        >     > behavior that would allow users to specify predicates
> using their
> >        > simple
> >        >     > class names as long as no two predicate plugins with the
> same simple
> >        > name
> >        >     > were available on the worker.
> >        >     >
> >        >     > 3. Why would the Filter SMT explicitly take in a
> predicate in its
> >        >     > configuration if predicates can be applied to all SMTs?
> Just reading
> >        > the
> >        >     > idea for a filter SMT, it seemed like the behavior would
> be that the
> >        > SMT
> >        >     > would take in no configuration parameters and just
> blindly drop
> >        > everything
> >        >     > that it sees, but typical use would involve pairing it
> with a
> >        > predicate.
> >        >     >
> >        >     > 4. The question mark syntax seems a little cryptic, and
> combining the
> >        >     > configuration properties for the predicate and the
> transformation in
> >        > the
> >        >     > same namespace ("transforms.<name>.*) seems a little
> noisy. What do
> >        > you
> >        >     > think about allowing predicates to be configured in their
> own
> >        > namespace,
> >        >     > and referenced by name in a single "predicate" (or maybe
> >        > "?predicate" if we
> >        >     > really want to avoid risking backwards compatibility
> concerns)
> >        > property?
> >        >     > This would also enable them to be more easily reused
> across several
> >        > SMTs.
> >        >     >
> >        >     > For example, you might configure a worker with these
> properties
> >        > (assuming
> >        >     > plugin aliasing is supported for predicates in the same
> way that it
> >        > is for
> >        >     > transformations):
> >        >     >
> >        >     > transforms: t2
> >        >     > transforms.t2.predicate: has-my-prefix
> >        >     > transforms.t2.type:ExtractField$Key
> >        >     > transforms.t2.field: c1
> >        >     >
> >        >     > predicates: has-my-prefix
> >        >     > predicates.has-my-prefix.type: TopicNameMatch
> >        >     > predicates.has-my-prefix.negate: true
> >        >     > predicates.has-my-prefix.pattern: my-prefix-.*
> >        >     >
> >        >     > Excited to see how this is evolving and I think we're
> heading towards
> >        >     > something really useful for the Connect framework.
> >        >     >
> >        >     > Cheers,
> >        >     >
> >        >     > Chris
> >        >     >
> >        >     >
> >        >     > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <
> >        > andrew_schofield@live.com
> >        >     > >
> >        >     > wrote:
> >        >     >
> >        >     > > Hi Tom,
> >        >     > > Looks good to me.
> >        >     > >
> >        >     > > Thanks,
> >        >     > > Andrew
> >        >     > >
> >        >     > > On 01/05/2020, 16:24, "Tom Bentley" <
> tbentley@redhat.com> wrote:
> >        >     > >
> >        >     > >     Hi,
> >        >     > >
> >        >     > >     I've updated the KIP to reflect recent discussions.
> Please
> >        > note the
> >        >     > >     following:
> >        >     > >
> >        >     > >     1. I've described a HasHeaderKey predicate rather
> than
> >        >     > HeaderKeyMatches
> >        >     > >     because at the moment Headers doesn't expose the
> whole set of
> >        >     > headers,
> >        >     > > only
> >        >     > >     those with a specific name. Exposing this would
> significantly
> >        >     > increase
> >        >     > > the
> >        >     > >     scope of this KIP but relatively little extra
> benefit.
> >        >     > >     2. I used
> org.apache.kafka.connect.transformer.predicates
> >        > rather than
> >        >     > >     org.apache.kafka.connect.predicates, since I
> thought it better
> >        >     > > reflected
> >        >     > >     the use of predicates within transforms. I'm
> flexible on this
> >        > one if
> >        >     > >     someone has a good case for the original name. For
> example the
> >        >     > original
> >        >     > >     package name would be more appropriate if we were
> expecting
> >        >     > Connectors
> >        >     > > to
> >        >     > >     make use of Predicates somehow.
> >        >     > >     3. I've dropped the special case of not needing to
> provide a
> >        > FQN for
> >        >     > >     predicate classes in that package, since this isn't
> something
> >        >     > > supported by
> >        >     > >     transformations themselves as far as I know. I like
> the idea,
> >        > but it
> >        >     > > seemed
> >        >     > >     inconsistent to need it for transformations but not
> for
> >        > predicates.
> >        >     > >     4. Chris, I've used your suggestions for a
> validate() method
> >        > for the
> >        >     > > extra
> >        >     > >     configurability needed. I've also added this to
> Predicate (even
> >        >     > though
> >        >     > > I
> >        >     > >     don't need it) for consistency with Connector and
> >        > Transformation.
> >        >     > >     5. For negating the result of a predicate I decided
> it was
> >        > clearer to
> >        >     > > just
> >        >     > >     have a "negate" config parameter, supported by all
> predicates,
> >        > than
> >        >     > to
> >        >     > >     reach for more punctuation.
> >        >     > >
> >        >     > >     I'd be grateful for any more feedback people might
> have.
> >        >     > >
> >        >     > >     Kind regards,
> >        >     > >
> >        >     > >     Tom
> >        >     > >
> >        >     > >     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
> >        >     > > andrew_schofield@live.com>
> >        >     > >     wrote:
> >        >     > >
> >        >     > >     > Hi Tom,
> >        >     > >     > I think we should go for a consistent naming
> convention for
> >        > the
> >        >     > >     > predicates, maybe so they
> >        >     > >     > read nicely if you say "IF" first. Here's a
> suggestion for
> >        >     > > predicates that
> >        >     > >     > the KIP could introduce:
> >        >     > >     >  - TopicNameMatches
> >        >     > >     >  - HeaderKeyMatches
> >        >     > >     >  - RecordIsTombstone
> >        >     > >     >
> >        >     > >     > On naming, I also suggest using
> >        >     > >     > org.apache.kafka.connect.predicates.Predicate
> >        >     > >     > as the name for the interface.
> >        >     > >     >
> >        >     > >     > I hadn't settled on a preference for negation. I
> suppose
> >        > there are
> >        >     > > three
> >        >     > >     > results
> >        >     > >     > from evaluating a predicate - true, false or an
> exception
> >        > caught by
> >        >     > > Kafka
> >        >     > >     > Connect.
> >        >     > >     > I suggest the exception case is always a
> predicate fail so
> >        > the
> >        >     > > guarded
> >        >     > >     > transformation
> >        >     > >     > is skipped. Initially I'd preferred having the
> predicate be
> >        >     > > configured to
> >        >     > >     > return a
> >        >     > >     > negated result to give a "not match" behaviour.
> Thinking
> >        > again
> >        >     > now, I
> >        >     > >     > prefer something like:
> >        >     > >     >
> >        >     > >     > transforms: t1
> >        >     > >     > transforms.t1.?type: RecordIsTombstone
> >        >     > >     > transforms.t1.type:
> >        >     > > org.apache.kafka.connect.transforms.InsertField$Value
> >        >     > >     > transforms.t1.static.field: deleted
> >        >     > >     > transforms.t1.static.value: true
> >        >     > >     >
> >        >     > >     > The "?" means the transform guarded by the
> predicate runs in
> >        > the
> >        >     > > Predicate
> >        >     > >     > returns true.
> >        >     > >     >
> >        >     > >     > transforms: t2
> >        >     > >     > transforms.t2.?!type:
> >        >     > > org.apache.kafka.connect.predicates.TopicNameMatch
> >        >     > >     > transforms.t2.?regex: my-prefix-.*
> >        >     > >     > transforms.t2.type:
> >        >     > > org.apache.kafka.connect.transforms.ExtractField$Key
> >        >     > >     > transforms.t2.field: c1
> >        >     > >     >
> >        >     > >     > The "?!" means the transform guarded by the
> predicate runs
> >        > if the
> >        >     > > Predicate
> >        >     > >     > returns false.
> >        >     > >     >
> >        >     > >     > I think adding a Filter SMT that uses a Predicate
> is a nice
> >        > idea.
> >        >     > >     >
> >        >     > >     > Cheers,
> >        >     > >     > Andrew
> >        >     > >     >
> >        >     >
> >        >     >
> >        >
> >        >
>
>
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Andrew Schofield <an...@live.com>.
Hi,
I have implemented some of this and configured some predicates and I prefer this slight modification of the ? syntax:

transforms: t2
transforms.t2?type: org.apache.kafka.connect.transforms.predicates.TopicNameMatch
transforms.t2?negate: true
transforms.t2?pattern: my-prefix-.*
transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
transforms.t2.field: c1

So, "transforms.<alias>." introduces the config for the transformation, while "transforms.<alias>?" introduces the
config for the predicate. The risk of a clash now is that someone has used a ? in the alias name, which is unlikely.

Also, there's a slight typo in the Java definition of the Predicate interface. It should be:

public interface Predicate<R extends ConnectRecord<R>> extends Configurable, Closeable

Looks like discussion has died down so I wonder whether it's time to begin a vote.

Cheers,
Andrew

On 06/05/2020, 08:57, "Andrew Schofield" <an...@live.com> wrote:

>    Hi Tom,
>    I think that either proposal is sufficiently readable and they both have elements of
>    cryptic syntax. I remember talking to an engineering director once who said he didn't hire
>    "curly braces people" any more. This is an interface for curly braces people whichever
>    way we go here.

>    I slightly prefer the predicates as a top-level concept rather than as a feature
>    of the If transform. I also don't like the namespacing behaviour of the If transform which
>    I guess works like variable declaration so each scope has its own namespace of transform
>    aliases. I expect the following is valid but ill-advised.

>        transforms: t1
>        transforms.t1.type: If
>        transforms.t1.!test.type: TopicNameMatch
>        transforms.t1.!test.pattern: my-prefix-.*
>        transforms.t1.then: t1
>        transforms.t1.then.t1.type: ExtractField$Key
>        transforms.t1.then.t1.field: c1

>    I would love to see conditional application of transforms in SMT in Kafka 2.6.
>    Let's see what other people think and get a consensus view on how to spell it.

>    Cheers,
>    Andrew

>    On 05/05/2020, 15:20, "Tom Bentley" <tb...@redhat.com> wrote:

>        Hi,

>        Again, reusing the existing numbering...

>        3) I had exactly the same thought. The reason I didn't already rename it
>        was because :

>        * Filter is not exactly wrong (by default it just filters out all the
>        messages)
>        * Filter is a useful hint about how it's intended to be used.

>        But I don't have a strong opinion, so I'm happy to rename it if people
>        think that Drop is better.

>        4) Putting the negation on the use site–rather than the declaration
>        site–seems like a good idea, since it makes it easier to get 'else'-like
>        behaviour.

>        But, playing devil's advocate, I wondered how all this compares with the
>        original proposal, so I went to the trouble of writing out a few examples
>        to compare the two

>        Example of a basic 'if'
>        Current proposal:
>        transforms: t2
>        transforms.t2.?predicate: !has-my-prefix
>        transforms.t2.type: ExtractField$Key
>        transforms.t2.field: c1
>        ?predicates: has-my-prefix
>        ?predicates.has-my-prefix.type: TopicNameMatch
>        ?predicates.has-my-prefix.pattern: my-prefix-.*
>        Original (or originalish, see notes below):
>        transforms: t1
>        transforms.t1.type: If
>        transforms.t1.!test.type: TopicNameMatch
>        transforms.t1.!test.pattern: my-prefix-.*
>        transforms.t1.then: t2
>        transforms.t1.then.t2.type: ExtractField$Key
>        transforms.t1.then.t2.field: c1

>        Example of 'if/else':
>        Current:
>        transforms: t2,t3
>        transforms.t2.?predicate: !has-my-prefix
>        transforms.t2.type: ExtractField$Key
>        transforms.t2.field: c1
>        transforms.t3.?predicate: has-my-prefix
>        transforms.t3.type: ExtractField$Key
>        transforms.t3.field: c2
>        ?predicates: has-my-prefix
>        ?predicates.has-my-prefix.type: TopicNameMatch
>        ?predicates.has-my-prefix.pattern: my-prefix-.*
>        Original:
>        transforms: t1
>        transforms.t1.type: If
>        transforms.t1.!test.type: TopicNameMatch
>        transforms.t1.!test.pattern: my-prefix-.*
>        transforms.t1.then: t2
>        transforms.t1.then.t2.type: ExtractField$Key
>        transforms.t1.then.t2.field: c1
>        transforms.t1.else: t3
>        transforms.t1.else.t3.type: ExtractField$Key
>        transforms.t1.else.t3.field: c2

>        Example of guarding a >1 SMT
>        Current:
>        transforms: t2,t3
>        transforms.t2.?predicate: !has-my-prefix
>        transforms.t2.type: ExtractField$Key
>        transforms.t2.field: c1
>        transforms.t3.?predicate: !has-my-prefix
>        transforms.t3.type: ExtractField$Value
>        transforms.t3.field: c2
>        ?predicates: has-my-prefix
>        ?predicates.has-my-prefix.type: TopicNameMatch
>        ?predicates.has-my-prefix.pattern: my-prefix-.*
>        Original:
>        transforms: t1
>        transforms.t1.type: If
>        transforms.t1.!test.type: TopicNameMatch
>        transforms.t1.!test.pattern: my-prefix-.*
>        transforms.t1.then: t2,t3
>        transforms.t1.then.t2.type: ExtractField$Key
>        transforms.t1.then.t2.field: c1
>        transforms.t1.then.t3.type: ExtractField$Value
>        transforms.t1.then.t3.field: c2

>        Notes:
>        * I'm assuming we would negate predicates in the Current proposal by
>        prefixing the predicate name with !
>        * I'm using "test" rather than "predicate" in the Original because it looks
>        a bit nicer having the same number of letters as "then" and "else".
>        * I'm negating the test in the Original by using "!test" rather than "test".

>        Those notes aside, I'd make the following assertions:

>        1. The Original examples require the same number of keys (or one less line
>        in the last example).
>        2. The Original examples are more guessable if you'd never seen this KIP
>        before. The SMT name (If) and its config parameter names ('test' and 'then'
>        and maybe also 'else') all give quite strong clues about what's happening.
>        I'd also argue that using !test is less cryptic than '?predicate', since
>        the purpose of the '!' fits with its use in most programming languages. The
>        purpose of the '?' in '?predicate' is to reduce the chance of a naming
>        collision, which is not something which people could guess.
>        3. The Original examples cannot have config parameters naming conflicts
>        with either Connectors or Transformations.

>        Although I don't think it's a strong argument I would also add:

>        4. Because the original syntax supports nesting of 'If' SMTs it's actually
>        more powerful.

>        So, I know people really didn't like the original suggestion, but if you
>        accept these assertions then the current proposal is worse. Can anyone
>        point to some objective ways in which the current proposal is better than
>        the original one?

>        If people really think the current proposal is better then I'm happy to go
>        with that, call a vote and see if it has any traction with people outside
>        this discussion. Ultimately I just want to get support for this feature
>        into Kafka, however it's configured.

>        Kind regards,

>        Tom

>        On Mon, May 4, 2020 at 12:35 PM Andrew Schofield <an...@live.com>
>        wrote:

>        > Hi Tom,
>        > Referring back to Chris's points.
>        >
>        > 1) Exception handling - yes, agreed. This KIP can just say the exception
>        > handling for
>        > predicates is the same as transformations.
>        >
>        > 2) Worker plugins - agree too.
>        >
>        > 3) Filter SMT - If the Filter SMT is just becoming a transformation that
>        > drops any records that
>        > make it through the predicate, it's probably actually a Drop
>        > transformation. If it's
>        > not guarded with a predicate, it drops every record, but with a predicate
>        > it just
>        > drops records that match.
>        >
>        > 4) Syntax - I quite like the idea of a separate namespace for predicates,
>        > just like for transformations.
>        > I particularly like how you could have 2 transformations referencing the
>        > same predicate and config.
>        >
>        > There is the scope for collision with connector configuration, but my
>        > hunch is that
>        > "predicates" is less risky that simply "predicate" since it would only
>        > clash with a connector
>        > that happened to have multiple predicates.
>        >
>        > But I think there is a non-zero chance that an existing transformation
>        > might have used
>        > "predicate" as a config. I have actually written a filter transformation
>        > myself. It didn't use
>        > "predicate" but it almost did. That's why I suggested "?predicate" which I
>        > know is a bit
>        > cryptic, but it's also not going to clash.
>        >
>        > Here are two serving suggestions:
>        >
>        > A) "?predicate" to name the predicate alias to use
>        > transforms: t2
>        > transforms.t2.?predicate: has-my-prefix
>        > transforms.t2.type:ExtractField$Key
>        > transforms.t2.field: c1
>        >
>        > B) "?" to name the predicate alias to use
>        > transforms: t2
>        > transforms.t2.?: has-my-prefix
>        > transforms.t2.type:ExtractField$Key
>        > transforms.t2.field: c1
>        >
>        >
>        > One more question.
>        >
>        > Do people think the negation ought to be in the predicate config or in the
>        > referring transformation
>        > config? We might want to make:
>        >
>        >   predicate(a)=>transform(T1)
>        >   predicate(negate(a))=>transform(T2)
>        >
>        > easier than having to use two separate predicate aliases with identical
>        > configuration, except for
>        > the negation.
>        >
>        > Thanks,
>        > Andrew
>        >
>        > On 04/05/2020, 10:22, "Tom Bentley" <tb...@redhat.com> wrote:
>        >
>        >     Hi Chris,
>        >
>        >     Thanks for the feedback. On points 1 and 2 I agree.
>        >
>        >     3. This is a great point. I'm happy to add this to the KIP too, but
>        > note it
>        >     removes the need to add validate(Map<String, String>) to both
>        >     Transformation and Predicate. That doesn't mean we shouldn't still do
>        > it,
>        >     just that the change is not motivated by the features being added in
>        > the
>        >     KIP. So I've removed validate(Map<String, String>) for now.
>        >
>        >     4. Your proposal would further reduce the chance of Transformation
>        > config
>        >     conflict (because just the 'predicate' parameter might conflict), but
>        > it
>        >     also adds the chance of conflict with Connector parameters (all the
>        >     'predicates.*' parameters). Connector implementations are far more
>        > numerous
>        >     than Transformation implementations, so the risk is proportionately
>        >     greater. As you say we could try to reduce the chances of conflict
>        > (e.g.
>        >     '?predicate' parameter on transformations and '?predicates' prefix on
>        >     connectors), but I'm still left with uneasy feelings. I'd like to hear
>        >     about what others think before I change this one.
>        >
>        >     Cheers,
>        >
>        >     Tom
>        >
>        >
>        >     On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <
>        > chrise@confluent.io>
>        >     wrote:
>        >
>        >     > Hi all,
>        >     >
>        >     > This has certainly taken quite a turn! I wish we could get this done
>        >     > without adding another pluggable interface but if the benefit is
>        > that now
>        >     > any SMT--new or pre-existing--can be conditionally applied, it seems
>        > like
>        >     > it might be worth it.
>        >     >
>        >     > As far as the proposed design goes, some thoughts:
>        >     >
>        >     > 1. There isn't a clear reason why exceptions thrown from predicates
>        > should
>        >     > be treated differently from exceptions thrown by transformations. We
>        > have
>        >     > reasonable error-tolerance mechanisms in the Connect framework; why
>        >     > silently swallow errors, especially if the risk is publishing
>        > potentially
>        >     > corrupted data to Kafka or an external system because an SMT that
>        > should
>        >     > have been applied wasn't?
>        >     >
>        >     > 2. It might be worth noting that the interface is a worker plugin,
>        > and that
>        >     > presumably it would be loaded in the same way as other worker
>        > plugins such
>        >     > as converters, connectors, and REST extensions. This would include
>        > aliasing
>        >     > behavior that would allow users to specify predicates using their
>        > simple
>        >     > class names as long as no two predicate plugins with the same simple
>        > name
>        >     > were available on the worker.
>        >     >
>        >     > 3. Why would the Filter SMT explicitly take in a predicate in its
>        >     > configuration if predicates can be applied to all SMTs? Just reading
>        > the
>        >     > idea for a filter SMT, it seemed like the behavior would be that the
>        > SMT
>        >     > would take in no configuration parameters and just blindly drop
>        > everything
>        >     > that it sees, but typical use would involve pairing it with a
>        > predicate.
>        >     >
>        >     > 4. The question mark syntax seems a little cryptic, and combining the
>        >     > configuration properties for the predicate and the transformation in
>        > the
>        >     > same namespace ("transforms.<name>.*) seems a little noisy. What do
>        > you
>        >     > think about allowing predicates to be configured in their own
>        > namespace,
>        >     > and referenced by name in a single "predicate" (or maybe
>        > "?predicate" if we
>        >     > really want to avoid risking backwards compatibility concerns)
>        > property?
>        >     > This would also enable them to be more easily reused across several
>        > SMTs.
>        >     >
>        >     > For example, you might configure a worker with these properties
>        > (assuming
>        >     > plugin aliasing is supported for predicates in the same way that it
>        > is for
>        >     > transformations):
>        >     >
>        >     > transforms: t2
>        >     > transforms.t2.predicate: has-my-prefix
>        >     > transforms.t2.type:ExtractField$Key
>        >     > transforms.t2.field: c1
>        >     >
>        >     > predicates: has-my-prefix
>        >     > predicates.has-my-prefix.type: TopicNameMatch
>        >     > predicates.has-my-prefix.negate: true
>        >     > predicates.has-my-prefix.pattern: my-prefix-.*
>        >     >
>        >     > Excited to see how this is evolving and I think we're heading towards
>        >     > something really useful for the Connect framework.
>        >     >
>        >     > Cheers,
>        >     >
>        >     > Chris
>        >     >
>        >     >
>        >     > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <
>        > andrew_schofield@live.com
>        >     > >
>        >     > wrote:
>        >     >
>        >     > > Hi Tom,
>        >     > > Looks good to me.
>        >     > >
>        >     > > Thanks,
>        >     > > Andrew
>        >     > >
>        >     > > On 01/05/2020, 16:24, "Tom Bentley" <tb...@redhat.com> wrote:
>        >     > >
>        >     > >     Hi,
>        >     > >
>        >     > >     I've updated the KIP to reflect recent discussions. Please
>        > note the
>        >     > >     following:
>        >     > >
>        >     > >     1. I've described a HasHeaderKey predicate rather than
>        >     > HeaderKeyMatches
>        >     > >     because at the moment Headers doesn't expose the whole set of
>        >     > headers,
>        >     > > only
>        >     > >     those with a specific name. Exposing this would significantly
>        >     > increase
>        >     > > the
>        >     > >     scope of this KIP but relatively little extra benefit.
>        >     > >     2. I used org.apache.kafka.connect.transformer.predicates
>        > rather than
>        >     > >     org.apache.kafka.connect.predicates, since I thought it better
>        >     > > reflected
>        >     > >     the use of predicates within transforms. I'm flexible on this
>        > one if
>        >     > >     someone has a good case for the original name. For example the
>        >     > original
>        >     > >     package name would be more appropriate if we were expecting
>        >     > Connectors
>        >     > > to
>        >     > >     make use of Predicates somehow.
>        >     > >     3. I've dropped the special case of not needing to provide a
>        > FQN for
>        >     > >     predicate classes in that package, since this isn't something
>        >     > > supported by
>        >     > >     transformations themselves as far as I know. I like the idea,
>        > but it
>        >     > > seemed
>        >     > >     inconsistent to need it for transformations but not for
>        > predicates.
>        >     > >     4. Chris, I've used your suggestions for a validate() method
>        > for the
>        >     > > extra
>        >     > >     configurability needed. I've also added this to Predicate (even
>        >     > though
>        >     > > I
>        >     > >     don't need it) for consistency with Connector and
>        > Transformation.
>        >     > >     5. For negating the result of a predicate I decided it was
>        > clearer to
>        >     > > just
>        >     > >     have a "negate" config parameter, supported by all predicates,
>        > than
>        >     > to
>        >     > >     reach for more punctuation.
>        >     > >
>        >     > >     I'd be grateful for any more feedback people might have.
>        >     > >
>        >     > >     Kind regards,
>        >     > >
>        >     > >     Tom
>        >     > >
>        >     > >     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
>        >     > > andrew_schofield@live.com>
>        >     > >     wrote:
>        >     > >
>        >     > >     > Hi Tom,
>        >     > >     > I think we should go for a consistent naming convention for
>        > the
>        >     > >     > predicates, maybe so they
>        >     > >     > read nicely if you say "IF" first. Here's a suggestion for
>        >     > > predicates that
>        >     > >     > the KIP could introduce:
>        >     > >     >  - TopicNameMatches
>        >     > >     >  - HeaderKeyMatches
>        >     > >     >  - RecordIsTombstone
>        >     > >     >
>        >     > >     > On naming, I also suggest using
>        >     > >     > org.apache.kafka.connect.predicates.Predicate
>        >     > >     > as the name for the interface.
>        >     > >     >
>        >     > >     > I hadn't settled on a preference for negation. I suppose
>        > there are
>        >     > > three
>        >     > >     > results
>        >     > >     > from evaluating a predicate - true, false or an exception
>        > caught by
>        >     > > Kafka
>        >     > >     > Connect.
>        >     > >     > I suggest the exception case is always a predicate fail so
>        > the
>        >     > > guarded
>        >     > >     > transformation
>        >     > >     > is skipped. Initially I'd preferred having the predicate be
>        >     > > configured to
>        >     > >     > return a
>        >     > >     > negated result to give a "not match" behaviour. Thinking
>        > again
>        >     > now, I
>        >     > >     > prefer something like:
>        >     > >     >
>        >     > >     > transforms: t1
>        >     > >     > transforms.t1.?type: RecordIsTombstone
>        >     > >     > transforms.t1.type:
>        >     > > org.apache.kafka.connect.transforms.InsertField$Value
>        >     > >     > transforms.t1.static.field: deleted
>        >     > >     > transforms.t1.static.value: true
>        >     > >     >
>        >     > >     > The "?" means the transform guarded by the predicate runs in
>        > the
>        >     > > Predicate
>        >     > >     > returns true.
>        >     > >     >
>        >     > >     > transforms: t2
>        >     > >     > transforms.t2.?!type:
>        >     > > org.apache.kafka.connect.predicates.TopicNameMatch
>        >     > >     > transforms.t2.?regex: my-prefix-.*
>        >     > >     > transforms.t2.type:
>        >     > > org.apache.kafka.connect.transforms.ExtractField$Key
>        >     > >     > transforms.t2.field: c1
>        >     > >     >
>        >     > >     > The "?!" means the transform guarded by the predicate runs
>        > if the
>        >     > > Predicate
>        >     > >     > returns false.
>        >     > >     >
>        >     > >     > I think adding a Filter SMT that uses a Predicate is a nice
>        > idea.
>        >     > >     >
>        >     > >     > Cheers,
>        >     > >     > Andrew
>        >     > >     >
>        >     >
>        >     >
>        >
>        >



Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Andrew Schofield <an...@live.com>.
Hi Tom,
I think that either proposal is sufficiently readable and they both have elements of
cryptic syntax. I remember talking to an engineering director once who said he didn't hire
"curly braces people" any more. This is an interface for curly braces people whichever
way we go here.

I slightly prefer the predicates as a top-level concept rather than as a feature
of the If transform. I also don't like the namespacing behaviour of the If transform which
I guess works like variable declaration so each scope has its own namespace of transform
aliases. I expect the following is valid but ill-advised.

    transforms: t1
    transforms.t1.type: If
    transforms.t1.!test.type: TopicNameMatch
    transforms.t1.!test.pattern: my-prefix-.*
    transforms.t1.then: t1
    transforms.t1.then.t1.type: ExtractField$Key
    transforms.t1.then.t1.field: c1

I would love to see conditional application of transforms in SMT in Kafka 2.6.
Let's see what other people think and get a consensus view on how to spell it.

Cheers,
Andrew

On 05/05/2020, 15:20, "Tom Bentley" <tb...@redhat.com> wrote:

    Hi,

    Again, reusing the existing numbering...

    3) I had exactly the same thought. The reason I didn't already rename it
    was because :

    * Filter is not exactly wrong (by default it just filters out all the
    messages)
    * Filter is a useful hint about how it's intended to be used.

    But I don't have a strong opinion, so I'm happy to rename it if people
    think that Drop is better.

    4) Putting the negation on the use site–rather than the declaration
    site–seems like a good idea, since it makes it easier to get 'else'-like
    behaviour.

    But, playing devil's advocate, I wondered how all this compares with the
    original proposal, so I went to the trouble of writing out a few examples
    to compare the two

    Example of a basic 'if'
    Current proposal:
    transforms: t2
    transforms.t2.?predicate: !has-my-prefix
    transforms.t2.type: ExtractField$Key
    transforms.t2.field: c1
    ?predicates: has-my-prefix
    ?predicates.has-my-prefix.type: TopicNameMatch
    ?predicates.has-my-prefix.pattern: my-prefix-.*
    Original (or originalish, see notes below):
    transforms: t1
    transforms.t1.type: If
    transforms.t1.!test.type: TopicNameMatch
    transforms.t1.!test.pattern: my-prefix-.*
    transforms.t1.then: t2
    transforms.t1.then.t2.type: ExtractField$Key
    transforms.t1.then.t2.field: c1

    Example of 'if/else':
    Current:
    transforms: t2,t3
    transforms.t2.?predicate: !has-my-prefix
    transforms.t2.type: ExtractField$Key
    transforms.t2.field: c1
    transforms.t3.?predicate: has-my-prefix
    transforms.t3.type: ExtractField$Key
    transforms.t3.field: c2
    ?predicates: has-my-prefix
    ?predicates.has-my-prefix.type: TopicNameMatch
    ?predicates.has-my-prefix.pattern: my-prefix-.*
    Original:
    transforms: t1
    transforms.t1.type: If
    transforms.t1.!test.type: TopicNameMatch
    transforms.t1.!test.pattern: my-prefix-.*
    transforms.t1.then: t2
    transforms.t1.then.t2.type: ExtractField$Key
    transforms.t1.then.t2.field: c1
    transforms.t1.else: t3
    transforms.t1.else.t3.type: ExtractField$Key
    transforms.t1.else.t3.field: c2

    Example of guarding a >1 SMT
    Current:
    transforms: t2,t3
    transforms.t2.?predicate: !has-my-prefix
    transforms.t2.type: ExtractField$Key
    transforms.t2.field: c1
    transforms.t3.?predicate: !has-my-prefix
    transforms.t3.type: ExtractField$Value
    transforms.t3.field: c2
    ?predicates: has-my-prefix
    ?predicates.has-my-prefix.type: TopicNameMatch
    ?predicates.has-my-prefix.pattern: my-prefix-.*
    Original:
    transforms: t1
    transforms.t1.type: If
    transforms.t1.!test.type: TopicNameMatch
    transforms.t1.!test.pattern: my-prefix-.*
    transforms.t1.then: t2,t3
    transforms.t1.then.t2.type: ExtractField$Key
    transforms.t1.then.t2.field: c1
    transforms.t1.then.t3.type: ExtractField$Value
    transforms.t1.then.t3.field: c2

    Notes:
    * I'm assuming we would negate predicates in the Current proposal by
    prefixing the predicate name with !
    * I'm using "test" rather than "predicate" in the Original because it looks
    a bit nicer having the same number of letters as "then" and "else".
    * I'm negating the test in the Original by using "!test" rather than "test".

    Those notes aside, I'd make the following assertions:

    1. The Original examples require the same number of keys (or one less line
    in the last example).
    2. The Original examples are more guessable if you'd never seen this KIP
    before. The SMT name (If) and its config parameter names ('test' and 'then'
    and maybe also 'else') all give quite strong clues about what's happening.
    I'd also argue that using !test is less cryptic than '?predicate', since
    the purpose of the '!' fits with its use in most programming languages. The
    purpose of the '?' in '?predicate' is to reduce the chance of a naming
    collision, which is not something which people could guess.
    3. The Original examples cannot have config parameters naming conflicts
    with either Connectors or Transformations.

    Although I don't think it's a strong argument I would also add:

    4. Because the original syntax supports nesting of 'If' SMTs it's actually
    more powerful.

    So, I know people really didn't like the original suggestion, but if you
    accept these assertions then the current proposal is worse. Can anyone
    point to some objective ways in which the current proposal is better than
    the original one?

    If people really think the current proposal is better then I'm happy to go
    with that, call a vote and see if it has any traction with people outside
    this discussion. Ultimately I just want to get support for this feature
    into Kafka, however it's configured.

    Kind regards,

    Tom

    On Mon, May 4, 2020 at 12:35 PM Andrew Schofield <an...@live.com>
    wrote:

    > Hi Tom,
    > Referring back to Chris's points.
    >
    > 1) Exception handling - yes, agreed. This KIP can just say the exception
    > handling for
    > predicates is the same as transformations.
    >
    > 2) Worker plugins - agree too.
    >
    > 3) Filter SMT - If the Filter SMT is just becoming a transformation that
    > drops any records that
    > make it through the predicate, it's probably actually a Drop
    > transformation. If it's
    > not guarded with a predicate, it drops every record, but with a predicate
    > it just
    > drops records that match.
    >
    > 4) Syntax - I quite like the idea of a separate namespace for predicates,
    > just like for transformations.
    > I particularly like how you could have 2 transformations referencing the
    > same predicate and config.
    >
    > There is the scope for collision with connector configuration, but my
    > hunch is that
    > "predicates" is less risky that simply "predicate" since it would only
    > clash with a connector
    > that happened to have multiple predicates.
    >
    > But I think there is a non-zero chance that an existing transformation
    > might have used
    > "predicate" as a config. I have actually written a filter transformation
    > myself. It didn't use
    > "predicate" but it almost did. That's why I suggested "?predicate" which I
    > know is a bit
    > cryptic, but it's also not going to clash.
    >
    > Here are two serving suggestions:
    >
    > A) "?predicate" to name the predicate alias to use
    > transforms: t2
    > transforms.t2.?predicate: has-my-prefix
    > transforms.t2.type:ExtractField$Key
    > transforms.t2.field: c1
    >
    > B) "?" to name the predicate alias to use
    > transforms: t2
    > transforms.t2.?: has-my-prefix
    > transforms.t2.type:ExtractField$Key
    > transforms.t2.field: c1
    >
    >
    > One more question.
    >
    > Do people think the negation ought to be in the predicate config or in the
    > referring transformation
    > config? We might want to make:
    >
    >   predicate(a)=>transform(T1)
    >   predicate(negate(a))=>transform(T2)
    >
    > easier than having to use two separate predicate aliases with identical
    > configuration, except for
    > the negation.
    >
    > Thanks,
    > Andrew
    >
    > On 04/05/2020, 10:22, "Tom Bentley" <tb...@redhat.com> wrote:
    >
    >     Hi Chris,
    >
    >     Thanks for the feedback. On points 1 and 2 I agree.
    >
    >     3. This is a great point. I'm happy to add this to the KIP too, but
    > note it
    >     removes the need to add validate(Map<String, String>) to both
    >     Transformation and Predicate. That doesn't mean we shouldn't still do
    > it,
    >     just that the change is not motivated by the features being added in
    > the
    >     KIP. So I've removed validate(Map<String, String>) for now.
    >
    >     4. Your proposal would further reduce the chance of Transformation
    > config
    >     conflict (because just the 'predicate' parameter might conflict), but
    > it
    >     also adds the chance of conflict with Connector parameters (all the
    >     'predicates.*' parameters). Connector implementations are far more
    > numerous
    >     than Transformation implementations, so the risk is proportionately
    >     greater. As you say we could try to reduce the chances of conflict
    > (e.g.
    >     '?predicate' parameter on transformations and '?predicates' prefix on
    >     connectors), but I'm still left with uneasy feelings. I'd like to hear
    >     about what others think before I change this one.
    >
    >     Cheers,
    >
    >     Tom
    >
    >
    >     On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <
    > chrise@confluent.io>
    >     wrote:
    >
    >     > Hi all,
    >     >
    >     > This has certainly taken quite a turn! I wish we could get this done
    >     > without adding another pluggable interface but if the benefit is
    > that now
    >     > any SMT--new or pre-existing--can be conditionally applied, it seems
    > like
    >     > it might be worth it.
    >     >
    >     > As far as the proposed design goes, some thoughts:
    >     >
    >     > 1. There isn't a clear reason why exceptions thrown from predicates
    > should
    >     > be treated differently from exceptions thrown by transformations. We
    > have
    >     > reasonable error-tolerance mechanisms in the Connect framework; why
    >     > silently swallow errors, especially if the risk is publishing
    > potentially
    >     > corrupted data to Kafka or an external system because an SMT that
    > should
    >     > have been applied wasn't?
    >     >
    >     > 2. It might be worth noting that the interface is a worker plugin,
    > and that
    >     > presumably it would be loaded in the same way as other worker
    > plugins such
    >     > as converters, connectors, and REST extensions. This would include
    > aliasing
    >     > behavior that would allow users to specify predicates using their
    > simple
    >     > class names as long as no two predicate plugins with the same simple
    > name
    >     > were available on the worker.
    >     >
    >     > 3. Why would the Filter SMT explicitly take in a predicate in its
    >     > configuration if predicates can be applied to all SMTs? Just reading
    > the
    >     > idea for a filter SMT, it seemed like the behavior would be that the
    > SMT
    >     > would take in no configuration parameters and just blindly drop
    > everything
    >     > that it sees, but typical use would involve pairing it with a
    > predicate.
    >     >
    >     > 4. The question mark syntax seems a little cryptic, and combining the
    >     > configuration properties for the predicate and the transformation in
    > the
    >     > same namespace ("transforms.<name>.*) seems a little noisy. What do
    > you
    >     > think about allowing predicates to be configured in their own
    > namespace,
    >     > and referenced by name in a single "predicate" (or maybe
    > "?predicate" if we
    >     > really want to avoid risking backwards compatibility concerns)
    > property?
    >     > This would also enable them to be more easily reused across several
    > SMTs.
    >     >
    >     > For example, you might configure a worker with these properties
    > (assuming
    >     > plugin aliasing is supported for predicates in the same way that it
    > is for
    >     > transformations):
    >     >
    >     > transforms: t2
    >     > transforms.t2.predicate: has-my-prefix
    >     > transforms.t2.type:ExtractField$Key
    >     > transforms.t2.field: c1
    >     >
    >     > predicates: has-my-prefix
    >     > predicates.has-my-prefix.type: TopicNameMatch
    >     > predicates.has-my-prefix.negate: true
    >     > predicates.has-my-prefix.pattern: my-prefix-.*
    >     >
    >     > Excited to see how this is evolving and I think we're heading towards
    >     > something really useful for the Connect framework.
    >     >
    >     > Cheers,
    >     >
    >     > Chris
    >     >
    >     >
    >     > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <
    > andrew_schofield@live.com
    >     > >
    >     > wrote:
    >     >
    >     > > Hi Tom,
    >     > > Looks good to me.
    >     > >
    >     > > Thanks,
    >     > > Andrew
    >     > >
    >     > > On 01/05/2020, 16:24, "Tom Bentley" <tb...@redhat.com> wrote:
    >     > >
    >     > >     Hi,
    >     > >
    >     > >     I've updated the KIP to reflect recent discussions. Please
    > note the
    >     > >     following:
    >     > >
    >     > >     1. I've described a HasHeaderKey predicate rather than
    >     > HeaderKeyMatches
    >     > >     because at the moment Headers doesn't expose the whole set of
    >     > headers,
    >     > > only
    >     > >     those with a specific name. Exposing this would significantly
    >     > increase
    >     > > the
    >     > >     scope of this KIP but relatively little extra benefit.
    >     > >     2. I used org.apache.kafka.connect.transformer.predicates
    > rather than
    >     > >     org.apache.kafka.connect.predicates, since I thought it better
    >     > > reflected
    >     > >     the use of predicates within transforms. I'm flexible on this
    > one if
    >     > >     someone has a good case for the original name. For example the
    >     > original
    >     > >     package name would be more appropriate if we were expecting
    >     > Connectors
    >     > > to
    >     > >     make use of Predicates somehow.
    >     > >     3. I've dropped the special case of not needing to provide a
    > FQN for
    >     > >     predicate classes in that package, since this isn't something
    >     > > supported by
    >     > >     transformations themselves as far as I know. I like the idea,
    > but it
    >     > > seemed
    >     > >     inconsistent to need it for transformations but not for
    > predicates.
    >     > >     4. Chris, I've used your suggestions for a validate() method
    > for the
    >     > > extra
    >     > >     configurability needed. I've also added this to Predicate (even
    >     > though
    >     > > I
    >     > >     don't need it) for consistency with Connector and
    > Transformation.
    >     > >     5. For negating the result of a predicate I decided it was
    > clearer to
    >     > > just
    >     > >     have a "negate" config parameter, supported by all predicates,
    > than
    >     > to
    >     > >     reach for more punctuation.
    >     > >
    >     > >     I'd be grateful for any more feedback people might have.
    >     > >
    >     > >     Kind regards,
    >     > >
    >     > >     Tom
    >     > >
    >     > >     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
    >     > > andrew_schofield@live.com>
    >     > >     wrote:
    >     > >
    >     > >     > Hi Tom,
    >     > >     > I think we should go for a consistent naming convention for
    > the
    >     > >     > predicates, maybe so they
    >     > >     > read nicely if you say "IF" first. Here's a suggestion for
    >     > > predicates that
    >     > >     > the KIP could introduce:
    >     > >     >  - TopicNameMatches
    >     > >     >  - HeaderKeyMatches
    >     > >     >  - RecordIsTombstone
    >     > >     >
    >     > >     > On naming, I also suggest using
    >     > >     > org.apache.kafka.connect.predicates.Predicate
    >     > >     > as the name for the interface.
    >     > >     >
    >     > >     > I hadn't settled on a preference for negation. I suppose
    > there are
    >     > > three
    >     > >     > results
    >     > >     > from evaluating a predicate - true, false or an exception
    > caught by
    >     > > Kafka
    >     > >     > Connect.
    >     > >     > I suggest the exception case is always a predicate fail so
    > the
    >     > > guarded
    >     > >     > transformation
    >     > >     > is skipped. Initially I'd preferred having the predicate be
    >     > > configured to
    >     > >     > return a
    >     > >     > negated result to give a "not match" behaviour. Thinking
    > again
    >     > now, I
    >     > >     > prefer something like:
    >     > >     >
    >     > >     > transforms: t1
    >     > >     > transforms.t1.?type: RecordIsTombstone
    >     > >     > transforms.t1.type:
    >     > > org.apache.kafka.connect.transforms.InsertField$Value
    >     > >     > transforms.t1.static.field: deleted
    >     > >     > transforms.t1.static.value: true
    >     > >     >
    >     > >     > The "?" means the transform guarded by the predicate runs in
    > the
    >     > > Predicate
    >     > >     > returns true.
    >     > >     >
    >     > >     > transforms: t2
    >     > >     > transforms.t2.?!type:
    >     > > org.apache.kafka.connect.predicates.TopicNameMatch
    >     > >     > transforms.t2.?regex: my-prefix-.*
    >     > >     > transforms.t2.type:
    >     > > org.apache.kafka.connect.transforms.ExtractField$Key
    >     > >     > transforms.t2.field: c1
    >     > >     >
    >     > >     > The "?!" means the transform guarded by the predicate runs
    > if the
    >     > > Predicate
    >     > >     > returns false.
    >     > >     >
    >     > >     > I think adding a Filter SMT that uses a Predicate is a nice
    > idea.
    >     > >     >
    >     > >     > Cheers,
    >     > >     > Andrew
    >     > >     >
    >     >
    >     >
    >
    >


Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi,

Again, reusing the existing numbering...

3) I had exactly the same thought. The reason I didn't already rename it
was because :

* Filter is not exactly wrong (by default it just filters out all the
messages)
* Filter is a useful hint about how it's intended to be used.

But I don't have a strong opinion, so I'm happy to rename it if people
think that Drop is better.

4) Putting the negation on the use site–rather than the declaration
site–seems like a good idea, since it makes it easier to get 'else'-like
behaviour.

But, playing devil's advocate, I wondered how all this compares with the
original proposal, so I went to the trouble of writing out a few examples
to compare the two

Example of a basic 'if'
Current proposal:
transforms: t2
transforms.t2.?predicate: !has-my-prefix
transforms.t2.type: ExtractField$Key
transforms.t2.field: c1
?predicates: has-my-prefix
?predicates.has-my-prefix.type: TopicNameMatch
?predicates.has-my-prefix.pattern: my-prefix-.*
Original (or originalish, see notes below):
transforms: t1
transforms.t1.type: If
transforms.t1.!test.type: TopicNameMatch
transforms.t1.!test.pattern: my-prefix-.*
transforms.t1.then: t2
transforms.t1.then.t2.type: ExtractField$Key
transforms.t1.then.t2.field: c1

Example of 'if/else':
Current:
transforms: t2,t3
transforms.t2.?predicate: !has-my-prefix
transforms.t2.type: ExtractField$Key
transforms.t2.field: c1
transforms.t3.?predicate: has-my-prefix
transforms.t3.type: ExtractField$Key
transforms.t3.field: c2
?predicates: has-my-prefix
?predicates.has-my-prefix.type: TopicNameMatch
?predicates.has-my-prefix.pattern: my-prefix-.*
Original:
transforms: t1
transforms.t1.type: If
transforms.t1.!test.type: TopicNameMatch
transforms.t1.!test.pattern: my-prefix-.*
transforms.t1.then: t2
transforms.t1.then.t2.type: ExtractField$Key
transforms.t1.then.t2.field: c1
transforms.t1.else: t3
transforms.t1.else.t3.type: ExtractField$Key
transforms.t1.else.t3.field: c2

Example of guarding a >1 SMT
Current:
transforms: t2,t3
transforms.t2.?predicate: !has-my-prefix
transforms.t2.type: ExtractField$Key
transforms.t2.field: c1
transforms.t3.?predicate: !has-my-prefix
transforms.t3.type: ExtractField$Value
transforms.t3.field: c2
?predicates: has-my-prefix
?predicates.has-my-prefix.type: TopicNameMatch
?predicates.has-my-prefix.pattern: my-prefix-.*
Original:
transforms: t1
transforms.t1.type: If
transforms.t1.!test.type: TopicNameMatch
transforms.t1.!test.pattern: my-prefix-.*
transforms.t1.then: t2,t3
transforms.t1.then.t2.type: ExtractField$Key
transforms.t1.then.t2.field: c1
transforms.t1.then.t3.type: ExtractField$Value
transforms.t1.then.t3.field: c2

Notes:
* I'm assuming we would negate predicates in the Current proposal by
prefixing the predicate name with !
* I'm using "test" rather than "predicate" in the Original because it looks
a bit nicer having the same number of letters as "then" and "else".
* I'm negating the test in the Original by using "!test" rather than "test".

Those notes aside, I'd make the following assertions:

1. The Original examples require the same number of keys (or one less line
in the last example).
2. The Original examples are more guessable if you'd never seen this KIP
before. The SMT name (If) and its config parameter names ('test' and 'then'
and maybe also 'else') all give quite strong clues about what's happening.
I'd also argue that using !test is less cryptic than '?predicate', since
the purpose of the '!' fits with its use in most programming languages. The
purpose of the '?' in '?predicate' is to reduce the chance of a naming
collision, which is not something which people could guess.
3. The Original examples cannot have config parameters naming conflicts
with either Connectors or Transformations.

Although I don't think it's a strong argument I would also add:

4. Because the original syntax supports nesting of 'If' SMTs it's actually
more powerful.

So, I know people really didn't like the original suggestion, but if you
accept these assertions then the current proposal is worse. Can anyone
point to some objective ways in which the current proposal is better than
the original one?

If people really think the current proposal is better then I'm happy to go
with that, call a vote and see if it has any traction with people outside
this discussion. Ultimately I just want to get support for this feature
into Kafka, however it's configured.

Kind regards,

Tom

On Mon, May 4, 2020 at 12:35 PM Andrew Schofield <an...@live.com>
wrote:

> Hi Tom,
> Referring back to Chris's points.
>
> 1) Exception handling - yes, agreed. This KIP can just say the exception
> handling for
> predicates is the same as transformations.
>
> 2) Worker plugins - agree too.
>
> 3) Filter SMT - If the Filter SMT is just becoming a transformation that
> drops any records that
> make it through the predicate, it's probably actually a Drop
> transformation. If it's
> not guarded with a predicate, it drops every record, but with a predicate
> it just
> drops records that match.
>
> 4) Syntax - I quite like the idea of a separate namespace for predicates,
> just like for transformations.
> I particularly like how you could have 2 transformations referencing the
> same predicate and config.
>
> There is the scope for collision with connector configuration, but my
> hunch is that
> "predicates" is less risky that simply "predicate" since it would only
> clash with a connector
> that happened to have multiple predicates.
>
> But I think there is a non-zero chance that an existing transformation
> might have used
> "predicate" as a config. I have actually written a filter transformation
> myself. It didn't use
> "predicate" but it almost did. That's why I suggested "?predicate" which I
> know is a bit
> cryptic, but it's also not going to clash.
>
> Here are two serving suggestions:
>
> A) "?predicate" to name the predicate alias to use
> transforms: t2
> transforms.t2.?predicate: has-my-prefix
> transforms.t2.type:ExtractField$Key
> transforms.t2.field: c1
>
> B) "?" to name the predicate alias to use
> transforms: t2
> transforms.t2.?: has-my-prefix
> transforms.t2.type:ExtractField$Key
> transforms.t2.field: c1
>
>
> One more question.
>
> Do people think the negation ought to be in the predicate config or in the
> referring transformation
> config? We might want to make:
>
>   predicate(a)=>transform(T1)
>   predicate(negate(a))=>transform(T2)
>
> easier than having to use two separate predicate aliases with identical
> configuration, except for
> the negation.
>
> Thanks,
> Andrew
>
> On 04/05/2020, 10:22, "Tom Bentley" <tb...@redhat.com> wrote:
>
>     Hi Chris,
>
>     Thanks for the feedback. On points 1 and 2 I agree.
>
>     3. This is a great point. I'm happy to add this to the KIP too, but
> note it
>     removes the need to add validate(Map<String, String>) to both
>     Transformation and Predicate. That doesn't mean we shouldn't still do
> it,
>     just that the change is not motivated by the features being added in
> the
>     KIP. So I've removed validate(Map<String, String>) for now.
>
>     4. Your proposal would further reduce the chance of Transformation
> config
>     conflict (because just the 'predicate' parameter might conflict), but
> it
>     also adds the chance of conflict with Connector parameters (all the
>     'predicates.*' parameters). Connector implementations are far more
> numerous
>     than Transformation implementations, so the risk is proportionately
>     greater. As you say we could try to reduce the chances of conflict
> (e.g.
>     '?predicate' parameter on transformations and '?predicates' prefix on
>     connectors), but I'm still left with uneasy feelings. I'd like to hear
>     about what others think before I change this one.
>
>     Cheers,
>
>     Tom
>
>
>     On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <
> chrise@confluent.io>
>     wrote:
>
>     > Hi all,
>     >
>     > This has certainly taken quite a turn! I wish we could get this done
>     > without adding another pluggable interface but if the benefit is
> that now
>     > any SMT--new or pre-existing--can be conditionally applied, it seems
> like
>     > it might be worth it.
>     >
>     > As far as the proposed design goes, some thoughts:
>     >
>     > 1. There isn't a clear reason why exceptions thrown from predicates
> should
>     > be treated differently from exceptions thrown by transformations. We
> have
>     > reasonable error-tolerance mechanisms in the Connect framework; why
>     > silently swallow errors, especially if the risk is publishing
> potentially
>     > corrupted data to Kafka or an external system because an SMT that
> should
>     > have been applied wasn't?
>     >
>     > 2. It might be worth noting that the interface is a worker plugin,
> and that
>     > presumably it would be loaded in the same way as other worker
> plugins such
>     > as converters, connectors, and REST extensions. This would include
> aliasing
>     > behavior that would allow users to specify predicates using their
> simple
>     > class names as long as no two predicate plugins with the same simple
> name
>     > were available on the worker.
>     >
>     > 3. Why would the Filter SMT explicitly take in a predicate in its
>     > configuration if predicates can be applied to all SMTs? Just reading
> the
>     > idea for a filter SMT, it seemed like the behavior would be that the
> SMT
>     > would take in no configuration parameters and just blindly drop
> everything
>     > that it sees, but typical use would involve pairing it with a
> predicate.
>     >
>     > 4. The question mark syntax seems a little cryptic, and combining the
>     > configuration properties for the predicate and the transformation in
> the
>     > same namespace ("transforms.<name>.*) seems a little noisy. What do
> you
>     > think about allowing predicates to be configured in their own
> namespace,
>     > and referenced by name in a single "predicate" (or maybe
> "?predicate" if we
>     > really want to avoid risking backwards compatibility concerns)
> property?
>     > This would also enable them to be more easily reused across several
> SMTs.
>     >
>     > For example, you might configure a worker with these properties
> (assuming
>     > plugin aliasing is supported for predicates in the same way that it
> is for
>     > transformations):
>     >
>     > transforms: t2
>     > transforms.t2.predicate: has-my-prefix
>     > transforms.t2.type:ExtractField$Key
>     > transforms.t2.field: c1
>     >
>     > predicates: has-my-prefix
>     > predicates.has-my-prefix.type: TopicNameMatch
>     > predicates.has-my-prefix.negate: true
>     > predicates.has-my-prefix.pattern: my-prefix-.*
>     >
>     > Excited to see how this is evolving and I think we're heading towards
>     > something really useful for the Connect framework.
>     >
>     > Cheers,
>     >
>     > Chris
>     >
>     >
>     > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <
> andrew_schofield@live.com
>     > >
>     > wrote:
>     >
>     > > Hi Tom,
>     > > Looks good to me.
>     > >
>     > > Thanks,
>     > > Andrew
>     > >
>     > > On 01/05/2020, 16:24, "Tom Bentley" <tb...@redhat.com> wrote:
>     > >
>     > >     Hi,
>     > >
>     > >     I've updated the KIP to reflect recent discussions. Please
> note the
>     > >     following:
>     > >
>     > >     1. I've described a HasHeaderKey predicate rather than
>     > HeaderKeyMatches
>     > >     because at the moment Headers doesn't expose the whole set of
>     > headers,
>     > > only
>     > >     those with a specific name. Exposing this would significantly
>     > increase
>     > > the
>     > >     scope of this KIP but relatively little extra benefit.
>     > >     2. I used org.apache.kafka.connect.transformer.predicates
> rather than
>     > >     org.apache.kafka.connect.predicates, since I thought it better
>     > > reflected
>     > >     the use of predicates within transforms. I'm flexible on this
> one if
>     > >     someone has a good case for the original name. For example the
>     > original
>     > >     package name would be more appropriate if we were expecting
>     > Connectors
>     > > to
>     > >     make use of Predicates somehow.
>     > >     3. I've dropped the special case of not needing to provide a
> FQN for
>     > >     predicate classes in that package, since this isn't something
>     > > supported by
>     > >     transformations themselves as far as I know. I like the idea,
> but it
>     > > seemed
>     > >     inconsistent to need it for transformations but not for
> predicates.
>     > >     4. Chris, I've used your suggestions for a validate() method
> for the
>     > > extra
>     > >     configurability needed. I've also added this to Predicate (even
>     > though
>     > > I
>     > >     don't need it) for consistency with Connector and
> Transformation.
>     > >     5. For negating the result of a predicate I decided it was
> clearer to
>     > > just
>     > >     have a "negate" config parameter, supported by all predicates,
> than
>     > to
>     > >     reach for more punctuation.
>     > >
>     > >     I'd be grateful for any more feedback people might have.
>     > >
>     > >     Kind regards,
>     > >
>     > >     Tom
>     > >
>     > >     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
>     > > andrew_schofield@live.com>
>     > >     wrote:
>     > >
>     > >     > Hi Tom,
>     > >     > I think we should go for a consistent naming convention for
> the
>     > >     > predicates, maybe so they
>     > >     > read nicely if you say "IF" first. Here's a suggestion for
>     > > predicates that
>     > >     > the KIP could introduce:
>     > >     >  - TopicNameMatches
>     > >     >  - HeaderKeyMatches
>     > >     >  - RecordIsTombstone
>     > >     >
>     > >     > On naming, I also suggest using
>     > >     > org.apache.kafka.connect.predicates.Predicate
>     > >     > as the name for the interface.
>     > >     >
>     > >     > I hadn't settled on a preference for negation. I suppose
> there are
>     > > three
>     > >     > results
>     > >     > from evaluating a predicate - true, false or an exception
> caught by
>     > > Kafka
>     > >     > Connect.
>     > >     > I suggest the exception case is always a predicate fail so
> the
>     > > guarded
>     > >     > transformation
>     > >     > is skipped. Initially I'd preferred having the predicate be
>     > > configured to
>     > >     > return a
>     > >     > negated result to give a "not match" behaviour. Thinking
> again
>     > now, I
>     > >     > prefer something like:
>     > >     >
>     > >     > transforms: t1
>     > >     > transforms.t1.?type: RecordIsTombstone
>     > >     > transforms.t1.type:
>     > > org.apache.kafka.connect.transforms.InsertField$Value
>     > >     > transforms.t1.static.field: deleted
>     > >     > transforms.t1.static.value: true
>     > >     >
>     > >     > The "?" means the transform guarded by the predicate runs in
> the
>     > > Predicate
>     > >     > returns true.
>     > >     >
>     > >     > transforms: t2
>     > >     > transforms.t2.?!type:
>     > > org.apache.kafka.connect.predicates.TopicNameMatch
>     > >     > transforms.t2.?regex: my-prefix-.*
>     > >     > transforms.t2.type:
>     > > org.apache.kafka.connect.transforms.ExtractField$Key
>     > >     > transforms.t2.field: c1
>     > >     >
>     > >     > The "?!" means the transform guarded by the predicate runs
> if the
>     > > Predicate
>     > >     > returns false.
>     > >     >
>     > >     > I think adding a Filter SMT that uses a Predicate is a nice
> idea.
>     > >     >
>     > >     > Cheers,
>     > >     > Andrew
>     > >     >
>     >
>     >
>
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Andrew Schofield <an...@live.com>.
Hi Tom,
Referring back to Chris's points.

1) Exception handling - yes, agreed. This KIP can just say the exception handling for
predicates is the same as transformations.

2) Worker plugins - agree too.

3) Filter SMT - If the Filter SMT is just becoming a transformation that drops any records that
make it through the predicate, it's probably actually a Drop transformation. If it's
not guarded with a predicate, it drops every record, but with a predicate it just
drops records that match.

4) Syntax - I quite like the idea of a separate namespace for predicates, just like for transformations.
I particularly like how you could have 2 transformations referencing the same predicate and config.

There is the scope for collision with connector configuration, but my hunch is that
"predicates" is less risky that simply "predicate" since it would only clash with a connector
that happened to have multiple predicates.

But I think there is a non-zero chance that an existing transformation might have used
"predicate" as a config. I have actually written a filter transformation myself. It didn't use
"predicate" but it almost did. That's why I suggested "?predicate" which I know is a bit
cryptic, but it's also not going to clash.

Here are two serving suggestions:

A) "?predicate" to name the predicate alias to use
transforms: t2
transforms.t2.?predicate: has-my-prefix
transforms.t2.type:ExtractField$Key
transforms.t2.field: c1

B) "?" to name the predicate alias to use
transforms: t2
transforms.t2.?: has-my-prefix
transforms.t2.type:ExtractField$Key
transforms.t2.field: c1


One more question.

Do people think the negation ought to be in the predicate config or in the referring transformation
config? We might want to make:

  predicate(a)=>transform(T1)
  predicate(negate(a))=>transform(T2)

easier than having to use two separate predicate aliases with identical configuration, except for
the negation.

Thanks,
Andrew

On 04/05/2020, 10:22, "Tom Bentley" <tb...@redhat.com> wrote:

    Hi Chris,

    Thanks for the feedback. On points 1 and 2 I agree.

    3. This is a great point. I'm happy to add this to the KIP too, but note it
    removes the need to add validate(Map<String, String>) to both
    Transformation and Predicate. That doesn't mean we shouldn't still do it,
    just that the change is not motivated by the features being added in the
    KIP. So I've removed validate(Map<String, String>) for now.

    4. Your proposal would further reduce the chance of Transformation config
    conflict (because just the 'predicate' parameter might conflict), but it
    also adds the chance of conflict with Connector parameters (all the
    'predicates.*' parameters). Connector implementations are far more numerous
    than Transformation implementations, so the risk is proportionately
    greater. As you say we could try to reduce the chances of conflict (e.g.
    '?predicate' parameter on transformations and '?predicates' prefix on
    connectors), but I'm still left with uneasy feelings. I'd like to hear
    about what others think before I change this one.

    Cheers,

    Tom


    On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <ch...@confluent.io>
    wrote:

    > Hi all,
    >
    > This has certainly taken quite a turn! I wish we could get this done
    > without adding another pluggable interface but if the benefit is that now
    > any SMT--new or pre-existing--can be conditionally applied, it seems like
    > it might be worth it.
    >
    > As far as the proposed design goes, some thoughts:
    >
    > 1. There isn't a clear reason why exceptions thrown from predicates should
    > be treated differently from exceptions thrown by transformations. We have
    > reasonable error-tolerance mechanisms in the Connect framework; why
    > silently swallow errors, especially if the risk is publishing potentially
    > corrupted data to Kafka or an external system because an SMT that should
    > have been applied wasn't?
    >
    > 2. It might be worth noting that the interface is a worker plugin, and that
    > presumably it would be loaded in the same way as other worker plugins such
    > as converters, connectors, and REST extensions. This would include aliasing
    > behavior that would allow users to specify predicates using their simple
    > class names as long as no two predicate plugins with the same simple name
    > were available on the worker.
    >
    > 3. Why would the Filter SMT explicitly take in a predicate in its
    > configuration if predicates can be applied to all SMTs? Just reading the
    > idea for a filter SMT, it seemed like the behavior would be that the SMT
    > would take in no configuration parameters and just blindly drop everything
    > that it sees, but typical use would involve pairing it with a predicate.
    >
    > 4. The question mark syntax seems a little cryptic, and combining the
    > configuration properties for the predicate and the transformation in the
    > same namespace ("transforms.<name>.*) seems a little noisy. What do you
    > think about allowing predicates to be configured in their own namespace,
    > and referenced by name in a single "predicate" (or maybe "?predicate" if we
    > really want to avoid risking backwards compatibility concerns) property?
    > This would also enable them to be more easily reused across several SMTs.
    >
    > For example, you might configure a worker with these properties (assuming
    > plugin aliasing is supported for predicates in the same way that it is for
    > transformations):
    >
    > transforms: t2
    > transforms.t2.predicate: has-my-prefix
    > transforms.t2.type:ExtractField$Key
    > transforms.t2.field: c1
    >
    > predicates: has-my-prefix
    > predicates.has-my-prefix.type: TopicNameMatch
    > predicates.has-my-prefix.negate: true
    > predicates.has-my-prefix.pattern: my-prefix-.*
    >
    > Excited to see how this is evolving and I think we're heading towards
    > something really useful for the Connect framework.
    >
    > Cheers,
    >
    > Chris
    >
    >
    > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <andrew_schofield@live.com
    > >
    > wrote:
    >
    > > Hi Tom,
    > > Looks good to me.
    > >
    > > Thanks,
    > > Andrew
    > >
    > > On 01/05/2020, 16:24, "Tom Bentley" <tb...@redhat.com> wrote:
    > >
    > >     Hi,
    > >
    > >     I've updated the KIP to reflect recent discussions. Please note the
    > >     following:
    > >
    > >     1. I've described a HasHeaderKey predicate rather than
    > HeaderKeyMatches
    > >     because at the moment Headers doesn't expose the whole set of
    > headers,
    > > only
    > >     those with a specific name. Exposing this would significantly
    > increase
    > > the
    > >     scope of this KIP but relatively little extra benefit.
    > >     2. I used org.apache.kafka.connect.transformer.predicates rather than
    > >     org.apache.kafka.connect.predicates, since I thought it better
    > > reflected
    > >     the use of predicates within transforms. I'm flexible on this one if
    > >     someone has a good case for the original name. For example the
    > original
    > >     package name would be more appropriate if we were expecting
    > Connectors
    > > to
    > >     make use of Predicates somehow.
    > >     3. I've dropped the special case of not needing to provide a FQN for
    > >     predicate classes in that package, since this isn't something
    > > supported by
    > >     transformations themselves as far as I know. I like the idea, but it
    > > seemed
    > >     inconsistent to need it for transformations but not for predicates.
    > >     4. Chris, I've used your suggestions for a validate() method for the
    > > extra
    > >     configurability needed. I've also added this to Predicate (even
    > though
    > > I
    > >     don't need it) for consistency with Connector and Transformation.
    > >     5. For negating the result of a predicate I decided it was clearer to
    > > just
    > >     have a "negate" config parameter, supported by all predicates, than
    > to
    > >     reach for more punctuation.
    > >
    > >     I'd be grateful for any more feedback people might have.
    > >
    > >     Kind regards,
    > >
    > >     Tom
    > >
    > >     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
    > > andrew_schofield@live.com>
    > >     wrote:
    > >
    > >     > Hi Tom,
    > >     > I think we should go for a consistent naming convention for the
    > >     > predicates, maybe so they
    > >     > read nicely if you say "IF" first. Here's a suggestion for
    > > predicates that
    > >     > the KIP could introduce:
    > >     >  - TopicNameMatches
    > >     >  - HeaderKeyMatches
    > >     >  - RecordIsTombstone
    > >     >
    > >     > On naming, I also suggest using
    > >     > org.apache.kafka.connect.predicates.Predicate
    > >     > as the name for the interface.
    > >     >
    > >     > I hadn't settled on a preference for negation. I suppose there are
    > > three
    > >     > results
    > >     > from evaluating a predicate - true, false or an exception caught by
    > > Kafka
    > >     > Connect.
    > >     > I suggest the exception case is always a predicate fail so the
    > > guarded
    > >     > transformation
    > >     > is skipped. Initially I'd preferred having the predicate be
    > > configured to
    > >     > return a
    > >     > negated result to give a "not match" behaviour. Thinking again
    > now, I
    > >     > prefer something like:
    > >     >
    > >     > transforms: t1
    > >     > transforms.t1.?type: RecordIsTombstone
    > >     > transforms.t1.type:
    > > org.apache.kafka.connect.transforms.InsertField$Value
    > >     > transforms.t1.static.field: deleted
    > >     > transforms.t1.static.value: true
    > >     >
    > >     > The "?" means the transform guarded by the predicate runs in the
    > > Predicate
    > >     > returns true.
    > >     >
    > >     > transforms: t2
    > >     > transforms.t2.?!type:
    > > org.apache.kafka.connect.predicates.TopicNameMatch
    > >     > transforms.t2.?regex: my-prefix-.*
    > >     > transforms.t2.type:
    > > org.apache.kafka.connect.transforms.ExtractField$Key
    > >     > transforms.t2.field: c1
    > >     >
    > >     > The "?!" means the transform guarded by the predicate runs if the
    > > Predicate
    > >     > returns false.
    > >     >
    > >     > I think adding a Filter SMT that uses a Predicate is a nice idea.
    > >     >
    > >     > Cheers,
    > >     > Andrew
    > >     >
    >
    >


Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi Chris,

Thanks for the feedback. On points 1 and 2 I agree.

3. This is a great point. I'm happy to add this to the KIP too, but note it
removes the need to add validate(Map<String, String>) to both
Transformation and Predicate. That doesn't mean we shouldn't still do it,
just that the change is not motivated by the features being added in the
KIP. So I've removed validate(Map<String, String>) for now.

4. Your proposal would further reduce the chance of Transformation config
conflict (because just the 'predicate' parameter might conflict), but it
also adds the chance of conflict with Connector parameters (all the
'predicates.*' parameters). Connector implementations are far more numerous
than Transformation implementations, so the risk is proportionately
greater. As you say we could try to reduce the chances of conflict (e.g.
'?predicate' parameter on transformations and '?predicates' prefix on
connectors), but I'm still left with uneasy feelings. I'd like to hear
about what others think before I change this one.

Cheers,

Tom


On Mon, May 4, 2020 at 4:57 AM Christopher Egerton <ch...@confluent.io>
wrote:

> Hi all,
>
> This has certainly taken quite a turn! I wish we could get this done
> without adding another pluggable interface but if the benefit is that now
> any SMT--new or pre-existing--can be conditionally applied, it seems like
> it might be worth it.
>
> As far as the proposed design goes, some thoughts:
>
> 1. There isn't a clear reason why exceptions thrown from predicates should
> be treated differently from exceptions thrown by transformations. We have
> reasonable error-tolerance mechanisms in the Connect framework; why
> silently swallow errors, especially if the risk is publishing potentially
> corrupted data to Kafka or an external system because an SMT that should
> have been applied wasn't?
>
> 2. It might be worth noting that the interface is a worker plugin, and that
> presumably it would be loaded in the same way as other worker plugins such
> as converters, connectors, and REST extensions. This would include aliasing
> behavior that would allow users to specify predicates using their simple
> class names as long as no two predicate plugins with the same simple name
> were available on the worker.
>
> 3. Why would the Filter SMT explicitly take in a predicate in its
> configuration if predicates can be applied to all SMTs? Just reading the
> idea for a filter SMT, it seemed like the behavior would be that the SMT
> would take in no configuration parameters and just blindly drop everything
> that it sees, but typical use would involve pairing it with a predicate.
>
> 4. The question mark syntax seems a little cryptic, and combining the
> configuration properties for the predicate and the transformation in the
> same namespace ("transforms.<name>.*) seems a little noisy. What do you
> think about allowing predicates to be configured in their own namespace,
> and referenced by name in a single "predicate" (or maybe "?predicate" if we
> really want to avoid risking backwards compatibility concerns) property?
> This would also enable them to be more easily reused across several SMTs.
>
> For example, you might configure a worker with these properties (assuming
> plugin aliasing is supported for predicates in the same way that it is for
> transformations):
>
> transforms: t2
> transforms.t2.predicate: has-my-prefix
> transforms.t2.type:ExtractField$Key
> transforms.t2.field: c1
>
> predicates: has-my-prefix
> predicates.has-my-prefix.type: TopicNameMatch
> predicates.has-my-prefix.negate: true
> predicates.has-my-prefix.pattern: my-prefix-.*
>
> Excited to see how this is evolving and I think we're heading towards
> something really useful for the Connect framework.
>
> Cheers,
>
> Chris
>
>
> On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <andrew_schofield@live.com
> >
> wrote:
>
> > Hi Tom,
> > Looks good to me.
> >
> > Thanks,
> > Andrew
> >
> > On 01/05/2020, 16:24, "Tom Bentley" <tb...@redhat.com> wrote:
> >
> >     Hi,
> >
> >     I've updated the KIP to reflect recent discussions. Please note the
> >     following:
> >
> >     1. I've described a HasHeaderKey predicate rather than
> HeaderKeyMatches
> >     because at the moment Headers doesn't expose the whole set of
> headers,
> > only
> >     those with a specific name. Exposing this would significantly
> increase
> > the
> >     scope of this KIP but relatively little extra benefit.
> >     2. I used org.apache.kafka.connect.transformer.predicates rather than
> >     org.apache.kafka.connect.predicates, since I thought it better
> > reflected
> >     the use of predicates within transforms. I'm flexible on this one if
> >     someone has a good case for the original name. For example the
> original
> >     package name would be more appropriate if we were expecting
> Connectors
> > to
> >     make use of Predicates somehow.
> >     3. I've dropped the special case of not needing to provide a FQN for
> >     predicate classes in that package, since this isn't something
> > supported by
> >     transformations themselves as far as I know. I like the idea, but it
> > seemed
> >     inconsistent to need it for transformations but not for predicates.
> >     4. Chris, I've used your suggestions for a validate() method for the
> > extra
> >     configurability needed. I've also added this to Predicate (even
> though
> > I
> >     don't need it) for consistency with Connector and Transformation.
> >     5. For negating the result of a predicate I decided it was clearer to
> > just
> >     have a "negate" config parameter, supported by all predicates, than
> to
> >     reach for more punctuation.
> >
> >     I'd be grateful for any more feedback people might have.
> >
> >     Kind regards,
> >
> >     Tom
> >
> >     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
> > andrew_schofield@live.com>
> >     wrote:
> >
> >     > Hi Tom,
> >     > I think we should go for a consistent naming convention for the
> >     > predicates, maybe so they
> >     > read nicely if you say "IF" first. Here's a suggestion for
> > predicates that
> >     > the KIP could introduce:
> >     >  - TopicNameMatches
> >     >  - HeaderKeyMatches
> >     >  - RecordIsTombstone
> >     >
> >     > On naming, I also suggest using
> >     > org.apache.kafka.connect.predicates.Predicate
> >     > as the name for the interface.
> >     >
> >     > I hadn't settled on a preference for negation. I suppose there are
> > three
> >     > results
> >     > from evaluating a predicate - true, false or an exception caught by
> > Kafka
> >     > Connect.
> >     > I suggest the exception case is always a predicate fail so the
> > guarded
> >     > transformation
> >     > is skipped. Initially I'd preferred having the predicate be
> > configured to
> >     > return a
> >     > negated result to give a "not match" behaviour. Thinking again
> now, I
> >     > prefer something like:
> >     >
> >     > transforms: t1
> >     > transforms.t1.?type: RecordIsTombstone
> >     > transforms.t1.type:
> > org.apache.kafka.connect.transforms.InsertField$Value
> >     > transforms.t1.static.field: deleted
> >     > transforms.t1.static.value: true
> >     >
> >     > The "?" means the transform guarded by the predicate runs in the
> > Predicate
> >     > returns true.
> >     >
> >     > transforms: t2
> >     > transforms.t2.?!type:
> > org.apache.kafka.connect.predicates.TopicNameMatch
> >     > transforms.t2.?regex: my-prefix-.*
> >     > transforms.t2.type:
> > org.apache.kafka.connect.transforms.ExtractField$Key
> >     > transforms.t2.field: c1
> >     >
> >     > The "?!" means the transform guarded by the predicate runs if the
> > Predicate
> >     > returns false.
> >     >
> >     > I think adding a Filter SMT that uses a Predicate is a nice idea.
> >     >
> >     > Cheers,
> >     > Andrew
> >     >
>
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Christopher Egerton <ch...@confluent.io>.
Hi all,

This has certainly taken quite a turn! I wish we could get this done
without adding another pluggable interface but if the benefit is that now
any SMT--new or pre-existing--can be conditionally applied, it seems like
it might be worth it.

As far as the proposed design goes, some thoughts:

1. There isn't a clear reason why exceptions thrown from predicates should
be treated differently from exceptions thrown by transformations. We have
reasonable error-tolerance mechanisms in the Connect framework; why
silently swallow errors, especially if the risk is publishing potentially
corrupted data to Kafka or an external system because an SMT that should
have been applied wasn't?

2. It might be worth noting that the interface is a worker plugin, and that
presumably it would be loaded in the same way as other worker plugins such
as converters, connectors, and REST extensions. This would include aliasing
behavior that would allow users to specify predicates using their simple
class names as long as no two predicate plugins with the same simple name
were available on the worker.

3. Why would the Filter SMT explicitly take in a predicate in its
configuration if predicates can be applied to all SMTs? Just reading the
idea for a filter SMT, it seemed like the behavior would be that the SMT
would take in no configuration parameters and just blindly drop everything
that it sees, but typical use would involve pairing it with a predicate.

4. The question mark syntax seems a little cryptic, and combining the
configuration properties for the predicate and the transformation in the
same namespace ("transforms.<name>.*) seems a little noisy. What do you
think about allowing predicates to be configured in their own namespace,
and referenced by name in a single "predicate" (or maybe "?predicate" if we
really want to avoid risking backwards compatibility concerns) property?
This would also enable them to be more easily reused across several SMTs.

For example, you might configure a worker with these properties (assuming
plugin aliasing is supported for predicates in the same way that it is for
transformations):

transforms: t2
transforms.t2.predicate: has-my-prefix
transforms.t2.type:ExtractField$Key
transforms.t2.field: c1

predicates: has-my-prefix
predicates.has-my-prefix.type: TopicNameMatch
predicates.has-my-prefix.negate: true
predicates.has-my-prefix.pattern: my-prefix-.*

Excited to see how this is evolving and I think we're heading towards
something really useful for the Connect framework.

Cheers,

Chris


On Fri, May 1, 2020 at 9:51 AM Andrew Schofield <an...@live.com>
wrote:

> Hi Tom,
> Looks good to me.
>
> Thanks,
> Andrew
>
> On 01/05/2020, 16:24, "Tom Bentley" <tb...@redhat.com> wrote:
>
>     Hi,
>
>     I've updated the KIP to reflect recent discussions. Please note the
>     following:
>
>     1. I've described a HasHeaderKey predicate rather than HeaderKeyMatches
>     because at the moment Headers doesn't expose the whole set of headers,
> only
>     those with a specific name. Exposing this would significantly increase
> the
>     scope of this KIP but relatively little extra benefit.
>     2. I used org.apache.kafka.connect.transformer.predicates rather than
>     org.apache.kafka.connect.predicates, since I thought it better
> reflected
>     the use of predicates within transforms. I'm flexible on this one if
>     someone has a good case for the original name. For example the original
>     package name would be more appropriate if we were expecting Connectors
> to
>     make use of Predicates somehow.
>     3. I've dropped the special case of not needing to provide a FQN for
>     predicate classes in that package, since this isn't something
> supported by
>     transformations themselves as far as I know. I like the idea, but it
> seemed
>     inconsistent to need it for transformations but not for predicates.
>     4. Chris, I've used your suggestions for a validate() method for the
> extra
>     configurability needed. I've also added this to Predicate (even though
> I
>     don't need it) for consistency with Connector and Transformation.
>     5. For negating the result of a predicate I decided it was clearer to
> just
>     have a "negate" config parameter, supported by all predicates, than to
>     reach for more punctuation.
>
>     I'd be grateful for any more feedback people might have.
>
>     Kind regards,
>
>     Tom
>
>     On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <
> andrew_schofield@live.com>
>     wrote:
>
>     > Hi Tom,
>     > I think we should go for a consistent naming convention for the
>     > predicates, maybe so they
>     > read nicely if you say "IF" first. Here's a suggestion for
> predicates that
>     > the KIP could introduce:
>     >  - TopicNameMatches
>     >  - HeaderKeyMatches
>     >  - RecordIsTombstone
>     >
>     > On naming, I also suggest using
>     > org.apache.kafka.connect.predicates.Predicate
>     > as the name for the interface.
>     >
>     > I hadn't settled on a preference for negation. I suppose there are
> three
>     > results
>     > from evaluating a predicate - true, false or an exception caught by
> Kafka
>     > Connect.
>     > I suggest the exception case is always a predicate fail so the
> guarded
>     > transformation
>     > is skipped. Initially I'd preferred having the predicate be
> configured to
>     > return a
>     > negated result to give a "not match" behaviour. Thinking again now, I
>     > prefer something like:
>     >
>     > transforms: t1
>     > transforms.t1.?type: RecordIsTombstone
>     > transforms.t1.type:
> org.apache.kafka.connect.transforms.InsertField$Value
>     > transforms.t1.static.field: deleted
>     > transforms.t1.static.value: true
>     >
>     > The "?" means the transform guarded by the predicate runs in the
> Predicate
>     > returns true.
>     >
>     > transforms: t2
>     > transforms.t2.?!type:
> org.apache.kafka.connect.predicates.TopicNameMatch
>     > transforms.t2.?regex: my-prefix-.*
>     > transforms.t2.type:
> org.apache.kafka.connect.transforms.ExtractField$Key
>     > transforms.t2.field: c1
>     >
>     > The "?!" means the transform guarded by the predicate runs if the
> Predicate
>     > returns false.
>     >
>     > I think adding a Filter SMT that uses a Predicate is a nice idea.
>     >
>     > Cheers,
>     > Andrew
>     >
>     > On 28/04/2020, 09:44, "Tom Bentley" <tb...@redhat.com> wrote:
>     >
>     >     Hi Andrew and Chris,
>     >
>     >     Firstly, thanks for the input.
>     >
>     >     To me, the ?type syntax has some pros and cons. On the pros side:
>     >     * it's pretty succinct
>     >     * it's flexible enough for the use cases we've identified so far.
>     >
>     >     On the cons side:
>     >     * it's a bit cryptic; I don't think people who didn't know it
> already
>     > would
>     >     be able to guess what ?type meant.
>     >     * it doesn't work so well if there is a whole chain of
> transformations
>     >     which need to be applied conditionally (that can still be done,
> of
>     > course,
>     >     but the user has to duplicate the configuration for each SMT).
> Nor
>     > does it
>     >     work if you need something akin to a "nested if" (you could fake
> it
>     > using
>     >     And), but we don't really expect people to need that.
>     >
>     >     Andrew, you didn't give an example of a negated match. Would you
> use
>     >     something like !?type for that, or did you have another idea?
>     >
>     >     On balance, I think this is a reasonable trade off.
>     >
>     >     I agree that exposing RecordPredicate as a public API, while
> providing
>     >     common implementation(s) would allow most use cases to be covered
>     > without
>     >     coding, but also allow people with specific unaddressed needs to
>     > implement
>     >     their own without having to write a streaming application to do
> so.
>     > Again,
>     >     I think this is a reasonable trade off.
>     >
>     >     It would be helpful to have a consensus about exactly which
> condition
>     >     implementations the KIP should provide. I think we all agree that
>     >     TopicMatches (or should it be TopicNameMatches?) should be
> included.
>     >     HasHeader (or should it be HasHeaderWithName or
> HasHeaderMatching for a
>     >     regex version?), And and Or all seem like they could be useful,
> even
>     > though
>     >     no one has pointed to concrete use cases. It might also be worth
>     >     considering that it would be natural to add a Filter SMT which
> filtered
>     >     records using a RecordPredicate. Perhaps that provides
> motivation for
>     >     conditions on the record key as well? If people think the Filter
> SMT
>     > would
>     >     be useful I'm willing to include it in this KIP, since it seems
>     >     straightforward.
>     >
>     >     If people are happy with this approach, I'll update the KIP.
>     >
>     >     Kind regards,
>     >
>     >     Tom
>     >
>     >     On Sun, Apr 26, 2020 at 3:00 PM Andrew Schofield <
>     > andrew_schofield@live.com>
>     >     wrote:
>     >
>     >     > Hi,
>     >     > I'm suggesting that RecordPredicate be another pluggable
> interface
>     > that
>     >     > users
>     >     > could conceivably implement on their own, and that the KIP
>     > introduces the
>     >     > most likely ones as supplied implementations, much as there
> are SMT
>     >     > implementations such as HoistField. I don't really see users
>     > implementing
>     >     > their own RecordPredicate, but the supplied ones would
> probably have
>     > a
>     >     > common
>     >     > interface and it seems a pity not to expose that as an
> external API.
>     >     >
>     >     > SMT is essentially a toolkit for Kafka Connect and I think
>     > RecordPredicate
>     >     > would be a nice minor enhancement and I'd be happy to
> contribute
>     > code.
>     >     >
>     >     > In order to make it simpler to configure, I suggest that the
> class
>     > name
>     >     > configured
>     >     > using "?type" would be assumed to be in package
>     >     > org.apache.kafka.connect.predicates
>     >     > in the absence of a package name so that shorter names such as
>     >     > "TopicMatches"
>     >     > could be used.
>     >     >
>     >     > Cheers,
>     >     > Andrew Schofield
>     >     > IBM Event Streams
>     >     >
>     >     > On 25/04/2020, 21:22, "Christopher Egerton" <
> chrise@confluent.io>
>     > wrote:
>     >     >
>     >     >     Hi Andrew,
>     >     >
>     >     >     I know a DSL seems like overkill, and I'm not attached to
> it by
>     > any
>     >     > means,
>     >     >     but I do think it serves a vital purpose in that it allows
>     > people who
>     >     > don't
>     >     >     know how or have the time to write Java code to act on
> data being
>     >     > processed
>     >     >     by connectors.
>     >     >
>     >     >     Are you proposing that the "RecordPredicate" be another
> pluggable
>     >     > interface
>     >     >     that users could then implement on their own? Or, would
> this be
>     > purely
>     >     > a
>     >     >     syntactic expansion to SMT configuration with some subset
> of
>     > hard-coded
>     >     >     predicates provided by the framework? I think there's
> value in
>     > the
>     >     > latter,
>     >     >     but the former doesn't seem like it'd bring much to the
> table as
>     > far as
>     >     >     users are concerned, and for developers, it is an
> improvement,
>     > but not
>     >     > a
>     >     >     major one.
>     >     >
>     >     >     Cheers,
>     >     >
>     >     >     Chris
>     >     >
>     >     >     On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
>     >     >     schofieldandrewj@gmail.com> wrote:
>     >     >
>     >     >     > I wonder whether we're getting a bit overcomplicated with
>     > this. I
>     >     > think all
>     >     >     > that's required here is to add an optional guard
> predicate for
>     > a
>     >     >     > Transformation.
>     >     >     > The predicate cannot end the Transformation chain, but
> it can
>     > allow
>     >     > or
>     >     >     > prevent a particular
>     >     >     > Transformation from running.
>     >     >     >
>     >     >     > How about this as syntax?
>     >     >     > transforms: extractInt
>     >     >     > transforms.extractInt.?type:
>     >     >     > org.apache.kafka.connect.predicates.TopicMatches
>     >     >     > transforms.extractInt.?regex: my-prefix-.*
>     >     >     > transforms.extractInt.type:
>     >     >     > org.apache.kafka.connect.transforms.ExtractField$Key
>     >     >     > transforms.extractInt.field: c1
>     >     >     >
>     >     >     > The idea is that a Transformation can have an optional
>     >     > RecordPredicate.
>     >     >     > The RecordPredicate
>     >     >     > can have configuration items, in a similar way to a
>     > Transformation.
>     >     > The
>     >     >     > syntax of
>     >     >     > using a '?' prefix to separate configuration for the
>     > RecordPredicate
>     >     > from
>     >     >     > the configuration
>     >     >     > for the Transformation could conceivably clash with an
> existing
>     >     >     > Transformation
>     >     >     > but the chances are minimal.
>     >     >     >
>     >     >     > This syntax doesn't allow for an 'else', but if the KIP
> offers
>     > say a
>     >     >     > TopicMatches predicate
>     >     >     > then that can be configured to return FALSE if the
> predicate
>     > matches.
>     >     >     >
>     >     >     > I feel that a DSL for SMTs is overkill. If you need
> something
>     > that
>     >     >     > complex, it's
>     >     >     > perhaps too complex for a transformation chain and it's
> really
>     > a
>     >     > streaming
>     >     >     > application.
>     >     >     >
>     >     >     > Andrew Schofield
>     >     >     > IBM Event Streams
>     >     >     >
>     >     >     > On 2020/04/08 21:39:31, Christopher Egerton <
>     > chrise@confluent.io>
>     >     > wrote:
>     >     >     > > Hi Tom,
>     >     >     > >
>     >     >     > > With regards to the potential Transformation::validate
>     > method, I
>     >     > don't
>     >     >     > > quite follow your objections. The AbstractHerder class,
>     >     > ConnectorConfig
>     >     >     > > class, and embedding of ConfigDefs that happens with
> the two
>     > is all
>     >     >     > > internal logic and we're allowed to modify it however
> we
>     > want, as
>     >     > long as
>     >     >     > > it doesn't alter any public-facing APIs (and if it
> does, we
>     > just
>     >     > have to
>     >     >     > > document those changes in a KIP). We don't have to
> embed the
>     >     > ConfigDef
>     >     >     > for
>     >     >     > > a transformation chain inside another ConfigDef if the
> API
>     > we want
>     >     > to
>     >     >     > > present to our users doesn't play well with that part
> of the
>     > code
>     >     > base.
>     >     >     > > Additionally, beyond aligning with the existing API
> provided
>     > for
>     >     >     > > connectors, another advantage is that it becomes
> possible to
>     >     > validate
>     >     >     > > properties for a configuration in the context of all
> other
>     >     > properties, so
>     >     >     > > to be clear, it's not just about preserving what may be
>     > perceived
>     >     > as a
>     >     >     > > superficial similarity and comes with real functional
>     > benefits
>     >     > that can't
>     >     >     > > be provided (at least, not as easily) by dynamic
>     > construction of a
>     >     >     > > ConfigDef object.
>     >     >     > >
>     >     >     > > As far as the new proposal goes, I hate to say it, but
> I
>     > think
>     >     > we're
>     >     >     > stuck
>     >     >     > > with the worst of both worlds here. Adding the new
>     > RecordPredicate
>     >     >     > > interface seems like it defeats the whole purpose of
> SMTs,
>     > which
>     >     > is to
>     >     >     > > allow manipulation of a connector's event stream by
> users
>     > who don't
>     >     >     > > necessarily know how or have the time to write Java
> code of
>     > their
>     >     > own.
>     >     >     > This
>     >     >     > > is also why I'm in favor of adding a lightweight DSL
> for the
>     >     > condition;
>     >     >     > > emphasizing readability for people who aren't very
> familiar
>     > with
>     >     > Connect
>     >     >     > > and just want to get something going quickly should be
> a
>     > priority
>     >     > for
>     >     >     > SMTs.
>     >     >     > > But if that's not going to happen with this KIP, I'd
> still
>     > choose
>     >     > the
>     >     >     > > simpler, less-flexible approach initially outlined, in
> order
>     > to
>     >     > keep
>     >     >     > things
>     >     >     > > simple for people creating connectors and try to let
> them
>     >     > accomplish what
>     >     >     > > they want via configuration, not code.
>     >     >     > >
>     >     >     > > With regards to the question about where the line
> should be
>     > drawn
>     >     > and how
>     >     >     > > much is too much and comparisons to other stream
> processing
>     >     > frameworks, I
>     >     >     > > think the nature of SMTs draws the line quite nicely:
> you
>     > can only
>     >     >     > process
>     >     >     > > one message at a time. There's plenty of use cases out
> there
>     > for
>     >     >     > > heavier-duty processing frameworks like Kafka Streams,
> with
>     >     > aggregate
>     >     >     > > operations, joining of streams, expanding a single
> message
>     > into
>     >     > multiple
>     >     >     > > messages, etc. With SMTs, none of this is possible; the
>     > general
>     >     > use case
>     >     >     > is
>     >     >     > > to filter and clean a stream of data. If any of the
>     > heavier-weight
>     >     > logic
>     >     >     > > provided by, e.g., Streams, isn't required for a
> project, it
>     >     > should be
>     >     >     > > possible to get along with just a collection of sink
>     > connectors,
>     >     > source
>     >     >     > > connectors, a converter or two, and SMTs that smooth
> over any
>     >     > differences
>     >     >     > > in data format between what source connectors produce
> and
>     > what sink
>     >     >     > > connectors expect. This is why I'm comfortable
> suggesting
>     > heavy
>     >     > expansion
>     >     >     > > of the out-of-the-box SMTs that we provide with
> Connect; as
>     > long as
>     >     >     > they're
>     >     >     > > reasonable to configure, they can greatly reduce the
>     > operational
>     >     > burden
>     >     >     > on
>     >     >     > > anyone running and/or using a Connect cluster since
> they
>     > might
>     >     > entirely
>     >     >     > > replace additional services that would otherwise be
> required.
>     >     >     > >
>     >     >     > > All that said--this is a huge ask of someone who just
> wants
>     > to
>     >     > support
>     >     >     > the
>     >     >     > > SMT equivalent of an "if" statement, and it's totally
>     >     > understandable if
>     >     >     > > that's too much to ask. The one concern I have left is
> that
>     > if we
>     >     > expand
>     >     >     > > the SMT in the future, there become compatibility
> concerns
>     > since
>     >     > SMTs are
>     >     >     > > pretty tightly-coupled with the worker on which they
> run
>     > (although
>     >     >     > > technically, with some classpath/plugin path
> shuffling, you
>     > can run
>     >     >     > > different versions of the out-of-the-box SMTs from the
>     > Connect
>     >     > worker on
>     >     >     > > which they're run). Someone might write a connector
> config
>     > with the
>     >     >     > > If/Conditional/Whatever SMT with a condition type that
> works
>     > on one
>     >     >     > worker,
>     >     >     > > but that doesn't work on another worker that's running
> an
>     > earlier
>     >     > version
>     >     >     > > of Connect. This is why I'm in favor of adding extra
>     > predicates now
>     >     >     > instead
>     >     >     > > of later; if we're going to implement what has the
> potential
>     > to be
>     >     > a bit
>     >     >     > of
>     >     >     > > a Swiss army knife SMT with room for future expansion
> of
>     >     > configuration,
>     >     >     > and
>     >     >     > > we can think of a reasonable way to add that
> functionality
>     > now, it
>     >     > seems
>     >     >     > > better for users and administrators of Connect to try
> to do
>     > that
>     >     > now
>     >     >     > > instead of later.
>     >     >     > >
>     >     >     > > Cheers,
>     >     >     > >
>     >     >     > > Chris
>     >     >     > >
>     >     >     > > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <
>     > tbentley@redhat.com>
>     >     > wrote:
>     >     >     > >
>     >     >     > > > Since no one objected I've updated the KIP with the
>     > revised way
>     >     > to
>     >     >     > > > configure this transformation.
>     >     >     > > >
>     >     >     > > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <
>     > tbentley@redhat.com>
>     >     >     > wrote:
>     >     >     > > >
>     >     >     > > > > To come back about a point Chris made:
>     >     >     > > > >
>     >     >     > > > > 1. Instead of the new "ConfigDef config(Map<String,
>     > String>
>     >     > props)"
>     >     >     > > > method,
>     >     >     > > > >> what would you think about adopting a similar
> approach
>     > as the
>     >     >     > framework
>     >     >     > > > >> uses with connectors, by adding a "Config
>     > validate(Map<String,
>     >     >     > String>
>     >     >     > > > >> props)" method that can perform custom validation
>     > outside of
>     >     > what
>     >     >     > can be
>     >     >     > > > >> performed by the ConfigDef's
> single-property-at-a-time
>     >     > validation?
>     >     >     > It
>     >     >     > > > may
>     >     >     > > > >> be a little heavyweight for use with this
> particular
>     > SMT, but
>     >     > it'd
>     >     >     > > > provide
>     >     >     > > > >> more flexibility for other SMT implementations
> and would
>     >     > mirror an
>     >     >     > API
>     >     >     > > > >> that
>     >     >     > > > >> developers targeting the framework are likely
> already
>     >     > familiar with.
>     >     >     > > > >>
>     >     >     > > > >
>     >     >     > > > > The validate() + config() approach taken for
> Connectors
>     >     > doesn't quite
>     >     >     > > > work
>     >     >     > > > > for Transformations.
>     >     >     > > > >
>     >     >     > > > > The default Connector.validate() basically just
> calls
>     >     >     > > > > `config().validate(connectorConfigs)` and returns a
>     > Config.
>     >     >     > Separately
>     >     >     > > > the
>     >     >     > > > > AbstractHeader also calls `config()` and uses the
> Config
>     > and
>     >     >     > ConfigDef to
>     >     >     > > > > build the ConfigInfos. The contract is not really
>     > defined well
>     >     >     > enough in
>     >     >     > > > > the javadoc, but this means a connector can use
>     > validate() to
>     >     > build
>     >     >     > the
>     >     >     > > > > ConfigDef which it will return from config().
>     >     >     > > > >
>     >     >     > > > > For Transformations we don't really want
> validate() to
>     > perform
>     >     >     > validation
>     >     >     > > > > at all since ultimately the transformations config
> will
>     > be
>     >     > embedded
>     >     >     > in
>     >     >     > > > the
>     >     >     > > > > connector's and will be validated by the connector
>     > itself. It
>     >     >     > wouldn't be
>     >     >     > > > > harmful if it _did_ perform validation, just
>     > unnecessary. But
>     >     > having
>     >     >     > a
>     >     >     > > > > validate() method that ought not to validate()
> seems
>     > like a
>     >     > recipe
>     >     >     > for
>     >     >     > > > > confusion.
>     >     >     > > > >
>     >     >     > > > > Also problematic is that there's no use for the
> Config
>     >     > returned from
>     >     >     > > > > Transformations.validate().
>     >     >     > > > >
>     >     >     > > > > So I'm not convinced that the superficial
> similarity
>     > really
>     >     > gains
>     >     >     > > > anything.
>     >     >     > > > >
>     >     >     > > > > Kind regards,
>     >     >     > > > >
>     >     >     > > > > Tom
>     >     >     > > > >
>     >     >     > > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <
>     >     > tbentley@redhat.com>
>     >     >     > wrote:
>     >     >     > > > >
>     >     >     > > > >> Hi,
>     >     >     > > > >>
>     >     >     > > > >> Hi all,
>     >     >     > > > >>
>     >     >     > > > >> Thanks for the discussion so far.
>     >     >     > > > >>
>     >     >     > > > >> It seems a bit weird to me that when configuring
> the
>     >     > Conditional SMT
>     >     >     > > > with
>     >     >     > > > >> a DSL you would use a concise, intuitive DSL for
>     > expressing
>     >     > the
>     >     >     > > > condition,
>     >     >     > > > >> but not for the transforms that it's guarding. It
> also
>     > seems
>     >     >     > natural, if
>     >     >     > > > >> you support this for conditionally applying SMTs,
> that
>     > you'd
>     >     > soon
>     >     >     > want
>     >     >     > > > to
>     >     >     > > > >> support the same thing for a generic filter
> transformer.
>     >     > Then, if
>     >     >     > you
>     >     >     > > > can
>     >     >     > > > >> configure a filter transformation using this DSL,
> it
>     > becomes
>     >     > odd
>     >     >     > that
>     >     >     > > > you
>     >     >     > > > >> can't do this for mapping transformations. I
> think it
>     > would
>     >     > be a
>     >     >     > > > mistake to
>     >     >     > > > >> go specifying an expression language for
> conditions when
>     >     > really that
>     >     >     > > > might
>     >     >     > > > >> just be part of a language for transformations.
>     >     >     > > > >>
>     >     >     > > > >> I think it would be possible today to write an
> SMT which
>     >     > allowed
>     >     >     > you to
>     >     >     > > > >> express the transformation in a DSL. Concretely,
> it's
>     >     > possible to
>     >     >     > > > imagine a
>     >     >     > > > >> single transformation taking a DSL something like
> this:
>     >     >     > > > >>
>     >     >     > > > >>   compose(
>     >     >     > > > >>     Flatten.Key(delimiter: ':'),
>     >     >     > > > >>     If(condition: TopicMatches(pattern:
> "blah-.*"),
>     >     >     > > > >>        then: Flatten.Value(delimiter: '/')))
>     >     >     > > > >>
>     >     >     > > > >> All I've really done here, beyond what was already
>     > proposed,
>     >     > is
>     >     >     > rename
>     >     >     > > > >> Conditional to If, imagine a couple more bits of
> syntax
>     >     >     > > > >> (SMTs being constructed by class name and named
> argument
>     >     > invocation
>     >     >     > > > >> syntax to support SMT constructors) and add a
> higher
>     > level
>     >     > compose()
>     >     >     > > > >> function for chaining SMTs (which could easily be
>     > replaced
>     >     > with
>     >     >     > brace
>     >     >     > > > >> delimited blocks).
>     >     >     > > > >>
>     >     >     > > > >> That may be a discussion we should have, but I
> think in
>     >     > _this_ KIP
>     >     >     > we
>     >     >     > > > >> should focus on the conditional part, since the
> above
>     > example
>     >     >     > hopefully
>     >     >     > > > >> shows that it would be possible to reuse it in a
> DSL if
>     > there
>     >     > was
>     >     >     > > > appetite
>     >     >     > > > >> for that.
>     >     >     > > > >>
>     >     >     > > > >> With that in mind, and since my original
> suggestion for
>     > an
>     >     >     > abbreviated
>     >     >     > > > >> config syntax didn't appeal to people, I propose
> that we
>     >     > stick with
>     >     >     > the
>     >     >     > > > >> existing norms for configuring this.
>     >     >     > > > >> The previous example would look like this:
>     >     >     > > > >>
>     >     >     > > > >> transformation: flattenKey,if
>     >     >     > > > >> transformation.flattenKey.type: Flatten.Key
>     >     >     > > > >> transformation.flattenKey.delimiter: :
>     >     >     > > > >> transformation.if.type: If
>     >     >     > > > >> transformation.if.condition.type: TopicMatches
>     >     >     > > > >> transformation.if.condition.pattern: blah-.*
>     >     >     > > > >> transformation.if.then: flattenValue
>     >     >     > > > >> transformation.if.then.flattenValue.type:
> Flatten.Value
>     >     >     > > > >> transformation.if.then.flattenValue.delimiter: /
>     >     >     > > > >>
>     >     >     > > > >> Also, I'm inclined to think we should stick to
> just
>     > supporting
>     >     >     > > > >> TopicMatches and Not, since we've not identified
> an
>     > actual
>     >     > need for
>     >     >     > > > >> 'has-header', 'or' and 'and'.
>     >     >     > > > >>
>     >     >     > > > >> An example of the usage of Not would look like
> this:
>     >     >     > > > >>
>     >     >     > > > >> transformation: flattenKey,if
>     >     >     > > > >> transformation.flattenKey.type: Flatten.Key
>     >     >     > > > >> transformation.flattenKey.delimiter: :
>     >     >     > > > >> transformation.if.type: If
>     >     >     > > > >> transformation.if.condition.type: Not
>     >     >     > > > >> transformation.if.condition.operand:
> startsWithBlah
>     >     >     > > > >>
> transformation.if.condition.operand.startsWithBlah.type:
>     >     >     > TopicMatches
>     >     >     > > > >>
>     > transformation.if.condition.operand.startsWithBlah.pattern:
>     >     > blah-.*
>     >     >     > > > >> transformation.if.then: flattenValue
>     >     >     > > > >> transformation.if.then.flattenValue.type:
> Flatten.Value
>     >     >     > > > >> transformation.if.then.flattenValue.delimiter: /
>     >     >     > > > >>
>     >     >     > > > >> Hopefully we can agree that this allows the
> conditional
>     > SMT
>     >     > part to
>     >     >     > make
>     >     >     > > > >> progress without getting bogged down.
>     >     >     > > > >>
>     >     >     > > > >> Thoughts? If this seems broadly reasonable, I'll
> update
>     > the
>     >     > KIP.
>     >     >     > > > >>
>     >     >     > > > >> Kind regards,
>     >     >     > > > >>
>     >     >     > > > >> Tom
>     >     >     > > > >>
>     >     >     > > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <
>     >     > gunnar@hibernate.org
>     >     >     > >
>     >     >     > > > >> wrote:
>     >     >     > > > >>
>     >     >     > > > >>> Hi all,
>     >     >     > > > >>>
>     >     >     > > > >>> Thanks a lot for this initiative, Tom!
>     >     >     > > > >>>
>     >     >     > > > >>> To shed some light, the use case where this
> first came
>     > up,
>     >     > were
>     >     >     > issues
>     >     >     > > > >>> we saw with SMTs being applied to the different
> topics
>     >     > produced by
>     >     >     > the
>     >     >     > > > >>> Debezium change data capture connectors. There
> are
>     > different
>     >     > kinds
>     >     >     > of
>     >     >     > > > >>> topics (for change data, schema history,
> heartbeats
>     > etc.)
>     >     > and the
>     >     >     > > > >>> record structure to expect may vary between
> those.
>     > Hence we
>     >     > saw
>     >     >     > issues
>     >     >     > > > >>> with SMTs like ExtractField, which for instance
> only
>     > should
>     >     > be
>     >     >     > applied
>     >     >     > > > >>> to all change data topics but not the other ones.
>     >     >     > > > >>>
>     >     >     > > > >>> I like the overall approach; for Debezium's
> purposes,
>     > the
>     >     > simple
>     >     >     > topic
>     >     >     > > > >>> matching and negation operators would be
> sufficient
>     > already.
>     >     > I
>     >     >     > agree
>     >     >     > > > >>> with Chris and would prefer one single condition
>     > attribute,
>     >     > which
>     >     >     > > > >>> contains a single condition or potentially a
> logical
>     >     > expression
>     >     >     > with
>     >     >     > > > >>> not, and, etc. I think it's less ambiguous, in
>     > particular
>     >     > when it
>     >     >     > > > >>> comes to ordering of the different conditions and
>     >     > determining their
>     >     >     > > > >>> precedence.
>     >     >     > > > >>>
>     >     >     > > > >>> Would love to see this feature in one or another
> way in
>     >     > Connect.
>     >     >     > > > >>>
>     >     >     > > > >>> Best,
>     >     >     > > > >>>
>     >     >     > > > >>> --Gunnar
>     >     >     > > > >>>
>     >     >     > > > >>>
>     >     >     > > > >>>
>     >     >     > > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom
> Bentley <
>     >     >     > > > >>> tbentley@redhat.com>:
>     >     >     > > > >>> >
>     >     >     > > > >>> > Hi Chris and Sönke,
>     >     >     > > > >>> >
>     >     >     > > > >>> > Using the numbering from Chris's email...
>     >     >     > > > >>> >
>     >     >     > > > >>> > 1. That's a good point, I'll see what is
> needed to
>     > make
>     >     > that
>     >     >     > work.
>     >     >     > > > >>> >
>     >     >     > > > >>> > 2. I'm happy enough to add support for "and"
> and
>     > "or" as
>     >     > part of
>     >     >     > this
>     >     >     > > > >>> KIP
>     >     >     > > > >>> > if people can see a need for it.
>     >     >     > > > >>> >
>     >     >     > > > >>> > In a similar vein, I was wondering about
> whether it
>     > would
>     >     > be
>     >     >     > > > worthwhile
>     >     >     > > > >>> > having the equivalent of an "else" clause
> (what's in
>     > the
>     >     > KIP is
>     >     >     > > > >>> basically
>     >     >     > > > >>> > an "if" statement). Without support for "else"
> I
>     > think
>     >     > people
>     >     >     > would
>     >     >     > > > >>> often
>     >     >     > > > >>> > need two conditionals, with the condition of
> one
>     > being the
>     >     >     > negation
>     >     >     > > > of
>     >     >     > > > >>> the
>     >     >     > > > >>> > condition of another.
>     >     >     > > > >>> >
>     >     >     > > > >>> > 3. I can see the attraction of an expression
>     > language. The
>     >     > pros
>     >     >     > > > include
>     >     >     > > > >>> > being terse and familiar to programmers and
>     > potentially
>     >     > very
>     >     >     > flexible
>     >     >     > > > >>> if
>     >     >     > > > >>> > that's needed in the future. I had a play and
>     > implemented
>     >     > it
>     >     >     > using
>     >     >     > > > >>> ANTLR
>     >     >     > > > >>> > and it's not difficult to write a grammar and
>     > implement the
>     >     >     > functions
>     >     >     > > > >>> we've
>     >     >     > > > >>> > already discussed and get decent error
> messages when
>     > the
>     >     >     > expression
>     >     >     > > > is
>     >     >     > > > >>> > malformed. So on the one hand I quite like the
> idea.
>     > On
>     >     > the other
>     >     >     > > > hand
>     >     >     > > > >>> it
>     >     >     > > > >>> > feels like overkill for the use cases that have
>     > actually
>     >     > been
>     >     >     > > > >>> identified so
>     >     >     > > > >>> > far.
>     >     >     > > > >>> >
>     >     >     > > > >>> > @Sönke what do you make of the expression
> language
>     > idea?
>     >     >     > > > >>> >
>     >     >     > > > >>> > Kind regards,
>     >     >     > > > >>> >
>     >     >     > > > >>> > Tom
>     >     >     > > > >>> >
>     >     >     > > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher
> Egerton <
>     >     >     > > > >>> chrise@confluent.io>
>     >     >     > > > >>> > wrote:
>     >     >     > > > >>> >
>     >     >     > > > >>> > > Hi Tom,
>     >     >     > > > >>> > >
>     >     >     > > > >>> > > This looks great and I'd love to see the
>     > out-of-the-box
>     >     > SMTs
>     >     >     > become
>     >     >     > > > >>> even
>     >     >     > > > >>> > > more powerful with the improvements you've
>     > proposed! Got
>     >     > a few
>     >     >     > > > >>> remarks and
>     >     >     > > > >>> > > would be interested in your thoughts:
>     >     >     > > > >>> > >
>     >     >     > > > >>> > > 1. Instead of the new "ConfigDef
> config(Map<String,
>     >     > String>
>     >     >     > props)"
>     >     >     > > > >>> method,
>     >     >     > > > >>> > > what would you think about adopting a similar
>     > approach
>     >     > as the
>     >     >     > > > >>> framework
>     >     >     > > > >>> > > uses with connectors, by adding a "Config
>     >     > validate(Map<String,
>     >     >     > > > >>> String>
>     >     >     > > > >>> > > props)" method that can perform custom
> validation
>     >     > outside of
>     >     >     > what
>     >     >     > > > >>> can be
>     >     >     > > > >>> > > performed by the ConfigDef's
>     > single-property-at-a-time
>     >     >     > validation?
>     >     >     > > > >>> It may
>     >     >     > > > >>> > > be a little heavyweight for use with this
>     > particular
>     >     > SMT, but
>     >     >     > it'd
>     >     >     > > > >>> provide
>     >     >     > > > >>> > > more flexibility for other SMT
> implementations and
>     > would
>     >     >     > mirror an
>     >     >     > > > >>> API that
>     >     >     > > > >>> > > developers targeting the framework are likely
>     > already
>     >     > familiar
>     >     >     > > > with.
>     >     >     > > > >>> > > 2. The possibility for adding the logical
> operators
>     >     > "and" and
>     >     >     > "or"
>     >     >     > > > is
>     >     >     > > > >>> > > mentioned, but only as a potential future
> change
>     > and not
>     >     > one
>     >     >     > > > >>> proposed by
>     >     >     > > > >>> > > this KIP. Why not include those operators
> sooner
>     > rather
>     >     > than
>     >     >     > later?
>     >     >     > > > >>> > > 3. The syntax for named conditions that are
> then
>     >     > referenced in
>     >     >     > > > >>> logical
>     >     >     > > > >>> > > operators is... tricky. It took me a few
> attempts
>     > to
>     >     > grok the
>     >     >     > > > example
>     >     >     > > > >>> > > provided in the KIP after reading Sönke's
> question
>     > about
>     >     > the
>     >     >     > > > example
>     >     >     > > > >>> for
>     >     >     > > > >>> > > negation. What about a more sophisticated
> but less
>     >     > verbose
>     >     >     > syntax
>     >     >     > > > >>> that
>     >     >     > > > >>> > > supports a single configuration for the
> condition,
>     > even
>     >     > with
>     >     >     > > > logical
>     >     >     > > > >>> > > operators? I'm thinking something like
>     >     >     > > > >>> > > "transforms.conditionalExtract.condition:
>     >     >     > > > not(has-header:my-header)"
>     >     >     > > > >>> > > instead of the
>     > "transforms.conditionalExtract.condition:
>     >     >     > > > >>> not:hasMyHeader"
>     >     >     > > > >>> > > and
>     > "transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> > > has-header:my-header" properties. If support
> for a
>     >     > logical
>     >     >     > "and" is
>     >     >     > > > >>> added,
>     >     >     > > > >>> > > this could then be expanded to something like
>     >     >     > > > >>> > > "transforms.conditionalExtract.condition:
>     >     >     > > > and(has-header(my-header),
>     >     >     > > > >>> > > not(topic-matches(my-prefix-.*)))". There
> would be
>     >     > additional
>     >     >     > > > >>> complexity
>     >     >     > > > >>> > > here with the need to escape parentheses and
>     > commas that
>     >     > are
>     >     >     > > > >>> intended to be
>     >     >     > > > >>> > > treated literally (as part of a header name,
> for
>     > example)
>     >     >     > instead
>     >     >     > > > of
>     >     >     > > > >>> as
>     >     >     > > > >>> > > part of the syntax for the condition itself,
> but a
>     > little
>     >     >     > > > additional
>     >     >     > > > >>> > > complexity for edge cases like that may be
>     > warranted if
>     >     > it
>     >     >     > heavily
>     >     >     > > > >>> reduces
>     >     >     > > > >>> > > complexity for the common cases. The
> rationale for
>     > the
>     >     > proposed
>     >     >     > > > >>> > > parentheses-based syntax here instead of
> what's
>     >     > mentioned in
>     >     >     > the
>     >     >     > > > KIP
>     >     >     > > > >>> > > (something like "and: <condition1>,
> <condition2>")
>     > is to
>     >     > help
>     >     >     > with
>     >     >     > > > >>> > > readability; we probably wouldn't need that
> with
>     > the
>     >     > approach
>     >     >     > of
>     >     >     > > > >>> naming
>     >     >     > > > >>> > > conditions via separate properties, but
> things may
>     > get a
>     >     > little
>     >     >     > > > >>> nasty with
>     >     >     > > > >>> > > literal conditions included there,
> especially with
>     > the
>     >     >     > possibility
>     >     >     > > > of
>     >     >     > > > >>> > > nesting. Finally, the shift in syntax here
> should
>     > make it
>     >     >     > easier to
>     >     >     > > > >>> handle
>     >     >     > > > >>> > > cases like the header value matching brought
> up by
>     >     > Sönke; it
>     >     >     > might
>     >     >     > > > >>> look
>     >     >     > > > >>> > > something like "header-matches(header-name,
>     >     >     > header-value-pattern)".
>     >     >     > > > >>> > >
>     >     >     > > > >>> > > Cheers,
>     >     >     > > > >>> > >
>     >     >     > > > >>> > > Chris
>     >     >     > > > >>> > >
>     >     >     > > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
>     >     >     > tbentley@redhat.com>
>     >     >     > > > >>> wrote:
>     >     >     > > > >>> > >
>     >     >     > > > >>> > > > Hi Sönke,
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > Thanks for taking a look.
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > Let me answer in reverse order, since I
> think it
>     > might
>     >     > make
>     >     >     > more
>     >     >     > > > >>> sense
>     >     >     > > > >>> > > that
>     >     >     > > > >>> > > > way...
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > Also, while writing that, I noticed that
> you
>     > have a
>     >     > version
>     >     >     > with
>     >     >     > > > >>> and
>     >     >     > > > >>> > > > > without "name" for your transformation
> in the
>     > KIP:
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     > transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> > > has-header:my-header
>     >     >     > > > >>> > > > > and
>     >     >     > > > >>> > > > > transforms.conditionalExtract.condition:
>     >     >     > has-header:my-header
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > The example
>     >     >     > > > >>> > > >
>  transforms.conditionalExtract.condition:
>     >     >     > has-header:my-header
>     >     >     > > > >>> > > > is a "has-header" condition (the prefix of
> the
>     > config
>     >     > value),
>     >     >     > > > >>> which will
>     >     >     > > > >>> > > > match records with a "my-header" header
> (given
>     > in the
>     >     >     > suffix).
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > The other example given is:
>     >     >     > > > >>> > > >
>  transforms.conditionalExtract.condition:
>     >     > not:hasMyHeader
>     >     >     > > > >>> > > >
>     >     >  transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> > > > has-header:my-header
>     >     >     > > > >>> > > > The root of the condition is a "not"
> condition
>     > (the
>     >     > prefix
>     >     >     > of the
>     >     >     > > > >>> value
>     >     >     > > > >>> > > for
>     >     >     > > > >>> > > > the
> transforms.conditionalExtract.condition key)
>     > of
>     >     > another
>     >     >     > named
>     >     >     > > > >>> > > condition
>     >     >     > > > >>> > > > called "hasMyHeader" (the suffix). Any name
>     > could be
>     >     > used
>     >     >     > for the
>     >     >     > > > >>> other
>     >     >     > > > >>> > > > condition. That other condition is
> configured at
>     >     >     > > > >>> > > >
>     >     > "transforms.conditionalExtract.condition.<conditionName>".
>     >     >     > That
>     >     >     > > > >>> condition
>     >     >     > > > >>> > > > is a "has-header" condition (the prefix),
> which
>     > will
>     >     > match
>     >     >     > > > records
>     >     >     > > > >>> with a
>     >     >     > > > >>> > > > "my-header" header (given in the suffix).
> So the
>     >     >     > "has-header:"
>     >     >     > > > >>> condition
>     >     >     > > > >>> > > > type would always require a suffix, as
> would the
>     > "not:"
>     >     >     > condition
>     >     >     > > > >>> type.
>     >     >     > > > >>> > > > Hypothetically you could have a "true"
> condition
>     > type
>     >     > (which
>     >     >     > > > would
>     >     >     > > > >>> not
>     >     >     > > > >>> > > > require a suffix), and the hypothetical
> binary
>     >     > conditions
>     >     >     > "and:"
>     >     >     > > > >>> and
>     >     >     > > > >>> > > "or:"
>     >     >     > > > >>> > > > would require a pair of other condition
> names.
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > So what's proposed is a scheme for encoding
>     > conditions
>     >     > where
>     >     >     > the
>     >     >     > > > >>> > > condition
>     >     >     > > > >>> > > > type is the prefix of the value of some
>     > "....condition"
>     >     >     > config
>     >     >     > > > >>> key, and
>     >     >     > > > >>> > > the
>     >     >     > > > >>> > > > optional suffix provides parameters for the
>     > condition.
>     >     > This
>     >     >     > makes
>     >     >     > > > >>> those
>     >     >     > > > >>> > > > parameters a bit inflexible, but is
> relatively
>     >     > succinct.
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > This leads on to your first point. You're
> right
>     > that
>     >     > use
>     >     >     > cases
>     >     >     > > > >>> might
>     >     >     > > > >>> > > appear
>     >     >     > > > >>> > > > which need other conditions, and we should
> make
>     > it
>     >     > flexible
>     >     >     > > > enough
>     >     >     > > > >>> to be
>     >     >     > > > >>> > > > able to cope with future use cases. On the
> other
>     > hand,
>     >     > I was
>     >     >     > > > >>> concerned
>     >     >     > > > >>> > > that
>     >     >     > > > >>> > > > we end up with something which is quite
>     > complicated to
>     >     >     > configure.
>     >     >     > > > >>> (There
>     >     >     > > > >>> > > > comes a point where it might makes more
> sense
>     > for the
>     >     > user to
>     >     >     > > > >>> write their
>     >     >     > > > >>> > > > own SMT).
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > Just of the top of my head it might look
> like:
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     > transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> type:has-header
>     >     >     > > > >>> > > > >
>     > transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> > > > header-name:my-header
>     >     >     > > > >>> > > > >
>     > transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> > > field-value:my-value
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > That won't work because the format is
> basically a
>     >     >     > > > >>> Properties/Map<String,
>     >     >     > > > >>> > > > String> and what you've suggested has
> duplicate
>     > keys.
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > One thing I did briefly consider what the
>     > ability to
>     >     > treat
>     >     >     > > > >>> conditions as
>     >     >     > > > >>> > > > Configurable objects in their own right
> (opening
>     > up the
>     >     >     > > > >>> possibility of
>     >     >     > > > >>> > > > people supplying their own Conditions,
> just like
>     > they
>     >     > can
>     >     >     > supply
>     >     >     > > > >>> their
>     >     >     > > > >>> > > own
>     >     >     > > > >>> > > > SMTs). That might be configured something
> like
>     > this:
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > >
>  transforms.conditionalExtract.condition: not
>     >     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.not.type:
>     >     > Not
>     >     >     > > > >>> > > >
>     >     >  transforms.conditionalExtract.condition.not.negated: foo
>     >     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.foo.type:
>     >     >     > > > >>> HasHeaderWithValue
>     >     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.foo.header:
>     >     >     > my-header
>     >     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.foo.value:
>     >     >     > my-value
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > I didn't propose that I couldn't see the
> use
>     > cases to
>     >     > justify
>     >     >     > > > this
>     >     >     > > > >>> kind
>     >     >     > > > >>> > > of
>     >     >     > > > >>> > > > complexity, especially as the common case
> would
>     > surely
>     >     > be
>     >     >     > > > matching
>     >     >     > > > >>> > > against
>     >     >     > > > >>> > > > topic name (to be honest I wasn't
> completely
>     > convinced
>     >     > by the
>     >     >     > > > need
>     >     >     > > > >>> for
>     >     >     > > > >>> > > > "has-header"). In the current version of
> the KIP
>     >     > that's just
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > >
>  transforms.conditionalExtract.condition:
>     >     > topic-matches:
>     >     >     > > > >>> my-prefix-.*
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > but using the more flexible scheme that
> would
>     > require
>     >     >     > something
>     >     >     > > > >>> more like
>     >     >     > > > >>> > > > this:
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > >
>  transforms.conditionalExtract.condition: bar
>     >     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.bar.type:
>     >     >     > TopicMatch
>     >     >     > > > >>> > > >
>     >     >  transforms.conditionalExtract.condition.bar.pattern:
>     >     >     > > > >>> my-prefix-.*
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > If people know of use cases which would
> justify
>     > more
>     >     >     > > > >>> sophistication, I'm
>     >     >     > > > >>> > > > happy to reconsider.
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > Thanks again for taking a look!
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > Tom
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>     >     >     > > > >>> > > > <so...@opencore.com.invalid>
> wrote:
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > > > > Hi Tom,
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > > sounds useful to me, thanks for the KIP!
>     >     >     > > > >>> > > > > The only thought that I had while
> reading was
>     > that
>     >     > this
>     >     >     > will
>     >     >     > > > >>> probably
>     >     >     > > > >>> > > > raise
>     >     >     > > > >>> > > > > questions about more involved conditions
> fairly
>     >     > quickly.
>     >     >     > For
>     >     >     > > > >>> example
>     >     >     > > > >>> > > the
>     >     >     > > > >>> > > > > "has-header" will cause an appetite for
>     > conditions
>     >     > like
>     >     >     > > > >>> > > > > "this-header-has-that-value".
>     >     >     > > > >>> > > > > This would necessitate two parameters to
> be
>     > passed
>     >     > into the
>     >     >     > > > >>> condition,
>     >     >     > > > >>> > > > > which I think is not currently included
> in the
>     > KIP.
>     >     > I am
>     >     >     > not
>     >     >     > > > >>> saying add
>     >     >     > > > >>> > > > > this now, but might it make sense to
> discuss a
>     >     > concept of
>     >     >     > how
>     >     >     > > > >>> that
>     >     >     > > > >>> > > might
>     >     >     > > > >>> > > > > look now, to avoid potential changes
> later on.
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > > Just of the top of my head it might look
> like:
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     > transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> type:has-header
>     >     >     > > > >>> > > > >
>     > transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> > > > header-name:my-header
>     >     >     > > > >>> > > > >
>     > transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> > > field-value:my-value
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > > Also, while writing that, I noticed that
> you
>     > have a
>     >     > version
>     >     >     > > > with
>     >     >     > > > >>> and
>     >     >     > > > >>> > > > > without "name" for your transformation
> in the
>     > KIP:
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     > transforms.conditionalExtract.condition.hasMyHeader:
>     >     >     > > > >>> > > has-header:my-header
>     >     >     > > > >>> > > > > and
>     >     >     > > > >>> > > > > transforms.conditionalExtract.condition:
>     >     >     > has-header:my-header
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > > Is this intentional and the name is
> optional?
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > > Best regards,
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > > Sönke
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley
> <
>     >     >     > tbentley@redhat.com>
>     >     >     > > > >>> wrote:
>     >     >     > > > >>> > > > > >
>     >     >     > > > >>> > > > > > Does anyone have any comments,
> feedback or
>     >     > thoughts about
>     >     >     > > > this?
>     >     >     > > > >>> > > > > >
>     >     >     > > > >>> > > > > > Thanks,
>     >     >     > > > >>> > > > > >
>     >     >     > > > >>> > > > > > Tom
>     >     >     > > > >>> > > > > >
>     >     >     > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom
> Bentley
>     > <
>     >     >     > > > >>> tbentley@redhat.com>
>     >     >     > > > >>> > > > > wrote:
>     >     >     > > > >>> > > > > >
>     >     >     > > > >>> > > > > > > Hi,
>     >     >     > > > >>> > > > > > >
>     >     >     > > > >>> > > > > > > I've opened KIP-585 which is
> intended to
>     > provide
>     >     > a
>     >     >     > > > mechanism
>     >     >     > > > >>> to
>     >     >     > > > >>> > > > > > > conditionally apply SMTs in Kafka
> Connect.
>     > I'd be
>     >     >     > grateful
>     >     >     > > > >>> for any
>     >     >     > > > >>> > > > > > > feedback:
>     >     >     > > > >>> > > > > > >
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > >
>     >     >     > > > >>>
>     >     >     > > >
>     >     >     >
>     >     >
>     >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>     >     >     > > > >>> > > > > > >
>     >     >     > > > >>> > > > > > > Many thanks,
>     >     >     > > > >>> > > > > > >
>     >     >     > > > >>> > > > > > > Tom
>     >     >     > > > >>> > > > > > >
>     >     >     > > > >>> > > > >
>     >     >     > > > >>> > > >
>     >     >     > > > >>> > >
>     >     >     > > > >>>
>     >     >     > > > >>>
>     >     >     > > >
>     >     >     > >
>     >     >     >
>     >     >
>     >     >
>     >
>     >
>
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Andrew Schofield <an...@live.com>.
Hi Tom,
Looks good to me.

Thanks,
Andrew

On 01/05/2020, 16:24, "Tom Bentley" <tb...@redhat.com> wrote:

    Hi,

    I've updated the KIP to reflect recent discussions. Please note the
    following:

    1. I've described a HasHeaderKey predicate rather than HeaderKeyMatches
    because at the moment Headers doesn't expose the whole set of headers, only
    those with a specific name. Exposing this would significantly increase the
    scope of this KIP but relatively little extra benefit.
    2. I used org.apache.kafka.connect.transformer.predicates rather than
    org.apache.kafka.connect.predicates, since I thought it better reflected
    the use of predicates within transforms. I'm flexible on this one if
    someone has a good case for the original name. For example the original
    package name would be more appropriate if we were expecting Connectors to
    make use of Predicates somehow.
    3. I've dropped the special case of not needing to provide a FQN for
    predicate classes in that package, since this isn't something supported by
    transformations themselves as far as I know. I like the idea, but it seemed
    inconsistent to need it for transformations but not for predicates.
    4. Chris, I've used your suggestions for a validate() method for the extra
    configurability needed. I've also added this to Predicate (even though I
    don't need it) for consistency with Connector and Transformation.
    5. For negating the result of a predicate I decided it was clearer to just
    have a "negate" config parameter, supported by all predicates, than to
    reach for more punctuation.

    I'd be grateful for any more feedback people might have.

    Kind regards,

    Tom

    On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <an...@live.com>
    wrote:

    > Hi Tom,
    > I think we should go for a consistent naming convention for the
    > predicates, maybe so they
    > read nicely if you say "IF" first. Here's a suggestion for predicates that
    > the KIP could introduce:
    >  - TopicNameMatches
    >  - HeaderKeyMatches
    >  - RecordIsTombstone
    >
    > On naming, I also suggest using
    > org.apache.kafka.connect.predicates.Predicate
    > as the name for the interface.
    >
    > I hadn't settled on a preference for negation. I suppose there are three
    > results
    > from evaluating a predicate - true, false or an exception caught by Kafka
    > Connect.
    > I suggest the exception case is always a predicate fail so the guarded
    > transformation
    > is skipped. Initially I'd preferred having the predicate be configured to
    > return a
    > negated result to give a "not match" behaviour. Thinking again now, I
    > prefer something like:
    >
    > transforms: t1
    > transforms.t1.?type: RecordIsTombstone
    > transforms.t1.type: org.apache.kafka.connect.transforms.InsertField$Value
    > transforms.t1.static.field: deleted
    > transforms.t1.static.value: true
    >
    > The "?" means the transform guarded by the predicate runs in the Predicate
    > returns true.
    >
    > transforms: t2
    > transforms.t2.?!type: org.apache.kafka.connect.predicates.TopicNameMatch
    > transforms.t2.?regex: my-prefix-.*
    > transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
    > transforms.t2.field: c1
    >
    > The "?!" means the transform guarded by the predicate runs if the Predicate
    > returns false.
    >
    > I think adding a Filter SMT that uses a Predicate is a nice idea.
    >
    > Cheers,
    > Andrew
    >
    > On 28/04/2020, 09:44, "Tom Bentley" <tb...@redhat.com> wrote:
    >
    >     Hi Andrew and Chris,
    >
    >     Firstly, thanks for the input.
    >
    >     To me, the ?type syntax has some pros and cons. On the pros side:
    >     * it's pretty succinct
    >     * it's flexible enough for the use cases we've identified so far.
    >
    >     On the cons side:
    >     * it's a bit cryptic; I don't think people who didn't know it already
    > would
    >     be able to guess what ?type meant.
    >     * it doesn't work so well if there is a whole chain of transformations
    >     which need to be applied conditionally (that can still be done, of
    > course,
    >     but the user has to duplicate the configuration for each SMT). Nor
    > does it
    >     work if you need something akin to a "nested if" (you could fake it
    > using
    >     And), but we don't really expect people to need that.
    >
    >     Andrew, you didn't give an example of a negated match. Would you use
    >     something like !?type for that, or did you have another idea?
    >
    >     On balance, I think this is a reasonable trade off.
    >
    >     I agree that exposing RecordPredicate as a public API, while providing
    >     common implementation(s) would allow most use cases to be covered
    > without
    >     coding, but also allow people with specific unaddressed needs to
    > implement
    >     their own without having to write a streaming application to do so.
    > Again,
    >     I think this is a reasonable trade off.
    >
    >     It would be helpful to have a consensus about exactly which condition
    >     implementations the KIP should provide. I think we all agree that
    >     TopicMatches (or should it be TopicNameMatches?) should be included.
    >     HasHeader (or should it be HasHeaderWithName or HasHeaderMatching for a
    >     regex version?), And and Or all seem like they could be useful, even
    > though
    >     no one has pointed to concrete use cases. It might also be worth
    >     considering that it would be natural to add a Filter SMT which filtered
    >     records using a RecordPredicate. Perhaps that provides motivation for
    >     conditions on the record key as well? If people think the Filter SMT
    > would
    >     be useful I'm willing to include it in this KIP, since it seems
    >     straightforward.
    >
    >     If people are happy with this approach, I'll update the KIP.
    >
    >     Kind regards,
    >
    >     Tom
    >
    >     On Sun, Apr 26, 2020 at 3:00 PM Andrew Schofield <
    > andrew_schofield@live.com>
    >     wrote:
    >
    >     > Hi,
    >     > I'm suggesting that RecordPredicate be another pluggable interface
    > that
    >     > users
    >     > could conceivably implement on their own, and that the KIP
    > introduces the
    >     > most likely ones as supplied implementations, much as there are SMT
    >     > implementations such as HoistField. I don't really see users
    > implementing
    >     > their own RecordPredicate, but the supplied ones would probably have
    > a
    >     > common
    >     > interface and it seems a pity not to expose that as an external API.
    >     >
    >     > SMT is essentially a toolkit for Kafka Connect and I think
    > RecordPredicate
    >     > would be a nice minor enhancement and I'd be happy to contribute
    > code.
    >     >
    >     > In order to make it simpler to configure, I suggest that the class
    > name
    >     > configured
    >     > using "?type" would be assumed to be in package
    >     > org.apache.kafka.connect.predicates
    >     > in the absence of a package name so that shorter names such as
    >     > "TopicMatches"
    >     > could be used.
    >     >
    >     > Cheers,
    >     > Andrew Schofield
    >     > IBM Event Streams
    >     >
    >     > On 25/04/2020, 21:22, "Christopher Egerton" <ch...@confluent.io>
    > wrote:
    >     >
    >     >     Hi Andrew,
    >     >
    >     >     I know a DSL seems like overkill, and I'm not attached to it by
    > any
    >     > means,
    >     >     but I do think it serves a vital purpose in that it allows
    > people who
    >     > don't
    >     >     know how or have the time to write Java code to act on data being
    >     > processed
    >     >     by connectors.
    >     >
    >     >     Are you proposing that the "RecordPredicate" be another pluggable
    >     > interface
    >     >     that users could then implement on their own? Or, would this be
    > purely
    >     > a
    >     >     syntactic expansion to SMT configuration with some subset of
    > hard-coded
    >     >     predicates provided by the framework? I think there's value in
    > the
    >     > latter,
    >     >     but the former doesn't seem like it'd bring much to the table as
    > far as
    >     >     users are concerned, and for developers, it is an improvement,
    > but not
    >     > a
    >     >     major one.
    >     >
    >     >     Cheers,
    >     >
    >     >     Chris
    >     >
    >     >     On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
    >     >     schofieldandrewj@gmail.com> wrote:
    >     >
    >     >     > I wonder whether we're getting a bit overcomplicated with
    > this. I
    >     > think all
    >     >     > that's required here is to add an optional guard predicate for
    > a
    >     >     > Transformation.
    >     >     > The predicate cannot end the Transformation chain, but it can
    > allow
    >     > or
    >     >     > prevent a particular
    >     >     > Transformation from running.
    >     >     >
    >     >     > How about this as syntax?
    >     >     > transforms: extractInt
    >     >     > transforms.extractInt.?type:
    >     >     > org.apache.kafka.connect.predicates.TopicMatches
    >     >     > transforms.extractInt.?regex: my-prefix-.*
    >     >     > transforms.extractInt.type:
    >     >     > org.apache.kafka.connect.transforms.ExtractField$Key
    >     >     > transforms.extractInt.field: c1
    >     >     >
    >     >     > The idea is that a Transformation can have an optional
    >     > RecordPredicate.
    >     >     > The RecordPredicate
    >     >     > can have configuration items, in a similar way to a
    > Transformation.
    >     > The
    >     >     > syntax of
    >     >     > using a '?' prefix to separate configuration for the
    > RecordPredicate
    >     > from
    >     >     > the configuration
    >     >     > for the Transformation could conceivably clash with an existing
    >     >     > Transformation
    >     >     > but the chances are minimal.
    >     >     >
    >     >     > This syntax doesn't allow for an 'else', but if the KIP offers
    > say a
    >     >     > TopicMatches predicate
    >     >     > then that can be configured to return FALSE if the predicate
    > matches.
    >     >     >
    >     >     > I feel that a DSL for SMTs is overkill. If you need something
    > that
    >     >     > complex, it's
    >     >     > perhaps too complex for a transformation chain and it's really
    > a
    >     > streaming
    >     >     > application.
    >     >     >
    >     >     > Andrew Schofield
    >     >     > IBM Event Streams
    >     >     >
    >     >     > On 2020/04/08 21:39:31, Christopher Egerton <
    > chrise@confluent.io>
    >     > wrote:
    >     >     > > Hi Tom,
    >     >     > >
    >     >     > > With regards to the potential Transformation::validate
    > method, I
    >     > don't
    >     >     > > quite follow your objections. The AbstractHerder class,
    >     > ConnectorConfig
    >     >     > > class, and embedding of ConfigDefs that happens with the two
    > is all
    >     >     > > internal logic and we're allowed to modify it however we
    > want, as
    >     > long as
    >     >     > > it doesn't alter any public-facing APIs (and if it does, we
    > just
    >     > have to
    >     >     > > document those changes in a KIP). We don't have to embed the
    >     > ConfigDef
    >     >     > for
    >     >     > > a transformation chain inside another ConfigDef if the API
    > we want
    >     > to
    >     >     > > present to our users doesn't play well with that part of the
    > code
    >     > base.
    >     >     > > Additionally, beyond aligning with the existing API provided
    > for
    >     >     > > connectors, another advantage is that it becomes possible to
    >     > validate
    >     >     > > properties for a configuration in the context of all other
    >     > properties, so
    >     >     > > to be clear, it's not just about preserving what may be
    > perceived
    >     > as a
    >     >     > > superficial similarity and comes with real functional
    > benefits
    >     > that can't
    >     >     > > be provided (at least, not as easily) by dynamic
    > construction of a
    >     >     > > ConfigDef object.
    >     >     > >
    >     >     > > As far as the new proposal goes, I hate to say it, but I
    > think
    >     > we're
    >     >     > stuck
    >     >     > > with the worst of both worlds here. Adding the new
    > RecordPredicate
    >     >     > > interface seems like it defeats the whole purpose of SMTs,
    > which
    >     > is to
    >     >     > > allow manipulation of a connector's event stream by users
    > who don't
    >     >     > > necessarily know how or have the time to write Java code of
    > their
    >     > own.
    >     >     > This
    >     >     > > is also why I'm in favor of adding a lightweight DSL for the
    >     > condition;
    >     >     > > emphasizing readability for people who aren't very familiar
    > with
    >     > Connect
    >     >     > > and just want to get something going quickly should be a
    > priority
    >     > for
    >     >     > SMTs.
    >     >     > > But if that's not going to happen with this KIP, I'd still
    > choose
    >     > the
    >     >     > > simpler, less-flexible approach initially outlined, in order
    > to
    >     > keep
    >     >     > things
    >     >     > > simple for people creating connectors and try to let them
    >     > accomplish what
    >     >     > > they want via configuration, not code.
    >     >     > >
    >     >     > > With regards to the question about where the line should be
    > drawn
    >     > and how
    >     >     > > much is too much and comparisons to other stream processing
    >     > frameworks, I
    >     >     > > think the nature of SMTs draws the line quite nicely: you
    > can only
    >     >     > process
    >     >     > > one message at a time. There's plenty of use cases out there
    > for
    >     >     > > heavier-duty processing frameworks like Kafka Streams, with
    >     > aggregate
    >     >     > > operations, joining of streams, expanding a single message
    > into
    >     > multiple
    >     >     > > messages, etc. With SMTs, none of this is possible; the
    > general
    >     > use case
    >     >     > is
    >     >     > > to filter and clean a stream of data. If any of the
    > heavier-weight
    >     > logic
    >     >     > > provided by, e.g., Streams, isn't required for a project, it
    >     > should be
    >     >     > > possible to get along with just a collection of sink
    > connectors,
    >     > source
    >     >     > > connectors, a converter or two, and SMTs that smooth over any
    >     > differences
    >     >     > > in data format between what source connectors produce and
    > what sink
    >     >     > > connectors expect. This is why I'm comfortable suggesting
    > heavy
    >     > expansion
    >     >     > > of the out-of-the-box SMTs that we provide with Connect; as
    > long as
    >     >     > they're
    >     >     > > reasonable to configure, they can greatly reduce the
    > operational
    >     > burden
    >     >     > on
    >     >     > > anyone running and/or using a Connect cluster since they
    > might
    >     > entirely
    >     >     > > replace additional services that would otherwise be required.
    >     >     > >
    >     >     > > All that said--this is a huge ask of someone who just wants
    > to
    >     > support
    >     >     > the
    >     >     > > SMT equivalent of an "if" statement, and it's totally
    >     > understandable if
    >     >     > > that's too much to ask. The one concern I have left is that
    > if we
    >     > expand
    >     >     > > the SMT in the future, there become compatibility concerns
    > since
    >     > SMTs are
    >     >     > > pretty tightly-coupled with the worker on which they run
    > (although
    >     >     > > technically, with some classpath/plugin path shuffling, you
    > can run
    >     >     > > different versions of the out-of-the-box SMTs from the
    > Connect
    >     > worker on
    >     >     > > which they're run). Someone might write a connector config
    > with the
    >     >     > > If/Conditional/Whatever SMT with a condition type that works
    > on one
    >     >     > worker,
    >     >     > > but that doesn't work on another worker that's running an
    > earlier
    >     > version
    >     >     > > of Connect. This is why I'm in favor of adding extra
    > predicates now
    >     >     > instead
    >     >     > > of later; if we're going to implement what has the potential
    > to be
    >     > a bit
    >     >     > of
    >     >     > > a Swiss army knife SMT with room for future expansion of
    >     > configuration,
    >     >     > and
    >     >     > > we can think of a reasonable way to add that functionality
    > now, it
    >     > seems
    >     >     > > better for users and administrators of Connect to try to do
    > that
    >     > now
    >     >     > > instead of later.
    >     >     > >
    >     >     > > Cheers,
    >     >     > >
    >     >     > > Chris
    >     >     > >
    >     >     > > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <
    > tbentley@redhat.com>
    >     > wrote:
    >     >     > >
    >     >     > > > Since no one objected I've updated the KIP with the
    > revised way
    >     > to
    >     >     > > > configure this transformation.
    >     >     > > >
    >     >     > > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <
    > tbentley@redhat.com>
    >     >     > wrote:
    >     >     > > >
    >     >     > > > > To come back about a point Chris made:
    >     >     > > > >
    >     >     > > > > 1. Instead of the new "ConfigDef config(Map<String,
    > String>
    >     > props)"
    >     >     > > > method,
    >     >     > > > >> what would you think about adopting a similar approach
    > as the
    >     >     > framework
    >     >     > > > >> uses with connectors, by adding a "Config
    > validate(Map<String,
    >     >     > String>
    >     >     > > > >> props)" method that can perform custom validation
    > outside of
    >     > what
    >     >     > can be
    >     >     > > > >> performed by the ConfigDef's single-property-at-a-time
    >     > validation?
    >     >     > It
    >     >     > > > may
    >     >     > > > >> be a little heavyweight for use with this particular
    > SMT, but
    >     > it'd
    >     >     > > > provide
    >     >     > > > >> more flexibility for other SMT implementations and would
    >     > mirror an
    >     >     > API
    >     >     > > > >> that
    >     >     > > > >> developers targeting the framework are likely already
    >     > familiar with.
    >     >     > > > >>
    >     >     > > > >
    >     >     > > > > The validate() + config() approach taken for Connectors
    >     > doesn't quite
    >     >     > > > work
    >     >     > > > > for Transformations.
    >     >     > > > >
    >     >     > > > > The default Connector.validate() basically just calls
    >     >     > > > > `config().validate(connectorConfigs)` and returns a
    > Config.
    >     >     > Separately
    >     >     > > > the
    >     >     > > > > AbstractHeader also calls `config()` and uses the Config
    > and
    >     >     > ConfigDef to
    >     >     > > > > build the ConfigInfos. The contract is not really
    > defined well
    >     >     > enough in
    >     >     > > > > the javadoc, but this means a connector can use
    > validate() to
    >     > build
    >     >     > the
    >     >     > > > > ConfigDef which it will return from config().
    >     >     > > > >
    >     >     > > > > For Transformations we don't really want validate() to
    > perform
    >     >     > validation
    >     >     > > > > at all since ultimately the transformations config will
    > be
    >     > embedded
    >     >     > in
    >     >     > > > the
    >     >     > > > > connector's and will be validated by the connector
    > itself. It
    >     >     > wouldn't be
    >     >     > > > > harmful if it _did_ perform validation, just
    > unnecessary. But
    >     > having
    >     >     > a
    >     >     > > > > validate() method that ought not to validate() seems
    > like a
    >     > recipe
    >     >     > for
    >     >     > > > > confusion.
    >     >     > > > >
    >     >     > > > > Also problematic is that there's no use for the Config
    >     > returned from
    >     >     > > > > Transformations.validate().
    >     >     > > > >
    >     >     > > > > So I'm not convinced that the superficial similarity
    > really
    >     > gains
    >     >     > > > anything.
    >     >     > > > >
    >     >     > > > > Kind regards,
    >     >     > > > >
    >     >     > > > > Tom
    >     >     > > > >
    >     >     > > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <
    >     > tbentley@redhat.com>
    >     >     > wrote:
    >     >     > > > >
    >     >     > > > >> Hi,
    >     >     > > > >>
    >     >     > > > >> Hi all,
    >     >     > > > >>
    >     >     > > > >> Thanks for the discussion so far.
    >     >     > > > >>
    >     >     > > > >> It seems a bit weird to me that when configuring the
    >     > Conditional SMT
    >     >     > > > with
    >     >     > > > >> a DSL you would use a concise, intuitive DSL for
    > expressing
    >     > the
    >     >     > > > condition,
    >     >     > > > >> but not for the transforms that it's guarding. It also
    > seems
    >     >     > natural, if
    >     >     > > > >> you support this for conditionally applying SMTs, that
    > you'd
    >     > soon
    >     >     > want
    >     >     > > > to
    >     >     > > > >> support the same thing for a generic filter transformer.
    >     > Then, if
    >     >     > you
    >     >     > > > can
    >     >     > > > >> configure a filter transformation using this DSL, it
    > becomes
    >     > odd
    >     >     > that
    >     >     > > > you
    >     >     > > > >> can't do this for mapping transformations. I think it
    > would
    >     > be a
    >     >     > > > mistake to
    >     >     > > > >> go specifying an expression language for conditions when
    >     > really that
    >     >     > > > might
    >     >     > > > >> just be part of a language for transformations.
    >     >     > > > >>
    >     >     > > > >> I think it would be possible today to write an SMT which
    >     > allowed
    >     >     > you to
    >     >     > > > >> express the transformation in a DSL. Concretely, it's
    >     > possible to
    >     >     > > > imagine a
    >     >     > > > >> single transformation taking a DSL something like this:
    >     >     > > > >>
    >     >     > > > >>   compose(
    >     >     > > > >>     Flatten.Key(delimiter: ':'),
    >     >     > > > >>     If(condition: TopicMatches(pattern: "blah-.*"),
    >     >     > > > >>        then: Flatten.Value(delimiter: '/')))
    >     >     > > > >>
    >     >     > > > >> All I've really done here, beyond what was already
    > proposed,
    >     > is
    >     >     > rename
    >     >     > > > >> Conditional to If, imagine a couple more bits of syntax
    >     >     > > > >> (SMTs being constructed by class name and named argument
    >     > invocation
    >     >     > > > >> syntax to support SMT constructors) and add a higher
    > level
    >     > compose()
    >     >     > > > >> function for chaining SMTs (which could easily be
    > replaced
    >     > with
    >     >     > brace
    >     >     > > > >> delimited blocks).
    >     >     > > > >>
    >     >     > > > >> That may be a discussion we should have, but I think in
    >     > _this_ KIP
    >     >     > we
    >     >     > > > >> should focus on the conditional part, since the above
    > example
    >     >     > hopefully
    >     >     > > > >> shows that it would be possible to reuse it in a DSL if
    > there
    >     > was
    >     >     > > > appetite
    >     >     > > > >> for that.
    >     >     > > > >>
    >     >     > > > >> With that in mind, and since my original suggestion for
    > an
    >     >     > abbreviated
    >     >     > > > >> config syntax didn't appeal to people, I propose that we
    >     > stick with
    >     >     > the
    >     >     > > > >> existing norms for configuring this.
    >     >     > > > >> The previous example would look like this:
    >     >     > > > >>
    >     >     > > > >> transformation: flattenKey,if
    >     >     > > > >> transformation.flattenKey.type: Flatten.Key
    >     >     > > > >> transformation.flattenKey.delimiter: :
    >     >     > > > >> transformation.if.type: If
    >     >     > > > >> transformation.if.condition.type: TopicMatches
    >     >     > > > >> transformation.if.condition.pattern: blah-.*
    >     >     > > > >> transformation.if.then: flattenValue
    >     >     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
    >     >     > > > >> transformation.if.then.flattenValue.delimiter: /
    >     >     > > > >>
    >     >     > > > >> Also, I'm inclined to think we should stick to just
    > supporting
    >     >     > > > >> TopicMatches and Not, since we've not identified an
    > actual
    >     > need for
    >     >     > > > >> 'has-header', 'or' and 'and'.
    >     >     > > > >>
    >     >     > > > >> An example of the usage of Not would look like this:
    >     >     > > > >>
    >     >     > > > >> transformation: flattenKey,if
    >     >     > > > >> transformation.flattenKey.type: Flatten.Key
    >     >     > > > >> transformation.flattenKey.delimiter: :
    >     >     > > > >> transformation.if.type: If
    >     >     > > > >> transformation.if.condition.type: Not
    >     >     > > > >> transformation.if.condition.operand: startsWithBlah
    >     >     > > > >> transformation.if.condition.operand.startsWithBlah.type:
    >     >     > TopicMatches
    >     >     > > > >>
    > transformation.if.condition.operand.startsWithBlah.pattern:
    >     > blah-.*
    >     >     > > > >> transformation.if.then: flattenValue
    >     >     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
    >     >     > > > >> transformation.if.then.flattenValue.delimiter: /
    >     >     > > > >>
    >     >     > > > >> Hopefully we can agree that this allows the conditional
    > SMT
    >     > part to
    >     >     > make
    >     >     > > > >> progress without getting bogged down.
    >     >     > > > >>
    >     >     > > > >> Thoughts? If this seems broadly reasonable, I'll update
    > the
    >     > KIP.
    >     >     > > > >>
    >     >     > > > >> Kind regards,
    >     >     > > > >>
    >     >     > > > >> Tom
    >     >     > > > >>
    >     >     > > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <
    >     > gunnar@hibernate.org
    >     >     > >
    >     >     > > > >> wrote:
    >     >     > > > >>
    >     >     > > > >>> Hi all,
    >     >     > > > >>>
    >     >     > > > >>> Thanks a lot for this initiative, Tom!
    >     >     > > > >>>
    >     >     > > > >>> To shed some light, the use case where this first came
    > up,
    >     > were
    >     >     > issues
    >     >     > > > >>> we saw with SMTs being applied to the different topics
    >     > produced by
    >     >     > the
    >     >     > > > >>> Debezium change data capture connectors. There are
    > different
    >     > kinds
    >     >     > of
    >     >     > > > >>> topics (for change data, schema history, heartbeats
    > etc.)
    >     > and the
    >     >     > > > >>> record structure to expect may vary between those.
    > Hence we
    >     > saw
    >     >     > issues
    >     >     > > > >>> with SMTs like ExtractField, which for instance only
    > should
    >     > be
    >     >     > applied
    >     >     > > > >>> to all change data topics but not the other ones.
    >     >     > > > >>>
    >     >     > > > >>> I like the overall approach; for Debezium's purposes,
    > the
    >     > simple
    >     >     > topic
    >     >     > > > >>> matching and negation operators would be sufficient
    > already.
    >     > I
    >     >     > agree
    >     >     > > > >>> with Chris and would prefer one single condition
    > attribute,
    >     > which
    >     >     > > > >>> contains a single condition or potentially a logical
    >     > expression
    >     >     > with
    >     >     > > > >>> not, and, etc. I think it's less ambiguous, in
    > particular
    >     > when it
    >     >     > > > >>> comes to ordering of the different conditions and
    >     > determining their
    >     >     > > > >>> precedence.
    >     >     > > > >>>
    >     >     > > > >>> Would love to see this feature in one or another way in
    >     > Connect.
    >     >     > > > >>>
    >     >     > > > >>> Best,
    >     >     > > > >>>
    >     >     > > > >>> --Gunnar
    >     >     > > > >>>
    >     >     > > > >>>
    >     >     > > > >>>
    >     >     > > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
    >     >     > > > >>> tbentley@redhat.com>:
    >     >     > > > >>> >
    >     >     > > > >>> > Hi Chris and Sönke,
    >     >     > > > >>> >
    >     >     > > > >>> > Using the numbering from Chris's email...
    >     >     > > > >>> >
    >     >     > > > >>> > 1. That's a good point, I'll see what is needed to
    > make
    >     > that
    >     >     > work.
    >     >     > > > >>> >
    >     >     > > > >>> > 2. I'm happy enough to add support for "and" and
    > "or" as
    >     > part of
    >     >     > this
    >     >     > > > >>> KIP
    >     >     > > > >>> > if people can see a need for it.
    >     >     > > > >>> >
    >     >     > > > >>> > In a similar vein, I was wondering about whether it
    > would
    >     > be
    >     >     > > > worthwhile
    >     >     > > > >>> > having the equivalent of an "else" clause (what's in
    > the
    >     > KIP is
    >     >     > > > >>> basically
    >     >     > > > >>> > an "if" statement). Without support for "else" I
    > think
    >     > people
    >     >     > would
    >     >     > > > >>> often
    >     >     > > > >>> > need two conditionals, with the condition of one
    > being the
    >     >     > negation
    >     >     > > > of
    >     >     > > > >>> the
    >     >     > > > >>> > condition of another.
    >     >     > > > >>> >
    >     >     > > > >>> > 3. I can see the attraction of an expression
    > language. The
    >     > pros
    >     >     > > > include
    >     >     > > > >>> > being terse and familiar to programmers and
    > potentially
    >     > very
    >     >     > flexible
    >     >     > > > >>> if
    >     >     > > > >>> > that's needed in the future. I had a play and
    > implemented
    >     > it
    >     >     > using
    >     >     > > > >>> ANTLR
    >     >     > > > >>> > and it's not difficult to write a grammar and
    > implement the
    >     >     > functions
    >     >     > > > >>> we've
    >     >     > > > >>> > already discussed and get decent error messages when
    > the
    >     >     > expression
    >     >     > > > is
    >     >     > > > >>> > malformed. So on the one hand I quite like the idea.
    > On
    >     > the other
    >     >     > > > hand
    >     >     > > > >>> it
    >     >     > > > >>> > feels like overkill for the use cases that have
    > actually
    >     > been
    >     >     > > > >>> identified so
    >     >     > > > >>> > far.
    >     >     > > > >>> >
    >     >     > > > >>> > @Sönke what do you make of the expression language
    > idea?
    >     >     > > > >>> >
    >     >     > > > >>> > Kind regards,
    >     >     > > > >>> >
    >     >     > > > >>> > Tom
    >     >     > > > >>> >
    >     >     > > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
    >     >     > > > >>> chrise@confluent.io>
    >     >     > > > >>> > wrote:
    >     >     > > > >>> >
    >     >     > > > >>> > > Hi Tom,
    >     >     > > > >>> > >
    >     >     > > > >>> > > This looks great and I'd love to see the
    > out-of-the-box
    >     > SMTs
    >     >     > become
    >     >     > > > >>> even
    >     >     > > > >>> > > more powerful with the improvements you've
    > proposed! Got
    >     > a few
    >     >     > > > >>> remarks and
    >     >     > > > >>> > > would be interested in your thoughts:
    >     >     > > > >>> > >
    >     >     > > > >>> > > 1. Instead of the new "ConfigDef config(Map<String,
    >     > String>
    >     >     > props)"
    >     >     > > > >>> method,
    >     >     > > > >>> > > what would you think about adopting a similar
    > approach
    >     > as the
    >     >     > > > >>> framework
    >     >     > > > >>> > > uses with connectors, by adding a "Config
    >     > validate(Map<String,
    >     >     > > > >>> String>
    >     >     > > > >>> > > props)" method that can perform custom validation
    >     > outside of
    >     >     > what
    >     >     > > > >>> can be
    >     >     > > > >>> > > performed by the ConfigDef's
    > single-property-at-a-time
    >     >     > validation?
    >     >     > > > >>> It may
    >     >     > > > >>> > > be a little heavyweight for use with this
    > particular
    >     > SMT, but
    >     >     > it'd
    >     >     > > > >>> provide
    >     >     > > > >>> > > more flexibility for other SMT implementations and
    > would
    >     >     > mirror an
    >     >     > > > >>> API that
    >     >     > > > >>> > > developers targeting the framework are likely
    > already
    >     > familiar
    >     >     > > > with.
    >     >     > > > >>> > > 2. The possibility for adding the logical operators
    >     > "and" and
    >     >     > "or"
    >     >     > > > is
    >     >     > > > >>> > > mentioned, but only as a potential future change
    > and not
    >     > one
    >     >     > > > >>> proposed by
    >     >     > > > >>> > > this KIP. Why not include those operators sooner
    > rather
    >     > than
    >     >     > later?
    >     >     > > > >>> > > 3. The syntax for named conditions that are then
    >     > referenced in
    >     >     > > > >>> logical
    >     >     > > > >>> > > operators is... tricky. It took me a few attempts
    > to
    >     > grok the
    >     >     > > > example
    >     >     > > > >>> > > provided in the KIP after reading Sönke's question
    > about
    >     > the
    >     >     > > > example
    >     >     > > > >>> for
    >     >     > > > >>> > > negation. What about a more sophisticated but less
    >     > verbose
    >     >     > syntax
    >     >     > > > >>> that
    >     >     > > > >>> > > supports a single configuration for the condition,
    > even
    >     > with
    >     >     > > > logical
    >     >     > > > >>> > > operators? I'm thinking something like
    >     >     > > > >>> > > "transforms.conditionalExtract.condition:
    >     >     > > > not(has-header:my-header)"
    >     >     > > > >>> > > instead of the
    > "transforms.conditionalExtract.condition:
    >     >     > > > >>> not:hasMyHeader"
    >     >     > > > >>> > > and
    > "transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> > > has-header:my-header" properties. If support for a
    >     > logical
    >     >     > "and" is
    >     >     > > > >>> added,
    >     >     > > > >>> > > this could then be expanded to something like
    >     >     > > > >>> > > "transforms.conditionalExtract.condition:
    >     >     > > > and(has-header(my-header),
    >     >     > > > >>> > > not(topic-matches(my-prefix-.*)))". There would be
    >     > additional
    >     >     > > > >>> complexity
    >     >     > > > >>> > > here with the need to escape parentheses and
    > commas that
    >     > are
    >     >     > > > >>> intended to be
    >     >     > > > >>> > > treated literally (as part of a header name, for
    > example)
    >     >     > instead
    >     >     > > > of
    >     >     > > > >>> as
    >     >     > > > >>> > > part of the syntax for the condition itself, but a
    > little
    >     >     > > > additional
    >     >     > > > >>> > > complexity for edge cases like that may be
    > warranted if
    >     > it
    >     >     > heavily
    >     >     > > > >>> reduces
    >     >     > > > >>> > > complexity for the common cases. The rationale for
    > the
    >     > proposed
    >     >     > > > >>> > > parentheses-based syntax here instead of what's
    >     > mentioned in
    >     >     > the
    >     >     > > > KIP
    >     >     > > > >>> > > (something like "and: <condition1>, <condition2>")
    > is to
    >     > help
    >     >     > with
    >     >     > > > >>> > > readability; we probably wouldn't need that with
    > the
    >     > approach
    >     >     > of
    >     >     > > > >>> naming
    >     >     > > > >>> > > conditions via separate properties, but things may
    > get a
    >     > little
    >     >     > > > >>> nasty with
    >     >     > > > >>> > > literal conditions included there, especially with
    > the
    >     >     > possibility
    >     >     > > > of
    >     >     > > > >>> > > nesting. Finally, the shift in syntax here should
    > make it
    >     >     > easier to
    >     >     > > > >>> handle
    >     >     > > > >>> > > cases like the header value matching brought up by
    >     > Sönke; it
    >     >     > might
    >     >     > > > >>> look
    >     >     > > > >>> > > something like "header-matches(header-name,
    >     >     > header-value-pattern)".
    >     >     > > > >>> > >
    >     >     > > > >>> > > Cheers,
    >     >     > > > >>> > >
    >     >     > > > >>> > > Chris
    >     >     > > > >>> > >
    >     >     > > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
    >     >     > tbentley@redhat.com>
    >     >     > > > >>> wrote:
    >     >     > > > >>> > >
    >     >     > > > >>> > > > Hi Sönke,
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > Thanks for taking a look.
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > Let me answer in reverse order, since I think it
    > might
    >     > make
    >     >     > more
    >     >     > > > >>> sense
    >     >     > > > >>> > > that
    >     >     > > > >>> > > > way...
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > Also, while writing that, I noticed that you
    > have a
    >     > version
    >     >     > with
    >     >     > > > >>> and
    >     >     > > > >>> > > > > without "name" for your transformation in the
    > KIP:
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    > transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> > > has-header:my-header
    >     >     > > > >>> > > > > and
    >     >     > > > >>> > > > > transforms.conditionalExtract.condition:
    >     >     > has-header:my-header
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > The example
    >     >     > > > >>> > > >     transforms.conditionalExtract.condition:
    >     >     > has-header:my-header
    >     >     > > > >>> > > > is a "has-header" condition (the prefix of the
    > config
    >     > value),
    >     >     > > > >>> which will
    >     >     > > > >>> > > > match records with a "my-header" header (given
    > in the
    >     >     > suffix).
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > The other example given is:
    >     >     > > > >>> > > >     transforms.conditionalExtract.condition:
    >     > not:hasMyHeader
    >     >     > > > >>> > > >
    >     >  transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> > > > has-header:my-header
    >     >     > > > >>> > > > The root of the condition is a "not" condition
    > (the
    >     > prefix
    >     >     > of the
    >     >     > > > >>> value
    >     >     > > > >>> > > for
    >     >     > > > >>> > > > the transforms.conditionalExtract.condition key)
    > of
    >     > another
    >     >     > named
    >     >     > > > >>> > > condition
    >     >     > > > >>> > > > called "hasMyHeader" (the suffix). Any name
    > could be
    >     > used
    >     >     > for the
    >     >     > > > >>> other
    >     >     > > > >>> > > > condition. That other condition is configured at
    >     >     > > > >>> > > >
    >     > "transforms.conditionalExtract.condition.<conditionName>".
    >     >     > That
    >     >     > > > >>> condition
    >     >     > > > >>> > > > is a "has-header" condition (the prefix), which
    > will
    >     > match
    >     >     > > > records
    >     >     > > > >>> with a
    >     >     > > > >>> > > > "my-header" header (given in the suffix). So the
    >     >     > "has-header:"
    >     >     > > > >>> condition
    >     >     > > > >>> > > > type would always require a suffix, as would the
    > "not:"
    >     >     > condition
    >     >     > > > >>> type.
    >     >     > > > >>> > > > Hypothetically you could have a "true" condition
    > type
    >     > (which
    >     >     > > > would
    >     >     > > > >>> not
    >     >     > > > >>> > > > require a suffix), and the hypothetical binary
    >     > conditions
    >     >     > "and:"
    >     >     > > > >>> and
    >     >     > > > >>> > > "or:"
    >     >     > > > >>> > > > would require a pair of other condition names.
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > So what's proposed is a scheme for encoding
    > conditions
    >     > where
    >     >     > the
    >     >     > > > >>> > > condition
    >     >     > > > >>> > > > type is the prefix of the value of some
    > "....condition"
    >     >     > config
    >     >     > > > >>> key, and
    >     >     > > > >>> > > the
    >     >     > > > >>> > > > optional suffix provides parameters for the
    > condition.
    >     > This
    >     >     > makes
    >     >     > > > >>> those
    >     >     > > > >>> > > > parameters a bit inflexible, but is relatively
    >     > succinct.
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > This leads on to your first point. You're right
    > that
    >     > use
    >     >     > cases
    >     >     > > > >>> might
    >     >     > > > >>> > > appear
    >     >     > > > >>> > > > which need other conditions, and we should make
    > it
    >     > flexible
    >     >     > > > enough
    >     >     > > > >>> to be
    >     >     > > > >>> > > > able to cope with future use cases. On the other
    > hand,
    >     > I was
    >     >     > > > >>> concerned
    >     >     > > > >>> > > that
    >     >     > > > >>> > > > we end up with something which is quite
    > complicated to
    >     >     > configure.
    >     >     > > > >>> (There
    >     >     > > > >>> > > > comes a point where it might makes more sense
    > for the
    >     > user to
    >     >     > > > >>> write their
    >     >     > > > >>> > > > own SMT).
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > Just of the top of my head it might look like:
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    > transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> type:has-header
    >     >     > > > >>> > > > >
    > transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> > > > header-name:my-header
    >     >     > > > >>> > > > >
    > transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> > > field-value:my-value
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > That won't work because the format is basically a
    >     >     > > > >>> Properties/Map<String,
    >     >     > > > >>> > > > String> and what you've suggested has duplicate
    > keys.
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > One thing I did briefly consider what the
    > ability to
    >     > treat
    >     >     > > > >>> conditions as
    >     >     > > > >>> > > > Configurable objects in their own right (opening
    > up the
    >     >     > > > >>> possibility of
    >     >     > > > >>> > > > people supplying their own Conditions, just like
    > they
    >     > can
    >     >     > supply
    >     >     > > > >>> their
    >     >     > > > >>> > > own
    >     >     > > > >>> > > > SMTs). That might be configured something like
    > this:
    >     >     > > > >>> > > >
    >     >     > > > >>> > > >     transforms.conditionalExtract.condition: not
    >     >     > > > >>> > > >
    >  transforms.conditionalExtract.condition.not.type:
    >     > Not
    >     >     > > > >>> > > >
    >     >  transforms.conditionalExtract.condition.not.negated: foo
    >     >     > > > >>> > > >
    >  transforms.conditionalExtract.condition.foo.type:
    >     >     > > > >>> HasHeaderWithValue
    >     >     > > > >>> > > >
    >  transforms.conditionalExtract.condition.foo.header:
    >     >     > my-header
    >     >     > > > >>> > > >
    >  transforms.conditionalExtract.condition.foo.value:
    >     >     > my-value
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > I didn't propose that I couldn't see the use
    > cases to
    >     > justify
    >     >     > > > this
    >     >     > > > >>> kind
    >     >     > > > >>> > > of
    >     >     > > > >>> > > > complexity, especially as the common case would
    > surely
    >     > be
    >     >     > > > matching
    >     >     > > > >>> > > against
    >     >     > > > >>> > > > topic name (to be honest I wasn't completely
    > convinced
    >     > by the
    >     >     > > > need
    >     >     > > > >>> for
    >     >     > > > >>> > > > "has-header"). In the current version of the KIP
    >     > that's just
    >     >     > > > >>> > > >
    >     >     > > > >>> > > >     transforms.conditionalExtract.condition:
    >     > topic-matches:
    >     >     > > > >>> my-prefix-.*
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > but using the more flexible scheme that would
    > require
    >     >     > something
    >     >     > > > >>> more like
    >     >     > > > >>> > > > this:
    >     >     > > > >>> > > >
    >     >     > > > >>> > > >     transforms.conditionalExtract.condition: bar
    >     >     > > > >>> > > >
    >  transforms.conditionalExtract.condition.bar.type:
    >     >     > TopicMatch
    >     >     > > > >>> > > >
    >     >  transforms.conditionalExtract.condition.bar.pattern:
    >     >     > > > >>> my-prefix-.*
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > If people know of use cases which would justify
    > more
    >     >     > > > >>> sophistication, I'm
    >     >     > > > >>> > > > happy to reconsider.
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > Thanks again for taking a look!
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > Tom
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
    >     >     > > > >>> > > > <so...@opencore.com.invalid> wrote:
    >     >     > > > >>> > > >
    >     >     > > > >>> > > > > Hi Tom,
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > > sounds useful to me, thanks for the KIP!
    >     >     > > > >>> > > > > The only thought that I had while reading was
    > that
    >     > this
    >     >     > will
    >     >     > > > >>> probably
    >     >     > > > >>> > > > raise
    >     >     > > > >>> > > > > questions about more involved conditions fairly
    >     > quickly.
    >     >     > For
    >     >     > > > >>> example
    >     >     > > > >>> > > the
    >     >     > > > >>> > > > > "has-header" will cause an appetite for
    > conditions
    >     > like
    >     >     > > > >>> > > > > "this-header-has-that-value".
    >     >     > > > >>> > > > > This would necessitate two parameters to be
    > passed
    >     > into the
    >     >     > > > >>> condition,
    >     >     > > > >>> > > > > which I think is not currently included in the
    > KIP.
    >     > I am
    >     >     > not
    >     >     > > > >>> saying add
    >     >     > > > >>> > > > > this now, but might it make sense to discuss a
    >     > concept of
    >     >     > how
    >     >     > > > >>> that
    >     >     > > > >>> > > might
    >     >     > > > >>> > > > > look now, to avoid potential changes later on.
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > > Just of the top of my head it might look like:
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    > transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> type:has-header
    >     >     > > > >>> > > > >
    > transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> > > > header-name:my-header
    >     >     > > > >>> > > > >
    > transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> > > field-value:my-value
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > > Also, while writing that, I noticed that you
    > have a
    >     > version
    >     >     > > > with
    >     >     > > > >>> and
    >     >     > > > >>> > > > > without "name" for your transformation in the
    > KIP:
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    > transforms.conditionalExtract.condition.hasMyHeader:
    >     >     > > > >>> > > has-header:my-header
    >     >     > > > >>> > > > > and
    >     >     > > > >>> > > > > transforms.conditionalExtract.condition:
    >     >     > has-header:my-header
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > > Is this intentional and the name is optional?
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > > Best regards,
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > > Sönke
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <
    >     >     > tbentley@redhat.com>
    >     >     > > > >>> wrote:
    >     >     > > > >>> > > > > >
    >     >     > > > >>> > > > > > Does anyone have any comments, feedback or
    >     > thoughts about
    >     >     > > > this?
    >     >     > > > >>> > > > > >
    >     >     > > > >>> > > > > > Thanks,
    >     >     > > > >>> > > > > >
    >     >     > > > >>> > > > > > Tom
    >     >     > > > >>> > > > > >
    >     >     > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley
    > <
    >     >     > > > >>> tbentley@redhat.com>
    >     >     > > > >>> > > > > wrote:
    >     >     > > > >>> > > > > >
    >     >     > > > >>> > > > > > > Hi,
    >     >     > > > >>> > > > > > >
    >     >     > > > >>> > > > > > > I've opened KIP-585 which is intended to
    > provide
    >     > a
    >     >     > > > mechanism
    >     >     > > > >>> to
    >     >     > > > >>> > > > > > > conditionally apply SMTs in Kafka Connect.
    > I'd be
    >     >     > grateful
    >     >     > > > >>> for any
    >     >     > > > >>> > > > > > > feedback:
    >     >     > > > >>> > > > > > >
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > >
    >     >     > > > >>> > >
    >     >     > > > >>>
    >     >     > > >
    >     >     >
    >     >
    > https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
    >     >     > > > >>> > > > > > >
    >     >     > > > >>> > > > > > > Many thanks,
    >     >     > > > >>> > > > > > >
    >     >     > > > >>> > > > > > > Tom
    >     >     > > > >>> > > > > > >
    >     >     > > > >>> > > > >
    >     >     > > > >>> > > >
    >     >     > > > >>> > >
    >     >     > > > >>>
    >     >     > > > >>>
    >     >     > > >
    >     >     > >
    >     >     >
    >     >
    >     >
    >
    >


Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi,

I've updated the KIP to reflect recent discussions. Please note the
following:

1. I've described a HasHeaderKey predicate rather than HeaderKeyMatches
because at the moment Headers doesn't expose the whole set of headers, only
those with a specific name. Exposing this would significantly increase the
scope of this KIP but relatively little extra benefit.
2. I used org.apache.kafka.connect.transformer.predicates rather than
org.apache.kafka.connect.predicates, since I thought it better reflected
the use of predicates within transforms. I'm flexible on this one if
someone has a good case for the original name. For example the original
package name would be more appropriate if we were expecting Connectors to
make use of Predicates somehow.
3. I've dropped the special case of not needing to provide a FQN for
predicate classes in that package, since this isn't something supported by
transformations themselves as far as I know. I like the idea, but it seemed
inconsistent to need it for transformations but not for predicates.
4. Chris, I've used your suggestions for a validate() method for the extra
configurability needed. I've also added this to Predicate (even though I
don't need it) for consistency with Connector and Transformation.
5. For negating the result of a predicate I decided it was clearer to just
have a "negate" config parameter, supported by all predicates, than to
reach for more punctuation.

I'd be grateful for any more feedback people might have.

Kind regards,

Tom

On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <an...@live.com>
wrote:

> Hi Tom,
> I think we should go for a consistent naming convention for the
> predicates, maybe so they
> read nicely if you say "IF" first. Here's a suggestion for predicates that
> the KIP could introduce:
>  - TopicNameMatches
>  - HeaderKeyMatches
>  - RecordIsTombstone
>
> On naming, I also suggest using
> org.apache.kafka.connect.predicates.Predicate
> as the name for the interface.
>
> I hadn't settled on a preference for negation. I suppose there are three
> results
> from evaluating a predicate - true, false or an exception caught by Kafka
> Connect.
> I suggest the exception case is always a predicate fail so the guarded
> transformation
> is skipped. Initially I'd preferred having the predicate be configured to
> return a
> negated result to give a "not match" behaviour. Thinking again now, I
> prefer something like:
>
> transforms: t1
> transforms.t1.?type: RecordIsTombstone
> transforms.t1.type: org.apache.kafka.connect.transforms.InsertField$Value
> transforms.t1.static.field: deleted
> transforms.t1.static.value: true
>
> The "?" means the transform guarded by the predicate runs in the Predicate
> returns true.
>
> transforms: t2
> transforms.t2.?!type: org.apache.kafka.connect.predicates.TopicNameMatch
> transforms.t2.?regex: my-prefix-.*
> transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
> transforms.t2.field: c1
>
> The "?!" means the transform guarded by the predicate runs if the Predicate
> returns false.
>
> I think adding a Filter SMT that uses a Predicate is a nice idea.
>
> Cheers,
> Andrew
>
> On 28/04/2020, 09:44, "Tom Bentley" <tb...@redhat.com> wrote:
>
>     Hi Andrew and Chris,
>
>     Firstly, thanks for the input.
>
>     To me, the ?type syntax has some pros and cons. On the pros side:
>     * it's pretty succinct
>     * it's flexible enough for the use cases we've identified so far.
>
>     On the cons side:
>     * it's a bit cryptic; I don't think people who didn't know it already
> would
>     be able to guess what ?type meant.
>     * it doesn't work so well if there is a whole chain of transformations
>     which need to be applied conditionally (that can still be done, of
> course,
>     but the user has to duplicate the configuration for each SMT). Nor
> does it
>     work if you need something akin to a "nested if" (you could fake it
> using
>     And), but we don't really expect people to need that.
>
>     Andrew, you didn't give an example of a negated match. Would you use
>     something like !?type for that, or did you have another idea?
>
>     On balance, I think this is a reasonable trade off.
>
>     I agree that exposing RecordPredicate as a public API, while providing
>     common implementation(s) would allow most use cases to be covered
> without
>     coding, but also allow people with specific unaddressed needs to
> implement
>     their own without having to write a streaming application to do so.
> Again,
>     I think this is a reasonable trade off.
>
>     It would be helpful to have a consensus about exactly which condition
>     implementations the KIP should provide. I think we all agree that
>     TopicMatches (or should it be TopicNameMatches?) should be included.
>     HasHeader (or should it be HasHeaderWithName or HasHeaderMatching for a
>     regex version?), And and Or all seem like they could be useful, even
> though
>     no one has pointed to concrete use cases. It might also be worth
>     considering that it would be natural to add a Filter SMT which filtered
>     records using a RecordPredicate. Perhaps that provides motivation for
>     conditions on the record key as well? If people think the Filter SMT
> would
>     be useful I'm willing to include it in this KIP, since it seems
>     straightforward.
>
>     If people are happy with this approach, I'll update the KIP.
>
>     Kind regards,
>
>     Tom
>
>     On Sun, Apr 26, 2020 at 3:00 PM Andrew Schofield <
> andrew_schofield@live.com>
>     wrote:
>
>     > Hi,
>     > I'm suggesting that RecordPredicate be another pluggable interface
> that
>     > users
>     > could conceivably implement on their own, and that the KIP
> introduces the
>     > most likely ones as supplied implementations, much as there are SMT
>     > implementations such as HoistField. I don't really see users
> implementing
>     > their own RecordPredicate, but the supplied ones would probably have
> a
>     > common
>     > interface and it seems a pity not to expose that as an external API.
>     >
>     > SMT is essentially a toolkit for Kafka Connect and I think
> RecordPredicate
>     > would be a nice minor enhancement and I'd be happy to contribute
> code.
>     >
>     > In order to make it simpler to configure, I suggest that the class
> name
>     > configured
>     > using "?type" would be assumed to be in package
>     > org.apache.kafka.connect.predicates
>     > in the absence of a package name so that shorter names such as
>     > "TopicMatches"
>     > could be used.
>     >
>     > Cheers,
>     > Andrew Schofield
>     > IBM Event Streams
>     >
>     > On 25/04/2020, 21:22, "Christopher Egerton" <ch...@confluent.io>
> wrote:
>     >
>     >     Hi Andrew,
>     >
>     >     I know a DSL seems like overkill, and I'm not attached to it by
> any
>     > means,
>     >     but I do think it serves a vital purpose in that it allows
> people who
>     > don't
>     >     know how or have the time to write Java code to act on data being
>     > processed
>     >     by connectors.
>     >
>     >     Are you proposing that the "RecordPredicate" be another pluggable
>     > interface
>     >     that users could then implement on their own? Or, would this be
> purely
>     > a
>     >     syntactic expansion to SMT configuration with some subset of
> hard-coded
>     >     predicates provided by the framework? I think there's value in
> the
>     > latter,
>     >     but the former doesn't seem like it'd bring much to the table as
> far as
>     >     users are concerned, and for developers, it is an improvement,
> but not
>     > a
>     >     major one.
>     >
>     >     Cheers,
>     >
>     >     Chris
>     >
>     >     On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
>     >     schofieldandrewj@gmail.com> wrote:
>     >
>     >     > I wonder whether we're getting a bit overcomplicated with
> this. I
>     > think all
>     >     > that's required here is to add an optional guard predicate for
> a
>     >     > Transformation.
>     >     > The predicate cannot end the Transformation chain, but it can
> allow
>     > or
>     >     > prevent a particular
>     >     > Transformation from running.
>     >     >
>     >     > How about this as syntax?
>     >     > transforms: extractInt
>     >     > transforms.extractInt.?type:
>     >     > org.apache.kafka.connect.predicates.TopicMatches
>     >     > transforms.extractInt.?regex: my-prefix-.*
>     >     > transforms.extractInt.type:
>     >     > org.apache.kafka.connect.transforms.ExtractField$Key
>     >     > transforms.extractInt.field: c1
>     >     >
>     >     > The idea is that a Transformation can have an optional
>     > RecordPredicate.
>     >     > The RecordPredicate
>     >     > can have configuration items, in a similar way to a
> Transformation.
>     > The
>     >     > syntax of
>     >     > using a '?' prefix to separate configuration for the
> RecordPredicate
>     > from
>     >     > the configuration
>     >     > for the Transformation could conceivably clash with an existing
>     >     > Transformation
>     >     > but the chances are minimal.
>     >     >
>     >     > This syntax doesn't allow for an 'else', but if the KIP offers
> say a
>     >     > TopicMatches predicate
>     >     > then that can be configured to return FALSE if the predicate
> matches.
>     >     >
>     >     > I feel that a DSL for SMTs is overkill. If you need something
> that
>     >     > complex, it's
>     >     > perhaps too complex for a transformation chain and it's really
> a
>     > streaming
>     >     > application.
>     >     >
>     >     > Andrew Schofield
>     >     > IBM Event Streams
>     >     >
>     >     > On 2020/04/08 21:39:31, Christopher Egerton <
> chrise@confluent.io>
>     > wrote:
>     >     > > Hi Tom,
>     >     > >
>     >     > > With regards to the potential Transformation::validate
> method, I
>     > don't
>     >     > > quite follow your objections. The AbstractHerder class,
>     > ConnectorConfig
>     >     > > class, and embedding of ConfigDefs that happens with the two
> is all
>     >     > > internal logic and we're allowed to modify it however we
> want, as
>     > long as
>     >     > > it doesn't alter any public-facing APIs (and if it does, we
> just
>     > have to
>     >     > > document those changes in a KIP). We don't have to embed the
>     > ConfigDef
>     >     > for
>     >     > > a transformation chain inside another ConfigDef if the API
> we want
>     > to
>     >     > > present to our users doesn't play well with that part of the
> code
>     > base.
>     >     > > Additionally, beyond aligning with the existing API provided
> for
>     >     > > connectors, another advantage is that it becomes possible to
>     > validate
>     >     > > properties for a configuration in the context of all other
>     > properties, so
>     >     > > to be clear, it's not just about preserving what may be
> perceived
>     > as a
>     >     > > superficial similarity and comes with real functional
> benefits
>     > that can't
>     >     > > be provided (at least, not as easily) by dynamic
> construction of a
>     >     > > ConfigDef object.
>     >     > >
>     >     > > As far as the new proposal goes, I hate to say it, but I
> think
>     > we're
>     >     > stuck
>     >     > > with the worst of both worlds here. Adding the new
> RecordPredicate
>     >     > > interface seems like it defeats the whole purpose of SMTs,
> which
>     > is to
>     >     > > allow manipulation of a connector's event stream by users
> who don't
>     >     > > necessarily know how or have the time to write Java code of
> their
>     > own.
>     >     > This
>     >     > > is also why I'm in favor of adding a lightweight DSL for the
>     > condition;
>     >     > > emphasizing readability for people who aren't very familiar
> with
>     > Connect
>     >     > > and just want to get something going quickly should be a
> priority
>     > for
>     >     > SMTs.
>     >     > > But if that's not going to happen with this KIP, I'd still
> choose
>     > the
>     >     > > simpler, less-flexible approach initially outlined, in order
> to
>     > keep
>     >     > things
>     >     > > simple for people creating connectors and try to let them
>     > accomplish what
>     >     > > they want via configuration, not code.
>     >     > >
>     >     > > With regards to the question about where the line should be
> drawn
>     > and how
>     >     > > much is too much and comparisons to other stream processing
>     > frameworks, I
>     >     > > think the nature of SMTs draws the line quite nicely: you
> can only
>     >     > process
>     >     > > one message at a time. There's plenty of use cases out there
> for
>     >     > > heavier-duty processing frameworks like Kafka Streams, with
>     > aggregate
>     >     > > operations, joining of streams, expanding a single message
> into
>     > multiple
>     >     > > messages, etc. With SMTs, none of this is possible; the
> general
>     > use case
>     >     > is
>     >     > > to filter and clean a stream of data. If any of the
> heavier-weight
>     > logic
>     >     > > provided by, e.g., Streams, isn't required for a project, it
>     > should be
>     >     > > possible to get along with just a collection of sink
> connectors,
>     > source
>     >     > > connectors, a converter or two, and SMTs that smooth over any
>     > differences
>     >     > > in data format between what source connectors produce and
> what sink
>     >     > > connectors expect. This is why I'm comfortable suggesting
> heavy
>     > expansion
>     >     > > of the out-of-the-box SMTs that we provide with Connect; as
> long as
>     >     > they're
>     >     > > reasonable to configure, they can greatly reduce the
> operational
>     > burden
>     >     > on
>     >     > > anyone running and/or using a Connect cluster since they
> might
>     > entirely
>     >     > > replace additional services that would otherwise be required.
>     >     > >
>     >     > > All that said--this is a huge ask of someone who just wants
> to
>     > support
>     >     > the
>     >     > > SMT equivalent of an "if" statement, and it's totally
>     > understandable if
>     >     > > that's too much to ask. The one concern I have left is that
> if we
>     > expand
>     >     > > the SMT in the future, there become compatibility concerns
> since
>     > SMTs are
>     >     > > pretty tightly-coupled with the worker on which they run
> (although
>     >     > > technically, with some classpath/plugin path shuffling, you
> can run
>     >     > > different versions of the out-of-the-box SMTs from the
> Connect
>     > worker on
>     >     > > which they're run). Someone might write a connector config
> with the
>     >     > > If/Conditional/Whatever SMT with a condition type that works
> on one
>     >     > worker,
>     >     > > but that doesn't work on another worker that's running an
> earlier
>     > version
>     >     > > of Connect. This is why I'm in favor of adding extra
> predicates now
>     >     > instead
>     >     > > of later; if we're going to implement what has the potential
> to be
>     > a bit
>     >     > of
>     >     > > a Swiss army knife SMT with room for future expansion of
>     > configuration,
>     >     > and
>     >     > > we can think of a reasonable way to add that functionality
> now, it
>     > seems
>     >     > > better for users and administrators of Connect to try to do
> that
>     > now
>     >     > > instead of later.
>     >     > >
>     >     > > Cheers,
>     >     > >
>     >     > > Chris
>     >     > >
>     >     > > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <
> tbentley@redhat.com>
>     > wrote:
>     >     > >
>     >     > > > Since no one objected I've updated the KIP with the
> revised way
>     > to
>     >     > > > configure this transformation.
>     >     > > >
>     >     > > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <
> tbentley@redhat.com>
>     >     > wrote:
>     >     > > >
>     >     > > > > To come back about a point Chris made:
>     >     > > > >
>     >     > > > > 1. Instead of the new "ConfigDef config(Map<String,
> String>
>     > props)"
>     >     > > > method,
>     >     > > > >> what would you think about adopting a similar approach
> as the
>     >     > framework
>     >     > > > >> uses with connectors, by adding a "Config
> validate(Map<String,
>     >     > String>
>     >     > > > >> props)" method that can perform custom validation
> outside of
>     > what
>     >     > can be
>     >     > > > >> performed by the ConfigDef's single-property-at-a-time
>     > validation?
>     >     > It
>     >     > > > may
>     >     > > > >> be a little heavyweight for use with this particular
> SMT, but
>     > it'd
>     >     > > > provide
>     >     > > > >> more flexibility for other SMT implementations and would
>     > mirror an
>     >     > API
>     >     > > > >> that
>     >     > > > >> developers targeting the framework are likely already
>     > familiar with.
>     >     > > > >>
>     >     > > > >
>     >     > > > > The validate() + config() approach taken for Connectors
>     > doesn't quite
>     >     > > > work
>     >     > > > > for Transformations.
>     >     > > > >
>     >     > > > > The default Connector.validate() basically just calls
>     >     > > > > `config().validate(connectorConfigs)` and returns a
> Config.
>     >     > Separately
>     >     > > > the
>     >     > > > > AbstractHeader also calls `config()` and uses the Config
> and
>     >     > ConfigDef to
>     >     > > > > build the ConfigInfos. The contract is not really
> defined well
>     >     > enough in
>     >     > > > > the javadoc, but this means a connector can use
> validate() to
>     > build
>     >     > the
>     >     > > > > ConfigDef which it will return from config().
>     >     > > > >
>     >     > > > > For Transformations we don't really want validate() to
> perform
>     >     > validation
>     >     > > > > at all since ultimately the transformations config will
> be
>     > embedded
>     >     > in
>     >     > > > the
>     >     > > > > connector's and will be validated by the connector
> itself. It
>     >     > wouldn't be
>     >     > > > > harmful if it _did_ perform validation, just
> unnecessary. But
>     > having
>     >     > a
>     >     > > > > validate() method that ought not to validate() seems
> like a
>     > recipe
>     >     > for
>     >     > > > > confusion.
>     >     > > > >
>     >     > > > > Also problematic is that there's no use for the Config
>     > returned from
>     >     > > > > Transformations.validate().
>     >     > > > >
>     >     > > > > So I'm not convinced that the superficial similarity
> really
>     > gains
>     >     > > > anything.
>     >     > > > >
>     >     > > > > Kind regards,
>     >     > > > >
>     >     > > > > Tom
>     >     > > > >
>     >     > > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <
>     > tbentley@redhat.com>
>     >     > wrote:
>     >     > > > >
>     >     > > > >> Hi,
>     >     > > > >>
>     >     > > > >> Hi all,
>     >     > > > >>
>     >     > > > >> Thanks for the discussion so far.
>     >     > > > >>
>     >     > > > >> It seems a bit weird to me that when configuring the
>     > Conditional SMT
>     >     > > > with
>     >     > > > >> a DSL you would use a concise, intuitive DSL for
> expressing
>     > the
>     >     > > > condition,
>     >     > > > >> but not for the transforms that it's guarding. It also
> seems
>     >     > natural, if
>     >     > > > >> you support this for conditionally applying SMTs, that
> you'd
>     > soon
>     >     > want
>     >     > > > to
>     >     > > > >> support the same thing for a generic filter transformer.
>     > Then, if
>     >     > you
>     >     > > > can
>     >     > > > >> configure a filter transformation using this DSL, it
> becomes
>     > odd
>     >     > that
>     >     > > > you
>     >     > > > >> can't do this for mapping transformations. I think it
> would
>     > be a
>     >     > > > mistake to
>     >     > > > >> go specifying an expression language for conditions when
>     > really that
>     >     > > > might
>     >     > > > >> just be part of a language for transformations.
>     >     > > > >>
>     >     > > > >> I think it would be possible today to write an SMT which
>     > allowed
>     >     > you to
>     >     > > > >> express the transformation in a DSL. Concretely, it's
>     > possible to
>     >     > > > imagine a
>     >     > > > >> single transformation taking a DSL something like this:
>     >     > > > >>
>     >     > > > >>   compose(
>     >     > > > >>     Flatten.Key(delimiter: ':'),
>     >     > > > >>     If(condition: TopicMatches(pattern: "blah-.*"),
>     >     > > > >>        then: Flatten.Value(delimiter: '/')))
>     >     > > > >>
>     >     > > > >> All I've really done here, beyond what was already
> proposed,
>     > is
>     >     > rename
>     >     > > > >> Conditional to If, imagine a couple more bits of syntax
>     >     > > > >> (SMTs being constructed by class name and named argument
>     > invocation
>     >     > > > >> syntax to support SMT constructors) and add a higher
> level
>     > compose()
>     >     > > > >> function for chaining SMTs (which could easily be
> replaced
>     > with
>     >     > brace
>     >     > > > >> delimited blocks).
>     >     > > > >>
>     >     > > > >> That may be a discussion we should have, but I think in
>     > _this_ KIP
>     >     > we
>     >     > > > >> should focus on the conditional part, since the above
> example
>     >     > hopefully
>     >     > > > >> shows that it would be possible to reuse it in a DSL if
> there
>     > was
>     >     > > > appetite
>     >     > > > >> for that.
>     >     > > > >>
>     >     > > > >> With that in mind, and since my original suggestion for
> an
>     >     > abbreviated
>     >     > > > >> config syntax didn't appeal to people, I propose that we
>     > stick with
>     >     > the
>     >     > > > >> existing norms for configuring this.
>     >     > > > >> The previous example would look like this:
>     >     > > > >>
>     >     > > > >> transformation: flattenKey,if
>     >     > > > >> transformation.flattenKey.type: Flatten.Key
>     >     > > > >> transformation.flattenKey.delimiter: :
>     >     > > > >> transformation.if.type: If
>     >     > > > >> transformation.if.condition.type: TopicMatches
>     >     > > > >> transformation.if.condition.pattern: blah-.*
>     >     > > > >> transformation.if.then: flattenValue
>     >     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
>     >     > > > >> transformation.if.then.flattenValue.delimiter: /
>     >     > > > >>
>     >     > > > >> Also, I'm inclined to think we should stick to just
> supporting
>     >     > > > >> TopicMatches and Not, since we've not identified an
> actual
>     > need for
>     >     > > > >> 'has-header', 'or' and 'and'.
>     >     > > > >>
>     >     > > > >> An example of the usage of Not would look like this:
>     >     > > > >>
>     >     > > > >> transformation: flattenKey,if
>     >     > > > >> transformation.flattenKey.type: Flatten.Key
>     >     > > > >> transformation.flattenKey.delimiter: :
>     >     > > > >> transformation.if.type: If
>     >     > > > >> transformation.if.condition.type: Not
>     >     > > > >> transformation.if.condition.operand: startsWithBlah
>     >     > > > >> transformation.if.condition.operand.startsWithBlah.type:
>     >     > TopicMatches
>     >     > > > >>
> transformation.if.condition.operand.startsWithBlah.pattern:
>     > blah-.*
>     >     > > > >> transformation.if.then: flattenValue
>     >     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
>     >     > > > >> transformation.if.then.flattenValue.delimiter: /
>     >     > > > >>
>     >     > > > >> Hopefully we can agree that this allows the conditional
> SMT
>     > part to
>     >     > make
>     >     > > > >> progress without getting bogged down.
>     >     > > > >>
>     >     > > > >> Thoughts? If this seems broadly reasonable, I'll update
> the
>     > KIP.
>     >     > > > >>
>     >     > > > >> Kind regards,
>     >     > > > >>
>     >     > > > >> Tom
>     >     > > > >>
>     >     > > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <
>     > gunnar@hibernate.org
>     >     > >
>     >     > > > >> wrote:
>     >     > > > >>
>     >     > > > >>> Hi all,
>     >     > > > >>>
>     >     > > > >>> Thanks a lot for this initiative, Tom!
>     >     > > > >>>
>     >     > > > >>> To shed some light, the use case where this first came
> up,
>     > were
>     >     > issues
>     >     > > > >>> we saw with SMTs being applied to the different topics
>     > produced by
>     >     > the
>     >     > > > >>> Debezium change data capture connectors. There are
> different
>     > kinds
>     >     > of
>     >     > > > >>> topics (for change data, schema history, heartbeats
> etc.)
>     > and the
>     >     > > > >>> record structure to expect may vary between those.
> Hence we
>     > saw
>     >     > issues
>     >     > > > >>> with SMTs like ExtractField, which for instance only
> should
>     > be
>     >     > applied
>     >     > > > >>> to all change data topics but not the other ones.
>     >     > > > >>>
>     >     > > > >>> I like the overall approach; for Debezium's purposes,
> the
>     > simple
>     >     > topic
>     >     > > > >>> matching and negation operators would be sufficient
> already.
>     > I
>     >     > agree
>     >     > > > >>> with Chris and would prefer one single condition
> attribute,
>     > which
>     >     > > > >>> contains a single condition or potentially a logical
>     > expression
>     >     > with
>     >     > > > >>> not, and, etc. I think it's less ambiguous, in
> particular
>     > when it
>     >     > > > >>> comes to ordering of the different conditions and
>     > determining their
>     >     > > > >>> precedence.
>     >     > > > >>>
>     >     > > > >>> Would love to see this feature in one or another way in
>     > Connect.
>     >     > > > >>>
>     >     > > > >>> Best,
>     >     > > > >>>
>     >     > > > >>> --Gunnar
>     >     > > > >>>
>     >     > > > >>>
>     >     > > > >>>
>     >     > > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
>     >     > > > >>> tbentley@redhat.com>:
>     >     > > > >>> >
>     >     > > > >>> > Hi Chris and Sönke,
>     >     > > > >>> >
>     >     > > > >>> > Using the numbering from Chris's email...
>     >     > > > >>> >
>     >     > > > >>> > 1. That's a good point, I'll see what is needed to
> make
>     > that
>     >     > work.
>     >     > > > >>> >
>     >     > > > >>> > 2. I'm happy enough to add support for "and" and
> "or" as
>     > part of
>     >     > this
>     >     > > > >>> KIP
>     >     > > > >>> > if people can see a need for it.
>     >     > > > >>> >
>     >     > > > >>> > In a similar vein, I was wondering about whether it
> would
>     > be
>     >     > > > worthwhile
>     >     > > > >>> > having the equivalent of an "else" clause (what's in
> the
>     > KIP is
>     >     > > > >>> basically
>     >     > > > >>> > an "if" statement). Without support for "else" I
> think
>     > people
>     >     > would
>     >     > > > >>> often
>     >     > > > >>> > need two conditionals, with the condition of one
> being the
>     >     > negation
>     >     > > > of
>     >     > > > >>> the
>     >     > > > >>> > condition of another.
>     >     > > > >>> >
>     >     > > > >>> > 3. I can see the attraction of an expression
> language. The
>     > pros
>     >     > > > include
>     >     > > > >>> > being terse and familiar to programmers and
> potentially
>     > very
>     >     > flexible
>     >     > > > >>> if
>     >     > > > >>> > that's needed in the future. I had a play and
> implemented
>     > it
>     >     > using
>     >     > > > >>> ANTLR
>     >     > > > >>> > and it's not difficult to write a grammar and
> implement the
>     >     > functions
>     >     > > > >>> we've
>     >     > > > >>> > already discussed and get decent error messages when
> the
>     >     > expression
>     >     > > > is
>     >     > > > >>> > malformed. So on the one hand I quite like the idea.
> On
>     > the other
>     >     > > > hand
>     >     > > > >>> it
>     >     > > > >>> > feels like overkill for the use cases that have
> actually
>     > been
>     >     > > > >>> identified so
>     >     > > > >>> > far.
>     >     > > > >>> >
>     >     > > > >>> > @Sönke what do you make of the expression language
> idea?
>     >     > > > >>> >
>     >     > > > >>> > Kind regards,
>     >     > > > >>> >
>     >     > > > >>> > Tom
>     >     > > > >>> >
>     >     > > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
>     >     > > > >>> chrise@confluent.io>
>     >     > > > >>> > wrote:
>     >     > > > >>> >
>     >     > > > >>> > > Hi Tom,
>     >     > > > >>> > >
>     >     > > > >>> > > This looks great and I'd love to see the
> out-of-the-box
>     > SMTs
>     >     > become
>     >     > > > >>> even
>     >     > > > >>> > > more powerful with the improvements you've
> proposed! Got
>     > a few
>     >     > > > >>> remarks and
>     >     > > > >>> > > would be interested in your thoughts:
>     >     > > > >>> > >
>     >     > > > >>> > > 1. Instead of the new "ConfigDef config(Map<String,
>     > String>
>     >     > props)"
>     >     > > > >>> method,
>     >     > > > >>> > > what would you think about adopting a similar
> approach
>     > as the
>     >     > > > >>> framework
>     >     > > > >>> > > uses with connectors, by adding a "Config
>     > validate(Map<String,
>     >     > > > >>> String>
>     >     > > > >>> > > props)" method that can perform custom validation
>     > outside of
>     >     > what
>     >     > > > >>> can be
>     >     > > > >>> > > performed by the ConfigDef's
> single-property-at-a-time
>     >     > validation?
>     >     > > > >>> It may
>     >     > > > >>> > > be a little heavyweight for use with this
> particular
>     > SMT, but
>     >     > it'd
>     >     > > > >>> provide
>     >     > > > >>> > > more flexibility for other SMT implementations and
> would
>     >     > mirror an
>     >     > > > >>> API that
>     >     > > > >>> > > developers targeting the framework are likely
> already
>     > familiar
>     >     > > > with.
>     >     > > > >>> > > 2. The possibility for adding the logical operators
>     > "and" and
>     >     > "or"
>     >     > > > is
>     >     > > > >>> > > mentioned, but only as a potential future change
> and not
>     > one
>     >     > > > >>> proposed by
>     >     > > > >>> > > this KIP. Why not include those operators sooner
> rather
>     > than
>     >     > later?
>     >     > > > >>> > > 3. The syntax for named conditions that are then
>     > referenced in
>     >     > > > >>> logical
>     >     > > > >>> > > operators is... tricky. It took me a few attempts
> to
>     > grok the
>     >     > > > example
>     >     > > > >>> > > provided in the KIP after reading Sönke's question
> about
>     > the
>     >     > > > example
>     >     > > > >>> for
>     >     > > > >>> > > negation. What about a more sophisticated but less
>     > verbose
>     >     > syntax
>     >     > > > >>> that
>     >     > > > >>> > > supports a single configuration for the condition,
> even
>     > with
>     >     > > > logical
>     >     > > > >>> > > operators? I'm thinking something like
>     >     > > > >>> > > "transforms.conditionalExtract.condition:
>     >     > > > not(has-header:my-header)"
>     >     > > > >>> > > instead of the
> "transforms.conditionalExtract.condition:
>     >     > > > >>> not:hasMyHeader"
>     >     > > > >>> > > and
> "transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > has-header:my-header" properties. If support for a
>     > logical
>     >     > "and" is
>     >     > > > >>> added,
>     >     > > > >>> > > this could then be expanded to something like
>     >     > > > >>> > > "transforms.conditionalExtract.condition:
>     >     > > > and(has-header(my-header),
>     >     > > > >>> > > not(topic-matches(my-prefix-.*)))". There would be
>     > additional
>     >     > > > >>> complexity
>     >     > > > >>> > > here with the need to escape parentheses and
> commas that
>     > are
>     >     > > > >>> intended to be
>     >     > > > >>> > > treated literally (as part of a header name, for
> example)
>     >     > instead
>     >     > > > of
>     >     > > > >>> as
>     >     > > > >>> > > part of the syntax for the condition itself, but a
> little
>     >     > > > additional
>     >     > > > >>> > > complexity for edge cases like that may be
> warranted if
>     > it
>     >     > heavily
>     >     > > > >>> reduces
>     >     > > > >>> > > complexity for the common cases. The rationale for
> the
>     > proposed
>     >     > > > >>> > > parentheses-based syntax here instead of what's
>     > mentioned in
>     >     > the
>     >     > > > KIP
>     >     > > > >>> > > (something like "and: <condition1>, <condition2>")
> is to
>     > help
>     >     > with
>     >     > > > >>> > > readability; we probably wouldn't need that with
> the
>     > approach
>     >     > of
>     >     > > > >>> naming
>     >     > > > >>> > > conditions via separate properties, but things may
> get a
>     > little
>     >     > > > >>> nasty with
>     >     > > > >>> > > literal conditions included there, especially with
> the
>     >     > possibility
>     >     > > > of
>     >     > > > >>> > > nesting. Finally, the shift in syntax here should
> make it
>     >     > easier to
>     >     > > > >>> handle
>     >     > > > >>> > > cases like the header value matching brought up by
>     > Sönke; it
>     >     > might
>     >     > > > >>> look
>     >     > > > >>> > > something like "header-matches(header-name,
>     >     > header-value-pattern)".
>     >     > > > >>> > >
>     >     > > > >>> > > Cheers,
>     >     > > > >>> > >
>     >     > > > >>> > > Chris
>     >     > > > >>> > >
>     >     > > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
>     >     > tbentley@redhat.com>
>     >     > > > >>> wrote:
>     >     > > > >>> > >
>     >     > > > >>> > > > Hi Sönke,
>     >     > > > >>> > > >
>     >     > > > >>> > > > Thanks for taking a look.
>     >     > > > >>> > > >
>     >     > > > >>> > > > Let me answer in reverse order, since I think it
> might
>     > make
>     >     > more
>     >     > > > >>> sense
>     >     > > > >>> > > that
>     >     > > > >>> > > > way...
>     >     > > > >>> > > >
>     >     > > > >>> > > > Also, while writing that, I noticed that you
> have a
>     > version
>     >     > with
>     >     > > > >>> and
>     >     > > > >>> > > > > without "name" for your transformation in the
> KIP:
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > has-header:my-header
>     >     > > > >>> > > > > and
>     >     > > > >>> > > > > transforms.conditionalExtract.condition:
>     >     > has-header:my-header
>     >     > > > >>> > > > >
>     >     > > > >>> > > >
>     >     > > > >>> > > > The example
>     >     > > > >>> > > >     transforms.conditionalExtract.condition:
>     >     > has-header:my-header
>     >     > > > >>> > > > is a "has-header" condition (the prefix of the
> config
>     > value),
>     >     > > > >>> which will
>     >     > > > >>> > > > match records with a "my-header" header (given
> in the
>     >     > suffix).
>     >     > > > >>> > > >
>     >     > > > >>> > > > The other example given is:
>     >     > > > >>> > > >     transforms.conditionalExtract.condition:
>     > not:hasMyHeader
>     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > > has-header:my-header
>     >     > > > >>> > > > The root of the condition is a "not" condition
> (the
>     > prefix
>     >     > of the
>     >     > > > >>> value
>     >     > > > >>> > > for
>     >     > > > >>> > > > the transforms.conditionalExtract.condition key)
> of
>     > another
>     >     > named
>     >     > > > >>> > > condition
>     >     > > > >>> > > > called "hasMyHeader" (the suffix). Any name
> could be
>     > used
>     >     > for the
>     >     > > > >>> other
>     >     > > > >>> > > > condition. That other condition is configured at
>     >     > > > >>> > > >
>     > "transforms.conditionalExtract.condition.<conditionName>".
>     >     > That
>     >     > > > >>> condition
>     >     > > > >>> > > > is a "has-header" condition (the prefix), which
> will
>     > match
>     >     > > > records
>     >     > > > >>> with a
>     >     > > > >>> > > > "my-header" header (given in the suffix). So the
>     >     > "has-header:"
>     >     > > > >>> condition
>     >     > > > >>> > > > type would always require a suffix, as would the
> "not:"
>     >     > condition
>     >     > > > >>> type.
>     >     > > > >>> > > > Hypothetically you could have a "true" condition
> type
>     > (which
>     >     > > > would
>     >     > > > >>> not
>     >     > > > >>> > > > require a suffix), and the hypothetical binary
>     > conditions
>     >     > "and:"
>     >     > > > >>> and
>     >     > > > >>> > > "or:"
>     >     > > > >>> > > > would require a pair of other condition names.
>     >     > > > >>> > > >
>     >     > > > >>> > > > So what's proposed is a scheme for encoding
> conditions
>     > where
>     >     > the
>     >     > > > >>> > > condition
>     >     > > > >>> > > > type is the prefix of the value of some
> "....condition"
>     >     > config
>     >     > > > >>> key, and
>     >     > > > >>> > > the
>     >     > > > >>> > > > optional suffix provides parameters for the
> condition.
>     > This
>     >     > makes
>     >     > > > >>> those
>     >     > > > >>> > > > parameters a bit inflexible, but is relatively
>     > succinct.
>     >     > > > >>> > > >
>     >     > > > >>> > > > This leads on to your first point. You're right
> that
>     > use
>     >     > cases
>     >     > > > >>> might
>     >     > > > >>> > > appear
>     >     > > > >>> > > > which need other conditions, and we should make
> it
>     > flexible
>     >     > > > enough
>     >     > > > >>> to be
>     >     > > > >>> > > > able to cope with future use cases. On the other
> hand,
>     > I was
>     >     > > > >>> concerned
>     >     > > > >>> > > that
>     >     > > > >>> > > > we end up with something which is quite
> complicated to
>     >     > configure.
>     >     > > > >>> (There
>     >     > > > >>> > > > comes a point where it might makes more sense
> for the
>     > user to
>     >     > > > >>> write their
>     >     > > > >>> > > > own SMT).
>     >     > > > >>> > > >
>     >     > > > >>> > > > Just of the top of my head it might look like:
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> type:has-header
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > > header-name:my-header
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > field-value:my-value
>     >     > > > >>> > > > >
>     >     > > > >>> > > >
>     >     > > > >>> > > > That won't work because the format is basically a
>     >     > > > >>> Properties/Map<String,
>     >     > > > >>> > > > String> and what you've suggested has duplicate
> keys.
>     >     > > > >>> > > >
>     >     > > > >>> > > > One thing I did briefly consider what the
> ability to
>     > treat
>     >     > > > >>> conditions as
>     >     > > > >>> > > > Configurable objects in their own right (opening
> up the
>     >     > > > >>> possibility of
>     >     > > > >>> > > > people supplying their own Conditions, just like
> they
>     > can
>     >     > supply
>     >     > > > >>> their
>     >     > > > >>> > > own
>     >     > > > >>> > > > SMTs). That might be configured something like
> this:
>     >     > > > >>> > > >
>     >     > > > >>> > > >     transforms.conditionalExtract.condition: not
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.not.type:
>     > Not
>     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.not.negated: foo
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.foo.type:
>     >     > > > >>> HasHeaderWithValue
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.foo.header:
>     >     > my-header
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.foo.value:
>     >     > my-value
>     >     > > > >>> > > >
>     >     > > > >>> > > > I didn't propose that I couldn't see the use
> cases to
>     > justify
>     >     > > > this
>     >     > > > >>> kind
>     >     > > > >>> > > of
>     >     > > > >>> > > > complexity, especially as the common case would
> surely
>     > be
>     >     > > > matching
>     >     > > > >>> > > against
>     >     > > > >>> > > > topic name (to be honest I wasn't completely
> convinced
>     > by the
>     >     > > > need
>     >     > > > >>> for
>     >     > > > >>> > > > "has-header"). In the current version of the KIP
>     > that's just
>     >     > > > >>> > > >
>     >     > > > >>> > > >     transforms.conditionalExtract.condition:
>     > topic-matches:
>     >     > > > >>> my-prefix-.*
>     >     > > > >>> > > >
>     >     > > > >>> > > > but using the more flexible scheme that would
> require
>     >     > something
>     >     > > > >>> more like
>     >     > > > >>> > > > this:
>     >     > > > >>> > > >
>     >     > > > >>> > > >     transforms.conditionalExtract.condition: bar
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.bar.type:
>     >     > TopicMatch
>     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.bar.pattern:
>     >     > > > >>> my-prefix-.*
>     >     > > > >>> > > >
>     >     > > > >>> > > > If people know of use cases which would justify
> more
>     >     > > > >>> sophistication, I'm
>     >     > > > >>> > > > happy to reconsider.
>     >     > > > >>> > > >
>     >     > > > >>> > > > Thanks again for taking a look!
>     >     > > > >>> > > >
>     >     > > > >>> > > > Tom
>     >     > > > >>> > > >
>     >     > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>     >     > > > >>> > > > <so...@opencore.com.invalid> wrote:
>     >     > > > >>> > > >
>     >     > > > >>> > > > > Hi Tom,
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > sounds useful to me, thanks for the KIP!
>     >     > > > >>> > > > > The only thought that I had while reading was
> that
>     > this
>     >     > will
>     >     > > > >>> probably
>     >     > > > >>> > > > raise
>     >     > > > >>> > > > > questions about more involved conditions fairly
>     > quickly.
>     >     > For
>     >     > > > >>> example
>     >     > > > >>> > > the
>     >     > > > >>> > > > > "has-header" will cause an appetite for
> conditions
>     > like
>     >     > > > >>> > > > > "this-header-has-that-value".
>     >     > > > >>> > > > > This would necessitate two parameters to be
> passed
>     > into the
>     >     > > > >>> condition,
>     >     > > > >>> > > > > which I think is not currently included in the
> KIP.
>     > I am
>     >     > not
>     >     > > > >>> saying add
>     >     > > > >>> > > > > this now, but might it make sense to discuss a
>     > concept of
>     >     > how
>     >     > > > >>> that
>     >     > > > >>> > > might
>     >     > > > >>> > > > > look now, to avoid potential changes later on.
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Just of the top of my head it might look like:
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> type:has-header
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > > header-name:my-header
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > field-value:my-value
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Also, while writing that, I noticed that you
> have a
>     > version
>     >     > > > with
>     >     > > > >>> and
>     >     > > > >>> > > > > without "name" for your transformation in the
> KIP:
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > has-header:my-header
>     >     > > > >>> > > > > and
>     >     > > > >>> > > > > transforms.conditionalExtract.condition:
>     >     > has-header:my-header
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Is this intentional and the name is optional?
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Best regards,
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Sönke
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <
>     >     > tbentley@redhat.com>
>     >     > > > >>> wrote:
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > Does anyone have any comments, feedback or
>     > thoughts about
>     >     > > > this?
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > Thanks,
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > Tom
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley
> <
>     >     > > > >>> tbentley@redhat.com>
>     >     > > > >>> > > > > wrote:
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > > Hi,
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > > > > I've opened KIP-585 which is intended to
> provide
>     > a
>     >     > > > mechanism
>     >     > > > >>> to
>     >     > > > >>> > > > > > > conditionally apply SMTs in Kafka Connect.
> I'd be
>     >     > grateful
>     >     > > > >>> for any
>     >     > > > >>> > > > > > > feedback:
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > >
>     >     > > > >>> > >
>     >     > > > >>>
>     >     > > >
>     >     >
>     >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > > > > Many thanks,
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > > > > Tom
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > >
>     >     > > > >>> > >
>     >     > > > >>>
>     >     > > > >>>
>     >     > > >
>     >     > >
>     >     >
>     >
>     >
>
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Andrew Schofield <an...@live.com>.
Hi Tom,
I think we should go for a consistent naming convention for the predicates, maybe so they
read nicely if you say "IF" first. Here's a suggestion for predicates that the KIP could introduce:
 - TopicNameMatches
 - HeaderKeyMatches
 - RecordIsTombstone

On naming, I also suggest using org.apache.kafka.connect.predicates.Predicate
as the name for the interface.

I hadn't settled on a preference for negation. I suppose there are three results
from evaluating a predicate - true, false or an exception caught by Kafka Connect.
I suggest the exception case is always a predicate fail so the guarded transformation
is skipped. Initially I'd preferred having the predicate be configured to return a
negated result to give a "not match" behaviour. Thinking again now, I prefer something like:

transforms: t1
transforms.t1.?type: RecordIsTombstone
transforms.t1.type: org.apache.kafka.connect.transforms.InsertField$Value
transforms.t1.static.field: deleted
transforms.t1.static.value: true

The "?" means the transform guarded by the predicate runs in the Predicate
returns true.

transforms: t2
transforms.t2.?!type: org.apache.kafka.connect.predicates.TopicNameMatch
transforms.t2.?regex: my-prefix-.*
transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
transforms.t2.field: c1
  
The "?!" means the transform guarded by the predicate runs if the Predicate
returns false.

I think adding a Filter SMT that uses a Predicate is a nice idea.

Cheers,
Andrew

On 28/04/2020, 09:44, "Tom Bentley" <tb...@redhat.com> wrote:

    Hi Andrew and Chris,

    Firstly, thanks for the input.

    To me, the ?type syntax has some pros and cons. On the pros side:
    * it's pretty succinct
    * it's flexible enough for the use cases we've identified so far.

    On the cons side:
    * it's a bit cryptic; I don't think people who didn't know it already would
    be able to guess what ?type meant.
    * it doesn't work so well if there is a whole chain of transformations
    which need to be applied conditionally (that can still be done, of course,
    but the user has to duplicate the configuration for each SMT). Nor does it
    work if you need something akin to a "nested if" (you could fake it using
    And), but we don't really expect people to need that.

    Andrew, you didn't give an example of a negated match. Would you use
    something like !?type for that, or did you have another idea?

    On balance, I think this is a reasonable trade off.

    I agree that exposing RecordPredicate as a public API, while providing
    common implementation(s) would allow most use cases to be covered without
    coding, but also allow people with specific unaddressed needs to implement
    their own without having to write a streaming application to do so. Again,
    I think this is a reasonable trade off.

    It would be helpful to have a consensus about exactly which condition
    implementations the KIP should provide. I think we all agree that
    TopicMatches (or should it be TopicNameMatches?) should be included.
    HasHeader (or should it be HasHeaderWithName or HasHeaderMatching for a
    regex version?), And and Or all seem like they could be useful, even though
    no one has pointed to concrete use cases. It might also be worth
    considering that it would be natural to add a Filter SMT which filtered
    records using a RecordPredicate. Perhaps that provides motivation for
    conditions on the record key as well? If people think the Filter SMT would
    be useful I'm willing to include it in this KIP, since it seems
    straightforward.

    If people are happy with this approach, I'll update the KIP.

    Kind regards,

    Tom

    On Sun, Apr 26, 2020 at 3:00 PM Andrew Schofield <an...@live.com>
    wrote:

    > Hi,
    > I'm suggesting that RecordPredicate be another pluggable interface that
    > users
    > could conceivably implement on their own, and that the KIP introduces the
    > most likely ones as supplied implementations, much as there are SMT
    > implementations such as HoistField. I don't really see users implementing
    > their own RecordPredicate, but the supplied ones would probably have a
    > common
    > interface and it seems a pity not to expose that as an external API.
    >
    > SMT is essentially a toolkit for Kafka Connect and I think RecordPredicate
    > would be a nice minor enhancement and I'd be happy to contribute code.
    >
    > In order to make it simpler to configure, I suggest that the class name
    > configured
    > using "?type" would be assumed to be in package
    > org.apache.kafka.connect.predicates
    > in the absence of a package name so that shorter names such as
    > "TopicMatches"
    > could be used.
    >
    > Cheers,
    > Andrew Schofield
    > IBM Event Streams
    >
    > On 25/04/2020, 21:22, "Christopher Egerton" <ch...@confluent.io> wrote:
    >
    >     Hi Andrew,
    >
    >     I know a DSL seems like overkill, and I'm not attached to it by any
    > means,
    >     but I do think it serves a vital purpose in that it allows people who
    > don't
    >     know how or have the time to write Java code to act on data being
    > processed
    >     by connectors.
    >
    >     Are you proposing that the "RecordPredicate" be another pluggable
    > interface
    >     that users could then implement on their own? Or, would this be purely
    > a
    >     syntactic expansion to SMT configuration with some subset of hard-coded
    >     predicates provided by the framework? I think there's value in the
    > latter,
    >     but the former doesn't seem like it'd bring much to the table as far as
    >     users are concerned, and for developers, it is an improvement, but not
    > a
    >     major one.
    >
    >     Cheers,
    >
    >     Chris
    >
    >     On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
    >     schofieldandrewj@gmail.com> wrote:
    >
    >     > I wonder whether we're getting a bit overcomplicated with this. I
    > think all
    >     > that's required here is to add an optional guard predicate for a
    >     > Transformation.
    >     > The predicate cannot end the Transformation chain, but it can allow
    > or
    >     > prevent a particular
    >     > Transformation from running.
    >     >
    >     > How about this as syntax?
    >     > transforms: extractInt
    >     > transforms.extractInt.?type:
    >     > org.apache.kafka.connect.predicates.TopicMatches
    >     > transforms.extractInt.?regex: my-prefix-.*
    >     > transforms.extractInt.type:
    >     > org.apache.kafka.connect.transforms.ExtractField$Key
    >     > transforms.extractInt.field: c1
    >     >
    >     > The idea is that a Transformation can have an optional
    > RecordPredicate.
    >     > The RecordPredicate
    >     > can have configuration items, in a similar way to a Transformation.
    > The
    >     > syntax of
    >     > using a '?' prefix to separate configuration for the RecordPredicate
    > from
    >     > the configuration
    >     > for the Transformation could conceivably clash with an existing
    >     > Transformation
    >     > but the chances are minimal.
    >     >
    >     > This syntax doesn't allow for an 'else', but if the KIP offers say a
    >     > TopicMatches predicate
    >     > then that can be configured to return FALSE if the predicate matches.
    >     >
    >     > I feel that a DSL for SMTs is overkill. If you need something that
    >     > complex, it's
    >     > perhaps too complex for a transformation chain and it's really a
    > streaming
    >     > application.
    >     >
    >     > Andrew Schofield
    >     > IBM Event Streams
    >     >
    >     > On 2020/04/08 21:39:31, Christopher Egerton <ch...@confluent.io>
    > wrote:
    >     > > Hi Tom,
    >     > >
    >     > > With regards to the potential Transformation::validate method, I
    > don't
    >     > > quite follow your objections. The AbstractHerder class,
    > ConnectorConfig
    >     > > class, and embedding of ConfigDefs that happens with the two is all
    >     > > internal logic and we're allowed to modify it however we want, as
    > long as
    >     > > it doesn't alter any public-facing APIs (and if it does, we just
    > have to
    >     > > document those changes in a KIP). We don't have to embed the
    > ConfigDef
    >     > for
    >     > > a transformation chain inside another ConfigDef if the API we want
    > to
    >     > > present to our users doesn't play well with that part of the code
    > base.
    >     > > Additionally, beyond aligning with the existing API provided for
    >     > > connectors, another advantage is that it becomes possible to
    > validate
    >     > > properties for a configuration in the context of all other
    > properties, so
    >     > > to be clear, it's not just about preserving what may be perceived
    > as a
    >     > > superficial similarity and comes with real functional benefits
    > that can't
    >     > > be provided (at least, not as easily) by dynamic construction of a
    >     > > ConfigDef object.
    >     > >
    >     > > As far as the new proposal goes, I hate to say it, but I think
    > we're
    >     > stuck
    >     > > with the worst of both worlds here. Adding the new RecordPredicate
    >     > > interface seems like it defeats the whole purpose of SMTs, which
    > is to
    >     > > allow manipulation of a connector's event stream by users who don't
    >     > > necessarily know how or have the time to write Java code of their
    > own.
    >     > This
    >     > > is also why I'm in favor of adding a lightweight DSL for the
    > condition;
    >     > > emphasizing readability for people who aren't very familiar with
    > Connect
    >     > > and just want to get something going quickly should be a priority
    > for
    >     > SMTs.
    >     > > But if that's not going to happen with this KIP, I'd still choose
    > the
    >     > > simpler, less-flexible approach initially outlined, in order to
    > keep
    >     > things
    >     > > simple for people creating connectors and try to let them
    > accomplish what
    >     > > they want via configuration, not code.
    >     > >
    >     > > With regards to the question about where the line should be drawn
    > and how
    >     > > much is too much and comparisons to other stream processing
    > frameworks, I
    >     > > think the nature of SMTs draws the line quite nicely: you can only
    >     > process
    >     > > one message at a time. There's plenty of use cases out there for
    >     > > heavier-duty processing frameworks like Kafka Streams, with
    > aggregate
    >     > > operations, joining of streams, expanding a single message into
    > multiple
    >     > > messages, etc. With SMTs, none of this is possible; the general
    > use case
    >     > is
    >     > > to filter and clean a stream of data. If any of the heavier-weight
    > logic
    >     > > provided by, e.g., Streams, isn't required for a project, it
    > should be
    >     > > possible to get along with just a collection of sink connectors,
    > source
    >     > > connectors, a converter or two, and SMTs that smooth over any
    > differences
    >     > > in data format between what source connectors produce and what sink
    >     > > connectors expect. This is why I'm comfortable suggesting heavy
    > expansion
    >     > > of the out-of-the-box SMTs that we provide with Connect; as long as
    >     > they're
    >     > > reasonable to configure, they can greatly reduce the operational
    > burden
    >     > on
    >     > > anyone running and/or using a Connect cluster since they might
    > entirely
    >     > > replace additional services that would otherwise be required.
    >     > >
    >     > > All that said--this is a huge ask of someone who just wants to
    > support
    >     > the
    >     > > SMT equivalent of an "if" statement, and it's totally
    > understandable if
    >     > > that's too much to ask. The one concern I have left is that if we
    > expand
    >     > > the SMT in the future, there become compatibility concerns since
    > SMTs are
    >     > > pretty tightly-coupled with the worker on which they run (although
    >     > > technically, with some classpath/plugin path shuffling, you can run
    >     > > different versions of the out-of-the-box SMTs from the Connect
    > worker on
    >     > > which they're run). Someone might write a connector config with the
    >     > > If/Conditional/Whatever SMT with a condition type that works on one
    >     > worker,
    >     > > but that doesn't work on another worker that's running an earlier
    > version
    >     > > of Connect. This is why I'm in favor of adding extra predicates now
    >     > instead
    >     > > of later; if we're going to implement what has the potential to be
    > a bit
    >     > of
    >     > > a Swiss army knife SMT with room for future expansion of
    > configuration,
    >     > and
    >     > > we can think of a reasonable way to add that functionality now, it
    > seems
    >     > > better for users and administrators of Connect to try to do that
    > now
    >     > > instead of later.
    >     > >
    >     > > Cheers,
    >     > >
    >     > > Chris
    >     > >
    >     > > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <tb...@redhat.com>
    > wrote:
    >     > >
    >     > > > Since no one objected I've updated the KIP with the revised way
    > to
    >     > > > configure this transformation.
    >     > > >
    >     > > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tb...@redhat.com>
    >     > wrote:
    >     > > >
    >     > > > > To come back about a point Chris made:
    >     > > > >
    >     > > > > 1. Instead of the new "ConfigDef config(Map<String, String>
    > props)"
    >     > > > method,
    >     > > > >> what would you think about adopting a similar approach as the
    >     > framework
    >     > > > >> uses with connectors, by adding a "Config validate(Map<String,
    >     > String>
    >     > > > >> props)" method that can perform custom validation outside of
    > what
    >     > can be
    >     > > > >> performed by the ConfigDef's single-property-at-a-time
    > validation?
    >     > It
    >     > > > may
    >     > > > >> be a little heavyweight for use with this particular SMT, but
    > it'd
    >     > > > provide
    >     > > > >> more flexibility for other SMT implementations and would
    > mirror an
    >     > API
    >     > > > >> that
    >     > > > >> developers targeting the framework are likely already
    > familiar with.
    >     > > > >>
    >     > > > >
    >     > > > > The validate() + config() approach taken for Connectors
    > doesn't quite
    >     > > > work
    >     > > > > for Transformations.
    >     > > > >
    >     > > > > The default Connector.validate() basically just calls
    >     > > > > `config().validate(connectorConfigs)` and returns a Config.
    >     > Separately
    >     > > > the
    >     > > > > AbstractHeader also calls `config()` and uses the Config and
    >     > ConfigDef to
    >     > > > > build the ConfigInfos. The contract is not really defined well
    >     > enough in
    >     > > > > the javadoc, but this means a connector can use validate() to
    > build
    >     > the
    >     > > > > ConfigDef which it will return from config().
    >     > > > >
    >     > > > > For Transformations we don't really want validate() to perform
    >     > validation
    >     > > > > at all since ultimately the transformations config will be
    > embedded
    >     > in
    >     > > > the
    >     > > > > connector's and will be validated by the connector itself. It
    >     > wouldn't be
    >     > > > > harmful if it _did_ perform validation, just unnecessary. But
    > having
    >     > a
    >     > > > > validate() method that ought not to validate() seems like a
    > recipe
    >     > for
    >     > > > > confusion.
    >     > > > >
    >     > > > > Also problematic is that there's no use for the Config
    > returned from
    >     > > > > Transformations.validate().
    >     > > > >
    >     > > > > So I'm not convinced that the superficial similarity really
    > gains
    >     > > > anything.
    >     > > > >
    >     > > > > Kind regards,
    >     > > > >
    >     > > > > Tom
    >     > > > >
    >     > > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <
    > tbentley@redhat.com>
    >     > wrote:
    >     > > > >
    >     > > > >> Hi,
    >     > > > >>
    >     > > > >> Hi all,
    >     > > > >>
    >     > > > >> Thanks for the discussion so far.
    >     > > > >>
    >     > > > >> It seems a bit weird to me that when configuring the
    > Conditional SMT
    >     > > > with
    >     > > > >> a DSL you would use a concise, intuitive DSL for expressing
    > the
    >     > > > condition,
    >     > > > >> but not for the transforms that it's guarding. It also seems
    >     > natural, if
    >     > > > >> you support this for conditionally applying SMTs, that you'd
    > soon
    >     > want
    >     > > > to
    >     > > > >> support the same thing for a generic filter transformer.
    > Then, if
    >     > you
    >     > > > can
    >     > > > >> configure a filter transformation using this DSL, it becomes
    > odd
    >     > that
    >     > > > you
    >     > > > >> can't do this for mapping transformations. I think it would
    > be a
    >     > > > mistake to
    >     > > > >> go specifying an expression language for conditions when
    > really that
    >     > > > might
    >     > > > >> just be part of a language for transformations.
    >     > > > >>
    >     > > > >> I think it would be possible today to write an SMT which
    > allowed
    >     > you to
    >     > > > >> express the transformation in a DSL. Concretely, it's
    > possible to
    >     > > > imagine a
    >     > > > >> single transformation taking a DSL something like this:
    >     > > > >>
    >     > > > >>   compose(
    >     > > > >>     Flatten.Key(delimiter: ':'),
    >     > > > >>     If(condition: TopicMatches(pattern: "blah-.*"),
    >     > > > >>        then: Flatten.Value(delimiter: '/')))
    >     > > > >>
    >     > > > >> All I've really done here, beyond what was already proposed,
    > is
    >     > rename
    >     > > > >> Conditional to If, imagine a couple more bits of syntax
    >     > > > >> (SMTs being constructed by class name and named argument
    > invocation
    >     > > > >> syntax to support SMT constructors) and add a higher level
    > compose()
    >     > > > >> function for chaining SMTs (which could easily be replaced
    > with
    >     > brace
    >     > > > >> delimited blocks).
    >     > > > >>
    >     > > > >> That may be a discussion we should have, but I think in
    > _this_ KIP
    >     > we
    >     > > > >> should focus on the conditional part, since the above example
    >     > hopefully
    >     > > > >> shows that it would be possible to reuse it in a DSL if there
    > was
    >     > > > appetite
    >     > > > >> for that.
    >     > > > >>
    >     > > > >> With that in mind, and since my original suggestion for an
    >     > abbreviated
    >     > > > >> config syntax didn't appeal to people, I propose that we
    > stick with
    >     > the
    >     > > > >> existing norms for configuring this.
    >     > > > >> The previous example would look like this:
    >     > > > >>
    >     > > > >> transformation: flattenKey,if
    >     > > > >> transformation.flattenKey.type: Flatten.Key
    >     > > > >> transformation.flattenKey.delimiter: :
    >     > > > >> transformation.if.type: If
    >     > > > >> transformation.if.condition.type: TopicMatches
    >     > > > >> transformation.if.condition.pattern: blah-.*
    >     > > > >> transformation.if.then: flattenValue
    >     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
    >     > > > >> transformation.if.then.flattenValue.delimiter: /
    >     > > > >>
    >     > > > >> Also, I'm inclined to think we should stick to just supporting
    >     > > > >> TopicMatches and Not, since we've not identified an actual
    > need for
    >     > > > >> 'has-header', 'or' and 'and'.
    >     > > > >>
    >     > > > >> An example of the usage of Not would look like this:
    >     > > > >>
    >     > > > >> transformation: flattenKey,if
    >     > > > >> transformation.flattenKey.type: Flatten.Key
    >     > > > >> transformation.flattenKey.delimiter: :
    >     > > > >> transformation.if.type: If
    >     > > > >> transformation.if.condition.type: Not
    >     > > > >> transformation.if.condition.operand: startsWithBlah
    >     > > > >> transformation.if.condition.operand.startsWithBlah.type:
    >     > TopicMatches
    >     > > > >> transformation.if.condition.operand.startsWithBlah.pattern:
    > blah-.*
    >     > > > >> transformation.if.then: flattenValue
    >     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
    >     > > > >> transformation.if.then.flattenValue.delimiter: /
    >     > > > >>
    >     > > > >> Hopefully we can agree that this allows the conditional SMT
    > part to
    >     > make
    >     > > > >> progress without getting bogged down.
    >     > > > >>
    >     > > > >> Thoughts? If this seems broadly reasonable, I'll update the
    > KIP.
    >     > > > >>
    >     > > > >> Kind regards,
    >     > > > >>
    >     > > > >> Tom
    >     > > > >>
    >     > > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <
    > gunnar@hibernate.org
    >     > >
    >     > > > >> wrote:
    >     > > > >>
    >     > > > >>> Hi all,
    >     > > > >>>
    >     > > > >>> Thanks a lot for this initiative, Tom!
    >     > > > >>>
    >     > > > >>> To shed some light, the use case where this first came up,
    > were
    >     > issues
    >     > > > >>> we saw with SMTs being applied to the different topics
    > produced by
    >     > the
    >     > > > >>> Debezium change data capture connectors. There are different
    > kinds
    >     > of
    >     > > > >>> topics (for change data, schema history, heartbeats etc.)
    > and the
    >     > > > >>> record structure to expect may vary between those. Hence we
    > saw
    >     > issues
    >     > > > >>> with SMTs like ExtractField, which for instance only should
    > be
    >     > applied
    >     > > > >>> to all change data topics but not the other ones.
    >     > > > >>>
    >     > > > >>> I like the overall approach; for Debezium's purposes, the
    > simple
    >     > topic
    >     > > > >>> matching and negation operators would be sufficient already.
    > I
    >     > agree
    >     > > > >>> with Chris and would prefer one single condition attribute,
    > which
    >     > > > >>> contains a single condition or potentially a logical
    > expression
    >     > with
    >     > > > >>> not, and, etc. I think it's less ambiguous, in particular
    > when it
    >     > > > >>> comes to ordering of the different conditions and
    > determining their
    >     > > > >>> precedence.
    >     > > > >>>
    >     > > > >>> Would love to see this feature in one or another way in
    > Connect.
    >     > > > >>>
    >     > > > >>> Best,
    >     > > > >>>
    >     > > > >>> --Gunnar
    >     > > > >>>
    >     > > > >>>
    >     > > > >>>
    >     > > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
    >     > > > >>> tbentley@redhat.com>:
    >     > > > >>> >
    >     > > > >>> > Hi Chris and Sönke,
    >     > > > >>> >
    >     > > > >>> > Using the numbering from Chris's email...
    >     > > > >>> >
    >     > > > >>> > 1. That's a good point, I'll see what is needed to make
    > that
    >     > work.
    >     > > > >>> >
    >     > > > >>> > 2. I'm happy enough to add support for "and" and "or" as
    > part of
    >     > this
    >     > > > >>> KIP
    >     > > > >>> > if people can see a need for it.
    >     > > > >>> >
    >     > > > >>> > In a similar vein, I was wondering about whether it would
    > be
    >     > > > worthwhile
    >     > > > >>> > having the equivalent of an "else" clause (what's in the
    > KIP is
    >     > > > >>> basically
    >     > > > >>> > an "if" statement). Without support for "else" I think
    > people
    >     > would
    >     > > > >>> often
    >     > > > >>> > need two conditionals, with the condition of one being the
    >     > negation
    >     > > > of
    >     > > > >>> the
    >     > > > >>> > condition of another.
    >     > > > >>> >
    >     > > > >>> > 3. I can see the attraction of an expression language. The
    > pros
    >     > > > include
    >     > > > >>> > being terse and familiar to programmers and potentially
    > very
    >     > flexible
    >     > > > >>> if
    >     > > > >>> > that's needed in the future. I had a play and implemented
    > it
    >     > using
    >     > > > >>> ANTLR
    >     > > > >>> > and it's not difficult to write a grammar and implement the
    >     > functions
    >     > > > >>> we've
    >     > > > >>> > already discussed and get decent error messages when the
    >     > expression
    >     > > > is
    >     > > > >>> > malformed. So on the one hand I quite like the idea. On
    > the other
    >     > > > hand
    >     > > > >>> it
    >     > > > >>> > feels like overkill for the use cases that have actually
    > been
    >     > > > >>> identified so
    >     > > > >>> > far.
    >     > > > >>> >
    >     > > > >>> > @Sönke what do you make of the expression language idea?
    >     > > > >>> >
    >     > > > >>> > Kind regards,
    >     > > > >>> >
    >     > > > >>> > Tom
    >     > > > >>> >
    >     > > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
    >     > > > >>> chrise@confluent.io>
    >     > > > >>> > wrote:
    >     > > > >>> >
    >     > > > >>> > > Hi Tom,
    >     > > > >>> > >
    >     > > > >>> > > This looks great and I'd love to see the out-of-the-box
    > SMTs
    >     > become
    >     > > > >>> even
    >     > > > >>> > > more powerful with the improvements you've proposed! Got
    > a few
    >     > > > >>> remarks and
    >     > > > >>> > > would be interested in your thoughts:
    >     > > > >>> > >
    >     > > > >>> > > 1. Instead of the new "ConfigDef config(Map<String,
    > String>
    >     > props)"
    >     > > > >>> method,
    >     > > > >>> > > what would you think about adopting a similar approach
    > as the
    >     > > > >>> framework
    >     > > > >>> > > uses with connectors, by adding a "Config
    > validate(Map<String,
    >     > > > >>> String>
    >     > > > >>> > > props)" method that can perform custom validation
    > outside of
    >     > what
    >     > > > >>> can be
    >     > > > >>> > > performed by the ConfigDef's single-property-at-a-time
    >     > validation?
    >     > > > >>> It may
    >     > > > >>> > > be a little heavyweight for use with this particular
    > SMT, but
    >     > it'd
    >     > > > >>> provide
    >     > > > >>> > > more flexibility for other SMT implementations and would
    >     > mirror an
    >     > > > >>> API that
    >     > > > >>> > > developers targeting the framework are likely already
    > familiar
    >     > > > with.
    >     > > > >>> > > 2. The possibility for adding the logical operators
    > "and" and
    >     > "or"
    >     > > > is
    >     > > > >>> > > mentioned, but only as a potential future change and not
    > one
    >     > > > >>> proposed by
    >     > > > >>> > > this KIP. Why not include those operators sooner rather
    > than
    >     > later?
    >     > > > >>> > > 3. The syntax for named conditions that are then
    > referenced in
    >     > > > >>> logical
    >     > > > >>> > > operators is... tricky. It took me a few attempts to
    > grok the
    >     > > > example
    >     > > > >>> > > provided in the KIP after reading Sönke's question about
    > the
    >     > > > example
    >     > > > >>> for
    >     > > > >>> > > negation. What about a more sophisticated but less
    > verbose
    >     > syntax
    >     > > > >>> that
    >     > > > >>> > > supports a single configuration for the condition, even
    > with
    >     > > > logical
    >     > > > >>> > > operators? I'm thinking something like
    >     > > > >>> > > "transforms.conditionalExtract.condition:
    >     > > > not(has-header:my-header)"
    >     > > > >>> > > instead of the "transforms.conditionalExtract.condition:
    >     > > > >>> not:hasMyHeader"
    >     > > > >>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> > > has-header:my-header" properties. If support for a
    > logical
    >     > "and" is
    >     > > > >>> added,
    >     > > > >>> > > this could then be expanded to something like
    >     > > > >>> > > "transforms.conditionalExtract.condition:
    >     > > > and(has-header(my-header),
    >     > > > >>> > > not(topic-matches(my-prefix-.*)))". There would be
    > additional
    >     > > > >>> complexity
    >     > > > >>> > > here with the need to escape parentheses and commas that
    > are
    >     > > > >>> intended to be
    >     > > > >>> > > treated literally (as part of a header name, for example)
    >     > instead
    >     > > > of
    >     > > > >>> as
    >     > > > >>> > > part of the syntax for the condition itself, but a little
    >     > > > additional
    >     > > > >>> > > complexity for edge cases like that may be warranted if
    > it
    >     > heavily
    >     > > > >>> reduces
    >     > > > >>> > > complexity for the common cases. The rationale for the
    > proposed
    >     > > > >>> > > parentheses-based syntax here instead of what's
    > mentioned in
    >     > the
    >     > > > KIP
    >     > > > >>> > > (something like "and: <condition1>, <condition2>") is to
    > help
    >     > with
    >     > > > >>> > > readability; we probably wouldn't need that with the
    > approach
    >     > of
    >     > > > >>> naming
    >     > > > >>> > > conditions via separate properties, but things may get a
    > little
    >     > > > >>> nasty with
    >     > > > >>> > > literal conditions included there, especially with the
    >     > possibility
    >     > > > of
    >     > > > >>> > > nesting. Finally, the shift in syntax here should make it
    >     > easier to
    >     > > > >>> handle
    >     > > > >>> > > cases like the header value matching brought up by
    > Sönke; it
    >     > might
    >     > > > >>> look
    >     > > > >>> > > something like "header-matches(header-name,
    >     > header-value-pattern)".
    >     > > > >>> > >
    >     > > > >>> > > Cheers,
    >     > > > >>> > >
    >     > > > >>> > > Chris
    >     > > > >>> > >
    >     > > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
    >     > tbentley@redhat.com>
    >     > > > >>> wrote:
    >     > > > >>> > >
    >     > > > >>> > > > Hi Sönke,
    >     > > > >>> > > >
    >     > > > >>> > > > Thanks for taking a look.
    >     > > > >>> > > >
    >     > > > >>> > > > Let me answer in reverse order, since I think it might
    > make
    >     > more
    >     > > > >>> sense
    >     > > > >>> > > that
    >     > > > >>> > > > way...
    >     > > > >>> > > >
    >     > > > >>> > > > Also, while writing that, I noticed that you have a
    > version
    >     > with
    >     > > > >>> and
    >     > > > >>> > > > > without "name" for your transformation in the KIP:
    >     > > > >>> > > > >
    >     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> > > has-header:my-header
    >     > > > >>> > > > > and
    >     > > > >>> > > > > transforms.conditionalExtract.condition:
    >     > has-header:my-header
    >     > > > >>> > > > >
    >     > > > >>> > > >
    >     > > > >>> > > > The example
    >     > > > >>> > > >     transforms.conditionalExtract.condition:
    >     > has-header:my-header
    >     > > > >>> > > > is a "has-header" condition (the prefix of the config
    > value),
    >     > > > >>> which will
    >     > > > >>> > > > match records with a "my-header" header (given in the
    >     > suffix).
    >     > > > >>> > > >
    >     > > > >>> > > > The other example given is:
    >     > > > >>> > > >     transforms.conditionalExtract.condition:
    > not:hasMyHeader
    >     > > > >>> > > >
    >  transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> > > > has-header:my-header
    >     > > > >>> > > > The root of the condition is a "not" condition (the
    > prefix
    >     > of the
    >     > > > >>> value
    >     > > > >>> > > for
    >     > > > >>> > > > the transforms.conditionalExtract.condition key) of
    > another
    >     > named
    >     > > > >>> > > condition
    >     > > > >>> > > > called "hasMyHeader" (the suffix). Any name could be
    > used
    >     > for the
    >     > > > >>> other
    >     > > > >>> > > > condition. That other condition is configured at
    >     > > > >>> > > >
    > "transforms.conditionalExtract.condition.<conditionName>".
    >     > That
    >     > > > >>> condition
    >     > > > >>> > > > is a "has-header" condition (the prefix), which will
    > match
    >     > > > records
    >     > > > >>> with a
    >     > > > >>> > > > "my-header" header (given in the suffix). So the
    >     > "has-header:"
    >     > > > >>> condition
    >     > > > >>> > > > type would always require a suffix, as would the "not:"
    >     > condition
    >     > > > >>> type.
    >     > > > >>> > > > Hypothetically you could have a "true" condition type
    > (which
    >     > > > would
    >     > > > >>> not
    >     > > > >>> > > > require a suffix), and the hypothetical binary
    > conditions
    >     > "and:"
    >     > > > >>> and
    >     > > > >>> > > "or:"
    >     > > > >>> > > > would require a pair of other condition names.
    >     > > > >>> > > >
    >     > > > >>> > > > So what's proposed is a scheme for encoding conditions
    > where
    >     > the
    >     > > > >>> > > condition
    >     > > > >>> > > > type is the prefix of the value of some "....condition"
    >     > config
    >     > > > >>> key, and
    >     > > > >>> > > the
    >     > > > >>> > > > optional suffix provides parameters for the condition.
    > This
    >     > makes
    >     > > > >>> those
    >     > > > >>> > > > parameters a bit inflexible, but is relatively
    > succinct.
    >     > > > >>> > > >
    >     > > > >>> > > > This leads on to your first point. You're right that
    > use
    >     > cases
    >     > > > >>> might
    >     > > > >>> > > appear
    >     > > > >>> > > > which need other conditions, and we should make it
    > flexible
    >     > > > enough
    >     > > > >>> to be
    >     > > > >>> > > > able to cope with future use cases. On the other hand,
    > I was
    >     > > > >>> concerned
    >     > > > >>> > > that
    >     > > > >>> > > > we end up with something which is quite complicated to
    >     > configure.
    >     > > > >>> (There
    >     > > > >>> > > > comes a point where it might makes more sense for the
    > user to
    >     > > > >>> write their
    >     > > > >>> > > > own SMT).
    >     > > > >>> > > >
    >     > > > >>> > > > Just of the top of my head it might look like:
    >     > > > >>> > > > >
    >     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> type:has-header
    >     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> > > > header-name:my-header
    >     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> > > field-value:my-value
    >     > > > >>> > > > >
    >     > > > >>> > > >
    >     > > > >>> > > > That won't work because the format is basically a
    >     > > > >>> Properties/Map<String,
    >     > > > >>> > > > String> and what you've suggested has duplicate keys.
    >     > > > >>> > > >
    >     > > > >>> > > > One thing I did briefly consider what the ability to
    > treat
    >     > > > >>> conditions as
    >     > > > >>> > > > Configurable objects in their own right (opening up the
    >     > > > >>> possibility of
    >     > > > >>> > > > people supplying their own Conditions, just like they
    > can
    >     > supply
    >     > > > >>> their
    >     > > > >>> > > own
    >     > > > >>> > > > SMTs). That might be configured something like this:
    >     > > > >>> > > >
    >     > > > >>> > > >     transforms.conditionalExtract.condition: not
    >     > > > >>> > > >     transforms.conditionalExtract.condition.not.type:
    > Not
    >     > > > >>> > > >
    >  transforms.conditionalExtract.condition.not.negated: foo
    >     > > > >>> > > >     transforms.conditionalExtract.condition.foo.type:
    >     > > > >>> HasHeaderWithValue
    >     > > > >>> > > >     transforms.conditionalExtract.condition.foo.header:
    >     > my-header
    >     > > > >>> > > >     transforms.conditionalExtract.condition.foo.value:
    >     > my-value
    >     > > > >>> > > >
    >     > > > >>> > > > I didn't propose that I couldn't see the use cases to
    > justify
    >     > > > this
    >     > > > >>> kind
    >     > > > >>> > > of
    >     > > > >>> > > > complexity, especially as the common case would surely
    > be
    >     > > > matching
    >     > > > >>> > > against
    >     > > > >>> > > > topic name (to be honest I wasn't completely convinced
    > by the
    >     > > > need
    >     > > > >>> for
    >     > > > >>> > > > "has-header"). In the current version of the KIP
    > that's just
    >     > > > >>> > > >
    >     > > > >>> > > >     transforms.conditionalExtract.condition:
    > topic-matches:
    >     > > > >>> my-prefix-.*
    >     > > > >>> > > >
    >     > > > >>> > > > but using the more flexible scheme that would require
    >     > something
    >     > > > >>> more like
    >     > > > >>> > > > this:
    >     > > > >>> > > >
    >     > > > >>> > > >     transforms.conditionalExtract.condition: bar
    >     > > > >>> > > >     transforms.conditionalExtract.condition.bar.type:
    >     > TopicMatch
    >     > > > >>> > > >
    >  transforms.conditionalExtract.condition.bar.pattern:
    >     > > > >>> my-prefix-.*
    >     > > > >>> > > >
    >     > > > >>> > > > If people know of use cases which would justify more
    >     > > > >>> sophistication, I'm
    >     > > > >>> > > > happy to reconsider.
    >     > > > >>> > > >
    >     > > > >>> > > > Thanks again for taking a look!
    >     > > > >>> > > >
    >     > > > >>> > > > Tom
    >     > > > >>> > > >
    >     > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
    >     > > > >>> > > > <so...@opencore.com.invalid> wrote:
    >     > > > >>> > > >
    >     > > > >>> > > > > Hi Tom,
    >     > > > >>> > > > >
    >     > > > >>> > > > > sounds useful to me, thanks for the KIP!
    >     > > > >>> > > > > The only thought that I had while reading was that
    > this
    >     > will
    >     > > > >>> probably
    >     > > > >>> > > > raise
    >     > > > >>> > > > > questions about more involved conditions fairly
    > quickly.
    >     > For
    >     > > > >>> example
    >     > > > >>> > > the
    >     > > > >>> > > > > "has-header" will cause an appetite for conditions
    > like
    >     > > > >>> > > > > "this-header-has-that-value".
    >     > > > >>> > > > > This would necessitate two parameters to be passed
    > into the
    >     > > > >>> condition,
    >     > > > >>> > > > > which I think is not currently included in the KIP.
    > I am
    >     > not
    >     > > > >>> saying add
    >     > > > >>> > > > > this now, but might it make sense to discuss a
    > concept of
    >     > how
    >     > > > >>> that
    >     > > > >>> > > might
    >     > > > >>> > > > > look now, to avoid potential changes later on.
    >     > > > >>> > > > >
    >     > > > >>> > > > > Just of the top of my head it might look like:
    >     > > > >>> > > > >
    >     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> type:has-header
    >     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> > > > header-name:my-header
    >     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> > > field-value:my-value
    >     > > > >>> > > > >
    >     > > > >>> > > > >
    >     > > > >>> > > > > Also, while writing that, I noticed that you have a
    > version
    >     > > > with
    >     > > > >>> and
    >     > > > >>> > > > > without "name" for your transformation in the KIP:
    >     > > > >>> > > > >
    >     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    >     > > > >>> > > has-header:my-header
    >     > > > >>> > > > > and
    >     > > > >>> > > > > transforms.conditionalExtract.condition:
    >     > has-header:my-header
    >     > > > >>> > > > >
    >     > > > >>> > > > >
    >     > > > >>> > > > > Is this intentional and the name is optional?
    >     > > > >>> > > > >
    >     > > > >>> > > > >
    >     > > > >>> > > > > Best regards,
    >     > > > >>> > > > >
    >     > > > >>> > > > > Sönke
    >     > > > >>> > > > >
    >     > > > >>> > > > >
    >     > > > >>> > > > >
    >     > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <
    >     > tbentley@redhat.com>
    >     > > > >>> wrote:
    >     > > > >>> > > > > >
    >     > > > >>> > > > > > Does anyone have any comments, feedback or
    > thoughts about
    >     > > > this?
    >     > > > >>> > > > > >
    >     > > > >>> > > > > > Thanks,
    >     > > > >>> > > > > >
    >     > > > >>> > > > > > Tom
    >     > > > >>> > > > > >
    >     > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
    >     > > > >>> tbentley@redhat.com>
    >     > > > >>> > > > > wrote:
    >     > > > >>> > > > > >
    >     > > > >>> > > > > > > Hi,
    >     > > > >>> > > > > > >
    >     > > > >>> > > > > > > I've opened KIP-585 which is intended to provide
    > a
    >     > > > mechanism
    >     > > > >>> to
    >     > > > >>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be
    >     > grateful
    >     > > > >>> for any
    >     > > > >>> > > > > > > feedback:
    >     > > > >>> > > > > > >
    >     > > > >>> > > > >
    >     > > > >>> > > > >
    >     > > > >>> > > >
    >     > > > >>> > >
    >     > > > >>>
    >     > > >
    >     >
    > https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
    >     > > > >>> > > > > > >
    >     > > > >>> > > > > > > Many thanks,
    >     > > > >>> > > > > > >
    >     > > > >>> > > > > > > Tom
    >     > > > >>> > > > > > >
    >     > > > >>> > > > >
    >     > > > >>> > > >
    >     > > > >>> > >
    >     > > > >>>
    >     > > > >>>
    >     > > >
    >     > >
    >     >
    >
    >


Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi Andrew and Chris,

Firstly, thanks for the input.

To me, the ?type syntax has some pros and cons. On the pros side:
* it's pretty succinct
* it's flexible enough for the use cases we've identified so far.

On the cons side:
* it's a bit cryptic; I don't think people who didn't know it already would
be able to guess what ?type meant.
* it doesn't work so well if there is a whole chain of transformations
which need to be applied conditionally (that can still be done, of course,
but the user has to duplicate the configuration for each SMT). Nor does it
work if you need something akin to a "nested if" (you could fake it using
And), but we don't really expect people to need that.

Andrew, you didn't give an example of a negated match. Would you use
something like !?type for that, or did you have another idea?

On balance, I think this is a reasonable trade off.

I agree that exposing RecordPredicate as a public API, while providing
common implementation(s) would allow most use cases to be covered without
coding, but also allow people with specific unaddressed needs to implement
their own without having to write a streaming application to do so. Again,
I think this is a reasonable trade off.

It would be helpful to have a consensus about exactly which condition
implementations the KIP should provide. I think we all agree that
TopicMatches (or should it be TopicNameMatches?) should be included.
HasHeader (or should it be HasHeaderWithName or HasHeaderMatching for a
regex version?), And and Or all seem like they could be useful, even though
no one has pointed to concrete use cases. It might also be worth
considering that it would be natural to add a Filter SMT which filtered
records using a RecordPredicate. Perhaps that provides motivation for
conditions on the record key as well? If people think the Filter SMT would
be useful I'm willing to include it in this KIP, since it seems
straightforward.

If people are happy with this approach, I'll update the KIP.

Kind regards,

Tom

On Sun, Apr 26, 2020 at 3:00 PM Andrew Schofield <an...@live.com>
wrote:

> Hi,
> I'm suggesting that RecordPredicate be another pluggable interface that
> users
> could conceivably implement on their own, and that the KIP introduces the
> most likely ones as supplied implementations, much as there are SMT
> implementations such as HoistField. I don't really see users implementing
> their own RecordPredicate, but the supplied ones would probably have a
> common
> interface and it seems a pity not to expose that as an external API.
>
> SMT is essentially a toolkit for Kafka Connect and I think RecordPredicate
> would be a nice minor enhancement and I'd be happy to contribute code.
>
> In order to make it simpler to configure, I suggest that the class name
> configured
> using "?type" would be assumed to be in package
> org.apache.kafka.connect.predicates
> in the absence of a package name so that shorter names such as
> "TopicMatches"
> could be used.
>
> Cheers,
> Andrew Schofield
> IBM Event Streams
>
> On 25/04/2020, 21:22, "Christopher Egerton" <ch...@confluent.io> wrote:
>
>     Hi Andrew,
>
>     I know a DSL seems like overkill, and I'm not attached to it by any
> means,
>     but I do think it serves a vital purpose in that it allows people who
> don't
>     know how or have the time to write Java code to act on data being
> processed
>     by connectors.
>
>     Are you proposing that the "RecordPredicate" be another pluggable
> interface
>     that users could then implement on their own? Or, would this be purely
> a
>     syntactic expansion to SMT configuration with some subset of hard-coded
>     predicates provided by the framework? I think there's value in the
> latter,
>     but the former doesn't seem like it'd bring much to the table as far as
>     users are concerned, and for developers, it is an improvement, but not
> a
>     major one.
>
>     Cheers,
>
>     Chris
>
>     On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
>     schofieldandrewj@gmail.com> wrote:
>
>     > I wonder whether we're getting a bit overcomplicated with this. I
> think all
>     > that's required here is to add an optional guard predicate for a
>     > Transformation.
>     > The predicate cannot end the Transformation chain, but it can allow
> or
>     > prevent a particular
>     > Transformation from running.
>     >
>     > How about this as syntax?
>     > transforms: extractInt
>     > transforms.extractInt.?type:
>     > org.apache.kafka.connect.predicates.TopicMatches
>     > transforms.extractInt.?regex: my-prefix-.*
>     > transforms.extractInt.type:
>     > org.apache.kafka.connect.transforms.ExtractField$Key
>     > transforms.extractInt.field: c1
>     >
>     > The idea is that a Transformation can have an optional
> RecordPredicate.
>     > The RecordPredicate
>     > can have configuration items, in a similar way to a Transformation.
> The
>     > syntax of
>     > using a '?' prefix to separate configuration for the RecordPredicate
> from
>     > the configuration
>     > for the Transformation could conceivably clash with an existing
>     > Transformation
>     > but the chances are minimal.
>     >
>     > This syntax doesn't allow for an 'else', but if the KIP offers say a
>     > TopicMatches predicate
>     > then that can be configured to return FALSE if the predicate matches.
>     >
>     > I feel that a DSL for SMTs is overkill. If you need something that
>     > complex, it's
>     > perhaps too complex for a transformation chain and it's really a
> streaming
>     > application.
>     >
>     > Andrew Schofield
>     > IBM Event Streams
>     >
>     > On 2020/04/08 21:39:31, Christopher Egerton <ch...@confluent.io>
> wrote:
>     > > Hi Tom,
>     > >
>     > > With regards to the potential Transformation::validate method, I
> don't
>     > > quite follow your objections. The AbstractHerder class,
> ConnectorConfig
>     > > class, and embedding of ConfigDefs that happens with the two is all
>     > > internal logic and we're allowed to modify it however we want, as
> long as
>     > > it doesn't alter any public-facing APIs (and if it does, we just
> have to
>     > > document those changes in a KIP). We don't have to embed the
> ConfigDef
>     > for
>     > > a transformation chain inside another ConfigDef if the API we want
> to
>     > > present to our users doesn't play well with that part of the code
> base.
>     > > Additionally, beyond aligning with the existing API provided for
>     > > connectors, another advantage is that it becomes possible to
> validate
>     > > properties for a configuration in the context of all other
> properties, so
>     > > to be clear, it's not just about preserving what may be perceived
> as a
>     > > superficial similarity and comes with real functional benefits
> that can't
>     > > be provided (at least, not as easily) by dynamic construction of a
>     > > ConfigDef object.
>     > >
>     > > As far as the new proposal goes, I hate to say it, but I think
> we're
>     > stuck
>     > > with the worst of both worlds here. Adding the new RecordPredicate
>     > > interface seems like it defeats the whole purpose of SMTs, which
> is to
>     > > allow manipulation of a connector's event stream by users who don't
>     > > necessarily know how or have the time to write Java code of their
> own.
>     > This
>     > > is also why I'm in favor of adding a lightweight DSL for the
> condition;
>     > > emphasizing readability for people who aren't very familiar with
> Connect
>     > > and just want to get something going quickly should be a priority
> for
>     > SMTs.
>     > > But if that's not going to happen with this KIP, I'd still choose
> the
>     > > simpler, less-flexible approach initially outlined, in order to
> keep
>     > things
>     > > simple for people creating connectors and try to let them
> accomplish what
>     > > they want via configuration, not code.
>     > >
>     > > With regards to the question about where the line should be drawn
> and how
>     > > much is too much and comparisons to other stream processing
> frameworks, I
>     > > think the nature of SMTs draws the line quite nicely: you can only
>     > process
>     > > one message at a time. There's plenty of use cases out there for
>     > > heavier-duty processing frameworks like Kafka Streams, with
> aggregate
>     > > operations, joining of streams, expanding a single message into
> multiple
>     > > messages, etc. With SMTs, none of this is possible; the general
> use case
>     > is
>     > > to filter and clean a stream of data. If any of the heavier-weight
> logic
>     > > provided by, e.g., Streams, isn't required for a project, it
> should be
>     > > possible to get along with just a collection of sink connectors,
> source
>     > > connectors, a converter or two, and SMTs that smooth over any
> differences
>     > > in data format between what source connectors produce and what sink
>     > > connectors expect. This is why I'm comfortable suggesting heavy
> expansion
>     > > of the out-of-the-box SMTs that we provide with Connect; as long as
>     > they're
>     > > reasonable to configure, they can greatly reduce the operational
> burden
>     > on
>     > > anyone running and/or using a Connect cluster since they might
> entirely
>     > > replace additional services that would otherwise be required.
>     > >
>     > > All that said--this is a huge ask of someone who just wants to
> support
>     > the
>     > > SMT equivalent of an "if" statement, and it's totally
> understandable if
>     > > that's too much to ask. The one concern I have left is that if we
> expand
>     > > the SMT in the future, there become compatibility concerns since
> SMTs are
>     > > pretty tightly-coupled with the worker on which they run (although
>     > > technically, with some classpath/plugin path shuffling, you can run
>     > > different versions of the out-of-the-box SMTs from the Connect
> worker on
>     > > which they're run). Someone might write a connector config with the
>     > > If/Conditional/Whatever SMT with a condition type that works on one
>     > worker,
>     > > but that doesn't work on another worker that's running an earlier
> version
>     > > of Connect. This is why I'm in favor of adding extra predicates now
>     > instead
>     > > of later; if we're going to implement what has the potential to be
> a bit
>     > of
>     > > a Swiss army knife SMT with room for future expansion of
> configuration,
>     > and
>     > > we can think of a reasonable way to add that functionality now, it
> seems
>     > > better for users and administrators of Connect to try to do that
> now
>     > > instead of later.
>     > >
>     > > Cheers,
>     > >
>     > > Chris
>     > >
>     > > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <tb...@redhat.com>
> wrote:
>     > >
>     > > > Since no one objected I've updated the KIP with the revised way
> to
>     > > > configure this transformation.
>     > > >
>     > > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tb...@redhat.com>
>     > wrote:
>     > > >
>     > > > > To come back about a point Chris made:
>     > > > >
>     > > > > 1. Instead of the new "ConfigDef config(Map<String, String>
> props)"
>     > > > method,
>     > > > >> what would you think about adopting a similar approach as the
>     > framework
>     > > > >> uses with connectors, by adding a "Config validate(Map<String,
>     > String>
>     > > > >> props)" method that can perform custom validation outside of
> what
>     > can be
>     > > > >> performed by the ConfigDef's single-property-at-a-time
> validation?
>     > It
>     > > > may
>     > > > >> be a little heavyweight for use with this particular SMT, but
> it'd
>     > > > provide
>     > > > >> more flexibility for other SMT implementations and would
> mirror an
>     > API
>     > > > >> that
>     > > > >> developers targeting the framework are likely already
> familiar with.
>     > > > >>
>     > > > >
>     > > > > The validate() + config() approach taken for Connectors
> doesn't quite
>     > > > work
>     > > > > for Transformations.
>     > > > >
>     > > > > The default Connector.validate() basically just calls
>     > > > > `config().validate(connectorConfigs)` and returns a Config.
>     > Separately
>     > > > the
>     > > > > AbstractHeader also calls `config()` and uses the Config and
>     > ConfigDef to
>     > > > > build the ConfigInfos. The contract is not really defined well
>     > enough in
>     > > > > the javadoc, but this means a connector can use validate() to
> build
>     > the
>     > > > > ConfigDef which it will return from config().
>     > > > >
>     > > > > For Transformations we don't really want validate() to perform
>     > validation
>     > > > > at all since ultimately the transformations config will be
> embedded
>     > in
>     > > > the
>     > > > > connector's and will be validated by the connector itself. It
>     > wouldn't be
>     > > > > harmful if it _did_ perform validation, just unnecessary. But
> having
>     > a
>     > > > > validate() method that ought not to validate() seems like a
> recipe
>     > for
>     > > > > confusion.
>     > > > >
>     > > > > Also problematic is that there's no use for the Config
> returned from
>     > > > > Transformations.validate().
>     > > > >
>     > > > > So I'm not convinced that the superficial similarity really
> gains
>     > > > anything.
>     > > > >
>     > > > > Kind regards,
>     > > > >
>     > > > > Tom
>     > > > >
>     > > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <
> tbentley@redhat.com>
>     > wrote:
>     > > > >
>     > > > >> Hi,
>     > > > >>
>     > > > >> Hi all,
>     > > > >>
>     > > > >> Thanks for the discussion so far.
>     > > > >>
>     > > > >> It seems a bit weird to me that when configuring the
> Conditional SMT
>     > > > with
>     > > > >> a DSL you would use a concise, intuitive DSL for expressing
> the
>     > > > condition,
>     > > > >> but not for the transforms that it's guarding. It also seems
>     > natural, if
>     > > > >> you support this for conditionally applying SMTs, that you'd
> soon
>     > want
>     > > > to
>     > > > >> support the same thing for a generic filter transformer.
> Then, if
>     > you
>     > > > can
>     > > > >> configure a filter transformation using this DSL, it becomes
> odd
>     > that
>     > > > you
>     > > > >> can't do this for mapping transformations. I think it would
> be a
>     > > > mistake to
>     > > > >> go specifying an expression language for conditions when
> really that
>     > > > might
>     > > > >> just be part of a language for transformations.
>     > > > >>
>     > > > >> I think it would be possible today to write an SMT which
> allowed
>     > you to
>     > > > >> express the transformation in a DSL. Concretely, it's
> possible to
>     > > > imagine a
>     > > > >> single transformation taking a DSL something like this:
>     > > > >>
>     > > > >>   compose(
>     > > > >>     Flatten.Key(delimiter: ':'),
>     > > > >>     If(condition: TopicMatches(pattern: "blah-.*"),
>     > > > >>        then: Flatten.Value(delimiter: '/')))
>     > > > >>
>     > > > >> All I've really done here, beyond what was already proposed,
> is
>     > rename
>     > > > >> Conditional to If, imagine a couple more bits of syntax
>     > > > >> (SMTs being constructed by class name and named argument
> invocation
>     > > > >> syntax to support SMT constructors) and add a higher level
> compose()
>     > > > >> function for chaining SMTs (which could easily be replaced
> with
>     > brace
>     > > > >> delimited blocks).
>     > > > >>
>     > > > >> That may be a discussion we should have, but I think in
> _this_ KIP
>     > we
>     > > > >> should focus on the conditional part, since the above example
>     > hopefully
>     > > > >> shows that it would be possible to reuse it in a DSL if there
> was
>     > > > appetite
>     > > > >> for that.
>     > > > >>
>     > > > >> With that in mind, and since my original suggestion for an
>     > abbreviated
>     > > > >> config syntax didn't appeal to people, I propose that we
> stick with
>     > the
>     > > > >> existing norms for configuring this.
>     > > > >> The previous example would look like this:
>     > > > >>
>     > > > >> transformation: flattenKey,if
>     > > > >> transformation.flattenKey.type: Flatten.Key
>     > > > >> transformation.flattenKey.delimiter: :
>     > > > >> transformation.if.type: If
>     > > > >> transformation.if.condition.type: TopicMatches
>     > > > >> transformation.if.condition.pattern: blah-.*
>     > > > >> transformation.if.then: flattenValue
>     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
>     > > > >> transformation.if.then.flattenValue.delimiter: /
>     > > > >>
>     > > > >> Also, I'm inclined to think we should stick to just supporting
>     > > > >> TopicMatches and Not, since we've not identified an actual
> need for
>     > > > >> 'has-header', 'or' and 'and'.
>     > > > >>
>     > > > >> An example of the usage of Not would look like this:
>     > > > >>
>     > > > >> transformation: flattenKey,if
>     > > > >> transformation.flattenKey.type: Flatten.Key
>     > > > >> transformation.flattenKey.delimiter: :
>     > > > >> transformation.if.type: If
>     > > > >> transformation.if.condition.type: Not
>     > > > >> transformation.if.condition.operand: startsWithBlah
>     > > > >> transformation.if.condition.operand.startsWithBlah.type:
>     > TopicMatches
>     > > > >> transformation.if.condition.operand.startsWithBlah.pattern:
> blah-.*
>     > > > >> transformation.if.then: flattenValue
>     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
>     > > > >> transformation.if.then.flattenValue.delimiter: /
>     > > > >>
>     > > > >> Hopefully we can agree that this allows the conditional SMT
> part to
>     > make
>     > > > >> progress without getting bogged down.
>     > > > >>
>     > > > >> Thoughts? If this seems broadly reasonable, I'll update the
> KIP.
>     > > > >>
>     > > > >> Kind regards,
>     > > > >>
>     > > > >> Tom
>     > > > >>
>     > > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <
> gunnar@hibernate.org
>     > >
>     > > > >> wrote:
>     > > > >>
>     > > > >>> Hi all,
>     > > > >>>
>     > > > >>> Thanks a lot for this initiative, Tom!
>     > > > >>>
>     > > > >>> To shed some light, the use case where this first came up,
> were
>     > issues
>     > > > >>> we saw with SMTs being applied to the different topics
> produced by
>     > the
>     > > > >>> Debezium change data capture connectors. There are different
> kinds
>     > of
>     > > > >>> topics (for change data, schema history, heartbeats etc.)
> and the
>     > > > >>> record structure to expect may vary between those. Hence we
> saw
>     > issues
>     > > > >>> with SMTs like ExtractField, which for instance only should
> be
>     > applied
>     > > > >>> to all change data topics but not the other ones.
>     > > > >>>
>     > > > >>> I like the overall approach; for Debezium's purposes, the
> simple
>     > topic
>     > > > >>> matching and negation operators would be sufficient already.
> I
>     > agree
>     > > > >>> with Chris and would prefer one single condition attribute,
> which
>     > > > >>> contains a single condition or potentially a logical
> expression
>     > with
>     > > > >>> not, and, etc. I think it's less ambiguous, in particular
> when it
>     > > > >>> comes to ordering of the different conditions and
> determining their
>     > > > >>> precedence.
>     > > > >>>
>     > > > >>> Would love to see this feature in one or another way in
> Connect.
>     > > > >>>
>     > > > >>> Best,
>     > > > >>>
>     > > > >>> --Gunnar
>     > > > >>>
>     > > > >>>
>     > > > >>>
>     > > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
>     > > > >>> tbentley@redhat.com>:
>     > > > >>> >
>     > > > >>> > Hi Chris and Sönke,
>     > > > >>> >
>     > > > >>> > Using the numbering from Chris's email...
>     > > > >>> >
>     > > > >>> > 1. That's a good point, I'll see what is needed to make
> that
>     > work.
>     > > > >>> >
>     > > > >>> > 2. I'm happy enough to add support for "and" and "or" as
> part of
>     > this
>     > > > >>> KIP
>     > > > >>> > if people can see a need for it.
>     > > > >>> >
>     > > > >>> > In a similar vein, I was wondering about whether it would
> be
>     > > > worthwhile
>     > > > >>> > having the equivalent of an "else" clause (what's in the
> KIP is
>     > > > >>> basically
>     > > > >>> > an "if" statement). Without support for "else" I think
> people
>     > would
>     > > > >>> often
>     > > > >>> > need two conditionals, with the condition of one being the
>     > negation
>     > > > of
>     > > > >>> the
>     > > > >>> > condition of another.
>     > > > >>> >
>     > > > >>> > 3. I can see the attraction of an expression language. The
> pros
>     > > > include
>     > > > >>> > being terse and familiar to programmers and potentially
> very
>     > flexible
>     > > > >>> if
>     > > > >>> > that's needed in the future. I had a play and implemented
> it
>     > using
>     > > > >>> ANTLR
>     > > > >>> > and it's not difficult to write a grammar and implement the
>     > functions
>     > > > >>> we've
>     > > > >>> > already discussed and get decent error messages when the
>     > expression
>     > > > is
>     > > > >>> > malformed. So on the one hand I quite like the idea. On
> the other
>     > > > hand
>     > > > >>> it
>     > > > >>> > feels like overkill for the use cases that have actually
> been
>     > > > >>> identified so
>     > > > >>> > far.
>     > > > >>> >
>     > > > >>> > @Sönke what do you make of the expression language idea?
>     > > > >>> >
>     > > > >>> > Kind regards,
>     > > > >>> >
>     > > > >>> > Tom
>     > > > >>> >
>     > > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
>     > > > >>> chrise@confluent.io>
>     > > > >>> > wrote:
>     > > > >>> >
>     > > > >>> > > Hi Tom,
>     > > > >>> > >
>     > > > >>> > > This looks great and I'd love to see the out-of-the-box
> SMTs
>     > become
>     > > > >>> even
>     > > > >>> > > more powerful with the improvements you've proposed! Got
> a few
>     > > > >>> remarks and
>     > > > >>> > > would be interested in your thoughts:
>     > > > >>> > >
>     > > > >>> > > 1. Instead of the new "ConfigDef config(Map<String,
> String>
>     > props)"
>     > > > >>> method,
>     > > > >>> > > what would you think about adopting a similar approach
> as the
>     > > > >>> framework
>     > > > >>> > > uses with connectors, by adding a "Config
> validate(Map<String,
>     > > > >>> String>
>     > > > >>> > > props)" method that can perform custom validation
> outside of
>     > what
>     > > > >>> can be
>     > > > >>> > > performed by the ConfigDef's single-property-at-a-time
>     > validation?
>     > > > >>> It may
>     > > > >>> > > be a little heavyweight for use with this particular
> SMT, but
>     > it'd
>     > > > >>> provide
>     > > > >>> > > more flexibility for other SMT implementations and would
>     > mirror an
>     > > > >>> API that
>     > > > >>> > > developers targeting the framework are likely already
> familiar
>     > > > with.
>     > > > >>> > > 2. The possibility for adding the logical operators
> "and" and
>     > "or"
>     > > > is
>     > > > >>> > > mentioned, but only as a potential future change and not
> one
>     > > > >>> proposed by
>     > > > >>> > > this KIP. Why not include those operators sooner rather
> than
>     > later?
>     > > > >>> > > 3. The syntax for named conditions that are then
> referenced in
>     > > > >>> logical
>     > > > >>> > > operators is... tricky. It took me a few attempts to
> grok the
>     > > > example
>     > > > >>> > > provided in the KIP after reading Sönke's question about
> the
>     > > > example
>     > > > >>> for
>     > > > >>> > > negation. What about a more sophisticated but less
> verbose
>     > syntax
>     > > > >>> that
>     > > > >>> > > supports a single configuration for the condition, even
> with
>     > > > logical
>     > > > >>> > > operators? I'm thinking something like
>     > > > >>> > > "transforms.conditionalExtract.condition:
>     > > > not(has-header:my-header)"
>     > > > >>> > > instead of the "transforms.conditionalExtract.condition:
>     > > > >>> not:hasMyHeader"
>     > > > >>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> > > has-header:my-header" properties. If support for a
> logical
>     > "and" is
>     > > > >>> added,
>     > > > >>> > > this could then be expanded to something like
>     > > > >>> > > "transforms.conditionalExtract.condition:
>     > > > and(has-header(my-header),
>     > > > >>> > > not(topic-matches(my-prefix-.*)))". There would be
> additional
>     > > > >>> complexity
>     > > > >>> > > here with the need to escape parentheses and commas that
> are
>     > > > >>> intended to be
>     > > > >>> > > treated literally (as part of a header name, for example)
>     > instead
>     > > > of
>     > > > >>> as
>     > > > >>> > > part of the syntax for the condition itself, but a little
>     > > > additional
>     > > > >>> > > complexity for edge cases like that may be warranted if
> it
>     > heavily
>     > > > >>> reduces
>     > > > >>> > > complexity for the common cases. The rationale for the
> proposed
>     > > > >>> > > parentheses-based syntax here instead of what's
> mentioned in
>     > the
>     > > > KIP
>     > > > >>> > > (something like "and: <condition1>, <condition2>") is to
> help
>     > with
>     > > > >>> > > readability; we probably wouldn't need that with the
> approach
>     > of
>     > > > >>> naming
>     > > > >>> > > conditions via separate properties, but things may get a
> little
>     > > > >>> nasty with
>     > > > >>> > > literal conditions included there, especially with the
>     > possibility
>     > > > of
>     > > > >>> > > nesting. Finally, the shift in syntax here should make it
>     > easier to
>     > > > >>> handle
>     > > > >>> > > cases like the header value matching brought up by
> Sönke; it
>     > might
>     > > > >>> look
>     > > > >>> > > something like "header-matches(header-name,
>     > header-value-pattern)".
>     > > > >>> > >
>     > > > >>> > > Cheers,
>     > > > >>> > >
>     > > > >>> > > Chris
>     > > > >>> > >
>     > > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
>     > tbentley@redhat.com>
>     > > > >>> wrote:
>     > > > >>> > >
>     > > > >>> > > > Hi Sönke,
>     > > > >>> > > >
>     > > > >>> > > > Thanks for taking a look.
>     > > > >>> > > >
>     > > > >>> > > > Let me answer in reverse order, since I think it might
> make
>     > more
>     > > > >>> sense
>     > > > >>> > > that
>     > > > >>> > > > way...
>     > > > >>> > > >
>     > > > >>> > > > Also, while writing that, I noticed that you have a
> version
>     > with
>     > > > >>> and
>     > > > >>> > > > > without "name" for your transformation in the KIP:
>     > > > >>> > > > >
>     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> > > has-header:my-header
>     > > > >>> > > > > and
>     > > > >>> > > > > transforms.conditionalExtract.condition:
>     > has-header:my-header
>     > > > >>> > > > >
>     > > > >>> > > >
>     > > > >>> > > > The example
>     > > > >>> > > >     transforms.conditionalExtract.condition:
>     > has-header:my-header
>     > > > >>> > > > is a "has-header" condition (the prefix of the config
> value),
>     > > > >>> which will
>     > > > >>> > > > match records with a "my-header" header (given in the
>     > suffix).
>     > > > >>> > > >
>     > > > >>> > > > The other example given is:
>     > > > >>> > > >     transforms.conditionalExtract.condition:
> not:hasMyHeader
>     > > > >>> > > >
>  transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> > > > has-header:my-header
>     > > > >>> > > > The root of the condition is a "not" condition (the
> prefix
>     > of the
>     > > > >>> value
>     > > > >>> > > for
>     > > > >>> > > > the transforms.conditionalExtract.condition key) of
> another
>     > named
>     > > > >>> > > condition
>     > > > >>> > > > called "hasMyHeader" (the suffix). Any name could be
> used
>     > for the
>     > > > >>> other
>     > > > >>> > > > condition. That other condition is configured at
>     > > > >>> > > >
> "transforms.conditionalExtract.condition.<conditionName>".
>     > That
>     > > > >>> condition
>     > > > >>> > > > is a "has-header" condition (the prefix), which will
> match
>     > > > records
>     > > > >>> with a
>     > > > >>> > > > "my-header" header (given in the suffix). So the
>     > "has-header:"
>     > > > >>> condition
>     > > > >>> > > > type would always require a suffix, as would the "not:"
>     > condition
>     > > > >>> type.
>     > > > >>> > > > Hypothetically you could have a "true" condition type
> (which
>     > > > would
>     > > > >>> not
>     > > > >>> > > > require a suffix), and the hypothetical binary
> conditions
>     > "and:"
>     > > > >>> and
>     > > > >>> > > "or:"
>     > > > >>> > > > would require a pair of other condition names.
>     > > > >>> > > >
>     > > > >>> > > > So what's proposed is a scheme for encoding conditions
> where
>     > the
>     > > > >>> > > condition
>     > > > >>> > > > type is the prefix of the value of some "....condition"
>     > config
>     > > > >>> key, and
>     > > > >>> > > the
>     > > > >>> > > > optional suffix provides parameters for the condition.
> This
>     > makes
>     > > > >>> those
>     > > > >>> > > > parameters a bit inflexible, but is relatively
> succinct.
>     > > > >>> > > >
>     > > > >>> > > > This leads on to your first point. You're right that
> use
>     > cases
>     > > > >>> might
>     > > > >>> > > appear
>     > > > >>> > > > which need other conditions, and we should make it
> flexible
>     > > > enough
>     > > > >>> to be
>     > > > >>> > > > able to cope with future use cases. On the other hand,
> I was
>     > > > >>> concerned
>     > > > >>> > > that
>     > > > >>> > > > we end up with something which is quite complicated to
>     > configure.
>     > > > >>> (There
>     > > > >>> > > > comes a point where it might makes more sense for the
> user to
>     > > > >>> write their
>     > > > >>> > > > own SMT).
>     > > > >>> > > >
>     > > > >>> > > > Just of the top of my head it might look like:
>     > > > >>> > > > >
>     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> type:has-header
>     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> > > > header-name:my-header
>     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> > > field-value:my-value
>     > > > >>> > > > >
>     > > > >>> > > >
>     > > > >>> > > > That won't work because the format is basically a
>     > > > >>> Properties/Map<String,
>     > > > >>> > > > String> and what you've suggested has duplicate keys.
>     > > > >>> > > >
>     > > > >>> > > > One thing I did briefly consider what the ability to
> treat
>     > > > >>> conditions as
>     > > > >>> > > > Configurable objects in their own right (opening up the
>     > > > >>> possibility of
>     > > > >>> > > > people supplying their own Conditions, just like they
> can
>     > supply
>     > > > >>> their
>     > > > >>> > > own
>     > > > >>> > > > SMTs). That might be configured something like this:
>     > > > >>> > > >
>     > > > >>> > > >     transforms.conditionalExtract.condition: not
>     > > > >>> > > >     transforms.conditionalExtract.condition.not.type:
> Not
>     > > > >>> > > >
>  transforms.conditionalExtract.condition.not.negated: foo
>     > > > >>> > > >     transforms.conditionalExtract.condition.foo.type:
>     > > > >>> HasHeaderWithValue
>     > > > >>> > > >     transforms.conditionalExtract.condition.foo.header:
>     > my-header
>     > > > >>> > > >     transforms.conditionalExtract.condition.foo.value:
>     > my-value
>     > > > >>> > > >
>     > > > >>> > > > I didn't propose that I couldn't see the use cases to
> justify
>     > > > this
>     > > > >>> kind
>     > > > >>> > > of
>     > > > >>> > > > complexity, especially as the common case would surely
> be
>     > > > matching
>     > > > >>> > > against
>     > > > >>> > > > topic name (to be honest I wasn't completely convinced
> by the
>     > > > need
>     > > > >>> for
>     > > > >>> > > > "has-header"). In the current version of the KIP
> that's just
>     > > > >>> > > >
>     > > > >>> > > >     transforms.conditionalExtract.condition:
> topic-matches:
>     > > > >>> my-prefix-.*
>     > > > >>> > > >
>     > > > >>> > > > but using the more flexible scheme that would require
>     > something
>     > > > >>> more like
>     > > > >>> > > > this:
>     > > > >>> > > >
>     > > > >>> > > >     transforms.conditionalExtract.condition: bar
>     > > > >>> > > >     transforms.conditionalExtract.condition.bar.type:
>     > TopicMatch
>     > > > >>> > > >
>  transforms.conditionalExtract.condition.bar.pattern:
>     > > > >>> my-prefix-.*
>     > > > >>> > > >
>     > > > >>> > > > If people know of use cases which would justify more
>     > > > >>> sophistication, I'm
>     > > > >>> > > > happy to reconsider.
>     > > > >>> > > >
>     > > > >>> > > > Thanks again for taking a look!
>     > > > >>> > > >
>     > > > >>> > > > Tom
>     > > > >>> > > >
>     > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>     > > > >>> > > > <so...@opencore.com.invalid> wrote:
>     > > > >>> > > >
>     > > > >>> > > > > Hi Tom,
>     > > > >>> > > > >
>     > > > >>> > > > > sounds useful to me, thanks for the KIP!
>     > > > >>> > > > > The only thought that I had while reading was that
> this
>     > will
>     > > > >>> probably
>     > > > >>> > > > raise
>     > > > >>> > > > > questions about more involved conditions fairly
> quickly.
>     > For
>     > > > >>> example
>     > > > >>> > > the
>     > > > >>> > > > > "has-header" will cause an appetite for conditions
> like
>     > > > >>> > > > > "this-header-has-that-value".
>     > > > >>> > > > > This would necessitate two parameters to be passed
> into the
>     > > > >>> condition,
>     > > > >>> > > > > which I think is not currently included in the KIP.
> I am
>     > not
>     > > > >>> saying add
>     > > > >>> > > > > this now, but might it make sense to discuss a
> concept of
>     > how
>     > > > >>> that
>     > > > >>> > > might
>     > > > >>> > > > > look now, to avoid potential changes later on.
>     > > > >>> > > > >
>     > > > >>> > > > > Just of the top of my head it might look like:
>     > > > >>> > > > >
>     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> type:has-header
>     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> > > > header-name:my-header
>     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> > > field-value:my-value
>     > > > >>> > > > >
>     > > > >>> > > > >
>     > > > >>> > > > > Also, while writing that, I noticed that you have a
> version
>     > > > with
>     > > > >>> and
>     > > > >>> > > > > without "name" for your transformation in the KIP:
>     > > > >>> > > > >
>     > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>     > > > >>> > > has-header:my-header
>     > > > >>> > > > > and
>     > > > >>> > > > > transforms.conditionalExtract.condition:
>     > has-header:my-header
>     > > > >>> > > > >
>     > > > >>> > > > >
>     > > > >>> > > > > Is this intentional and the name is optional?
>     > > > >>> > > > >
>     > > > >>> > > > >
>     > > > >>> > > > > Best regards,
>     > > > >>> > > > >
>     > > > >>> > > > > Sönke
>     > > > >>> > > > >
>     > > > >>> > > > >
>     > > > >>> > > > >
>     > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <
>     > tbentley@redhat.com>
>     > > > >>> wrote:
>     > > > >>> > > > > >
>     > > > >>> > > > > > Does anyone have any comments, feedback or
> thoughts about
>     > > > this?
>     > > > >>> > > > > >
>     > > > >>> > > > > > Thanks,
>     > > > >>> > > > > >
>     > > > >>> > > > > > Tom
>     > > > >>> > > > > >
>     > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
>     > > > >>> tbentley@redhat.com>
>     > > > >>> > > > > wrote:
>     > > > >>> > > > > >
>     > > > >>> > > > > > > Hi,
>     > > > >>> > > > > > >
>     > > > >>> > > > > > > I've opened KIP-585 which is intended to provide
> a
>     > > > mechanism
>     > > > >>> to
>     > > > >>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be
>     > grateful
>     > > > >>> for any
>     > > > >>> > > > > > > feedback:
>     > > > >>> > > > > > >
>     > > > >>> > > > >
>     > > > >>> > > > >
>     > > > >>> > > >
>     > > > >>> > >
>     > > > >>>
>     > > >
>     >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>     > > > >>> > > > > > >
>     > > > >>> > > > > > > Many thanks,
>     > > > >>> > > > > > >
>     > > > >>> > > > > > > Tom
>     > > > >>> > > > > > >
>     > > > >>> > > > >
>     > > > >>> > > >
>     > > > >>> > >
>     > > > >>>
>     > > > >>>
>     > > >
>     > >
>     >
>
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Andrew Schofield <an...@live.com>.
Hi,
I'm suggesting that RecordPredicate be another pluggable interface that users
could conceivably implement on their own, and that the KIP introduces the
most likely ones as supplied implementations, much as there are SMT
implementations such as HoistField. I don't really see users implementing
their own RecordPredicate, but the supplied ones would probably have a common
interface and it seems a pity not to expose that as an external API.

SMT is essentially a toolkit for Kafka Connect and I think RecordPredicate
would be a nice minor enhancement and I'd be happy to contribute code.

In order to make it simpler to configure, I suggest that the class name configured
using "?type" would be assumed to be in package org.apache.kafka.connect.predicates
in the absence of a package name so that shorter names such as "TopicMatches"
could be used.

Cheers,
Andrew Schofield
IBM Event Streams

On 25/04/2020, 21:22, "Christopher Egerton" <ch...@confluent.io> wrote:

    Hi Andrew,

    I know a DSL seems like overkill, and I'm not attached to it by any means,
    but I do think it serves a vital purpose in that it allows people who don't
    know how or have the time to write Java code to act on data being processed
    by connectors.

    Are you proposing that the "RecordPredicate" be another pluggable interface
    that users could then implement on their own? Or, would this be purely a
    syntactic expansion to SMT configuration with some subset of hard-coded
    predicates provided by the framework? I think there's value in the latter,
    but the former doesn't seem like it'd bring much to the table as far as
    users are concerned, and for developers, it is an improvement, but not a
    major one.

    Cheers,

    Chris

    On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
    schofieldandrewj@gmail.com> wrote:

    > I wonder whether we're getting a bit overcomplicated with this. I think all
    > that's required here is to add an optional guard predicate for a
    > Transformation.
    > The predicate cannot end the Transformation chain, but it can allow or
    > prevent a particular
    > Transformation from running.
    >
    > How about this as syntax?
    > transforms: extractInt
    > transforms.extractInt.?type:
    > org.apache.kafka.connect.predicates.TopicMatches
    > transforms.extractInt.?regex: my-prefix-.*
    > transforms.extractInt.type:
    > org.apache.kafka.connect.transforms.ExtractField$Key
    > transforms.extractInt.field: c1
    >
    > The idea is that a Transformation can have an optional RecordPredicate.
    > The RecordPredicate
    > can have configuration items, in a similar way to a Transformation. The
    > syntax of
    > using a '?' prefix to separate configuration for the RecordPredicate from
    > the configuration
    > for the Transformation could conceivably clash with an existing
    > Transformation
    > but the chances are minimal.
    >
    > This syntax doesn't allow for an 'else', but if the KIP offers say a
    > TopicMatches predicate
    > then that can be configured to return FALSE if the predicate matches.
    >
    > I feel that a DSL for SMTs is overkill. If you need something that
    > complex, it's
    > perhaps too complex for a transformation chain and it's really a streaming
    > application.
    >
    > Andrew Schofield
    > IBM Event Streams
    >
    > On 2020/04/08 21:39:31, Christopher Egerton <ch...@confluent.io> wrote:
    > > Hi Tom,
    > >
    > > With regards to the potential Transformation::validate method, I don't
    > > quite follow your objections. The AbstractHerder class, ConnectorConfig
    > > class, and embedding of ConfigDefs that happens with the two is all
    > > internal logic and we're allowed to modify it however we want, as long as
    > > it doesn't alter any public-facing APIs (and if it does, we just have to
    > > document those changes in a KIP). We don't have to embed the ConfigDef
    > for
    > > a transformation chain inside another ConfigDef if the API we want to
    > > present to our users doesn't play well with that part of the code base.
    > > Additionally, beyond aligning with the existing API provided for
    > > connectors, another advantage is that it becomes possible to validate
    > > properties for a configuration in the context of all other properties, so
    > > to be clear, it's not just about preserving what may be perceived as a
    > > superficial similarity and comes with real functional benefits that can't
    > > be provided (at least, not as easily) by dynamic construction of a
    > > ConfigDef object.
    > >
    > > As far as the new proposal goes, I hate to say it, but I think we're
    > stuck
    > > with the worst of both worlds here. Adding the new RecordPredicate
    > > interface seems like it defeats the whole purpose of SMTs, which is to
    > > allow manipulation of a connector's event stream by users who don't
    > > necessarily know how or have the time to write Java code of their own.
    > This
    > > is also why I'm in favor of adding a lightweight DSL for the condition;
    > > emphasizing readability for people who aren't very familiar with Connect
    > > and just want to get something going quickly should be a priority for
    > SMTs.
    > > But if that's not going to happen with this KIP, I'd still choose the
    > > simpler, less-flexible approach initially outlined, in order to keep
    > things
    > > simple for people creating connectors and try to let them accomplish what
    > > they want via configuration, not code.
    > >
    > > With regards to the question about where the line should be drawn and how
    > > much is too much and comparisons to other stream processing frameworks, I
    > > think the nature of SMTs draws the line quite nicely: you can only
    > process
    > > one message at a time. There's plenty of use cases out there for
    > > heavier-duty processing frameworks like Kafka Streams, with aggregate
    > > operations, joining of streams, expanding a single message into multiple
    > > messages, etc. With SMTs, none of this is possible; the general use case
    > is
    > > to filter and clean a stream of data. If any of the heavier-weight logic
    > > provided by, e.g., Streams, isn't required for a project, it should be
    > > possible to get along with just a collection of sink connectors, source
    > > connectors, a converter or two, and SMTs that smooth over any differences
    > > in data format between what source connectors produce and what sink
    > > connectors expect. This is why I'm comfortable suggesting heavy expansion
    > > of the out-of-the-box SMTs that we provide with Connect; as long as
    > they're
    > > reasonable to configure, they can greatly reduce the operational burden
    > on
    > > anyone running and/or using a Connect cluster since they might entirely
    > > replace additional services that would otherwise be required.
    > >
    > > All that said--this is a huge ask of someone who just wants to support
    > the
    > > SMT equivalent of an "if" statement, and it's totally understandable if
    > > that's too much to ask. The one concern I have left is that if we expand
    > > the SMT in the future, there become compatibility concerns since SMTs are
    > > pretty tightly-coupled with the worker on which they run (although
    > > technically, with some classpath/plugin path shuffling, you can run
    > > different versions of the out-of-the-box SMTs from the Connect worker on
    > > which they're run). Someone might write a connector config with the
    > > If/Conditional/Whatever SMT with a condition type that works on one
    > worker,
    > > but that doesn't work on another worker that's running an earlier version
    > > of Connect. This is why I'm in favor of adding extra predicates now
    > instead
    > > of later; if we're going to implement what has the potential to be a bit
    > of
    > > a Swiss army knife SMT with room for future expansion of configuration,
    > and
    > > we can think of a reasonable way to add that functionality now, it seems
    > > better for users and administrators of Connect to try to do that now
    > > instead of later.
    > >
    > > Cheers,
    > >
    > > Chris
    > >
    > > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <tb...@redhat.com> wrote:
    > >
    > > > Since no one objected I've updated the KIP with the revised way to
    > > > configure this transformation.
    > > >
    > > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tb...@redhat.com>
    > wrote:
    > > >
    > > > > To come back about a point Chris made:
    > > > >
    > > > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
    > > > method,
    > > > >> what would you think about adopting a similar approach as the
    > framework
    > > > >> uses with connectors, by adding a "Config validate(Map<String,
    > String>
    > > > >> props)" method that can perform custom validation outside of what
    > can be
    > > > >> performed by the ConfigDef's single-property-at-a-time validation?
    > It
    > > > may
    > > > >> be a little heavyweight for use with this particular SMT, but it'd
    > > > provide
    > > > >> more flexibility for other SMT implementations and would mirror an
    > API
    > > > >> that
    > > > >> developers targeting the framework are likely already familiar with.
    > > > >>
    > > > >
    > > > > The validate() + config() approach taken for Connectors doesn't quite
    > > > work
    > > > > for Transformations.
    > > > >
    > > > > The default Connector.validate() basically just calls
    > > > > `config().validate(connectorConfigs)` and returns a Config.
    > Separately
    > > > the
    > > > > AbstractHeader also calls `config()` and uses the Config and
    > ConfigDef to
    > > > > build the ConfigInfos. The contract is not really defined well
    > enough in
    > > > > the javadoc, but this means a connector can use validate() to build
    > the
    > > > > ConfigDef which it will return from config().
    > > > >
    > > > > For Transformations we don't really want validate() to perform
    > validation
    > > > > at all since ultimately the transformations config will be embedded
    > in
    > > > the
    > > > > connector's and will be validated by the connector itself. It
    > wouldn't be
    > > > > harmful if it _did_ perform validation, just unnecessary. But having
    > a
    > > > > validate() method that ought not to validate() seems like a recipe
    > for
    > > > > confusion.
    > > > >
    > > > > Also problematic is that there's no use for the Config returned from
    > > > > Transformations.validate().
    > > > >
    > > > > So I'm not convinced that the superficial similarity really gains
    > > > anything.
    > > > >
    > > > > Kind regards,
    > > > >
    > > > > Tom
    > > > >
    > > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tb...@redhat.com>
    > wrote:
    > > > >
    > > > >> Hi,
    > > > >>
    > > > >> Hi all,
    > > > >>
    > > > >> Thanks for the discussion so far.
    > > > >>
    > > > >> It seems a bit weird to me that when configuring the Conditional SMT
    > > > with
    > > > >> a DSL you would use a concise, intuitive DSL for expressing the
    > > > condition,
    > > > >> but not for the transforms that it's guarding. It also seems
    > natural, if
    > > > >> you support this for conditionally applying SMTs, that you'd soon
    > want
    > > > to
    > > > >> support the same thing for a generic filter transformer. Then, if
    > you
    > > > can
    > > > >> configure a filter transformation using this DSL, it becomes odd
    > that
    > > > you
    > > > >> can't do this for mapping transformations. I think it would be a
    > > > mistake to
    > > > >> go specifying an expression language for conditions when really that
    > > > might
    > > > >> just be part of a language for transformations.
    > > > >>
    > > > >> I think it would be possible today to write an SMT which allowed
    > you to
    > > > >> express the transformation in a DSL. Concretely, it's possible to
    > > > imagine a
    > > > >> single transformation taking a DSL something like this:
    > > > >>
    > > > >>   compose(
    > > > >>     Flatten.Key(delimiter: ':'),
    > > > >>     If(condition: TopicMatches(pattern: "blah-.*"),
    > > > >>        then: Flatten.Value(delimiter: '/')))
    > > > >>
    > > > >> All I've really done here, beyond what was already proposed, is
    > rename
    > > > >> Conditional to If, imagine a couple more bits of syntax
    > > > >> (SMTs being constructed by class name and named argument invocation
    > > > >> syntax to support SMT constructors) and add a higher level compose()
    > > > >> function for chaining SMTs (which could easily be replaced with
    > brace
    > > > >> delimited blocks).
    > > > >>
    > > > >> That may be a discussion we should have, but I think in _this_ KIP
    > we
    > > > >> should focus on the conditional part, since the above example
    > hopefully
    > > > >> shows that it would be possible to reuse it in a DSL if there was
    > > > appetite
    > > > >> for that.
    > > > >>
    > > > >> With that in mind, and since my original suggestion for an
    > abbreviated
    > > > >> config syntax didn't appeal to people, I propose that we stick with
    > the
    > > > >> existing norms for configuring this.
    > > > >> The previous example would look like this:
    > > > >>
    > > > >> transformation: flattenKey,if
    > > > >> transformation.flattenKey.type: Flatten.Key
    > > > >> transformation.flattenKey.delimiter: :
    > > > >> transformation.if.type: If
    > > > >> transformation.if.condition.type: TopicMatches
    > > > >> transformation.if.condition.pattern: blah-.*
    > > > >> transformation.if.then: flattenValue
    > > > >> transformation.if.then.flattenValue.type: Flatten.Value
    > > > >> transformation.if.then.flattenValue.delimiter: /
    > > > >>
    > > > >> Also, I'm inclined to think we should stick to just supporting
    > > > >> TopicMatches and Not, since we've not identified an actual need for
    > > > >> 'has-header', 'or' and 'and'.
    > > > >>
    > > > >> An example of the usage of Not would look like this:
    > > > >>
    > > > >> transformation: flattenKey,if
    > > > >> transformation.flattenKey.type: Flatten.Key
    > > > >> transformation.flattenKey.delimiter: :
    > > > >> transformation.if.type: If
    > > > >> transformation.if.condition.type: Not
    > > > >> transformation.if.condition.operand: startsWithBlah
    > > > >> transformation.if.condition.operand.startsWithBlah.type:
    > TopicMatches
    > > > >> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
    > > > >> transformation.if.then: flattenValue
    > > > >> transformation.if.then.flattenValue.type: Flatten.Value
    > > > >> transformation.if.then.flattenValue.delimiter: /
    > > > >>
    > > > >> Hopefully we can agree that this allows the conditional SMT part to
    > make
    > > > >> progress without getting bogged down.
    > > > >>
    > > > >> Thoughts? If this seems broadly reasonable, I'll update the KIP.
    > > > >>
    > > > >> Kind regards,
    > > > >>
    > > > >> Tom
    > > > >>
    > > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gunnar@hibernate.org
    > >
    > > > >> wrote:
    > > > >>
    > > > >>> Hi all,
    > > > >>>
    > > > >>> Thanks a lot for this initiative, Tom!
    > > > >>>
    > > > >>> To shed some light, the use case where this first came up, were
    > issues
    > > > >>> we saw with SMTs being applied to the different topics produced by
    > the
    > > > >>> Debezium change data capture connectors. There are different kinds
    > of
    > > > >>> topics (for change data, schema history, heartbeats etc.) and the
    > > > >>> record structure to expect may vary between those. Hence we saw
    > issues
    > > > >>> with SMTs like ExtractField, which for instance only should be
    > applied
    > > > >>> to all change data topics but not the other ones.
    > > > >>>
    > > > >>> I like the overall approach; for Debezium's purposes, the simple
    > topic
    > > > >>> matching and negation operators would be sufficient already. I
    > agree
    > > > >>> with Chris and would prefer one single condition attribute, which
    > > > >>> contains a single condition or potentially a logical expression
    > with
    > > > >>> not, and, etc. I think it's less ambiguous, in particular when it
    > > > >>> comes to ordering of the different conditions and determining their
    > > > >>> precedence.
    > > > >>>
    > > > >>> Would love to see this feature in one or another way in Connect.
    > > > >>>
    > > > >>> Best,
    > > > >>>
    > > > >>> --Gunnar
    > > > >>>
    > > > >>>
    > > > >>>
    > > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
    > > > >>> tbentley@redhat.com>:
    > > > >>> >
    > > > >>> > Hi Chris and Sönke,
    > > > >>> >
    > > > >>> > Using the numbering from Chris's email...
    > > > >>> >
    > > > >>> > 1. That's a good point, I'll see what is needed to make that
    > work.
    > > > >>> >
    > > > >>> > 2. I'm happy enough to add support for "and" and "or" as part of
    > this
    > > > >>> KIP
    > > > >>> > if people can see a need for it.
    > > > >>> >
    > > > >>> > In a similar vein, I was wondering about whether it would be
    > > > worthwhile
    > > > >>> > having the equivalent of an "else" clause (what's in the KIP is
    > > > >>> basically
    > > > >>> > an "if" statement). Without support for "else" I think people
    > would
    > > > >>> often
    > > > >>> > need two conditionals, with the condition of one being the
    > negation
    > > > of
    > > > >>> the
    > > > >>> > condition of another.
    > > > >>> >
    > > > >>> > 3. I can see the attraction of an expression language. The pros
    > > > include
    > > > >>> > being terse and familiar to programmers and potentially very
    > flexible
    > > > >>> if
    > > > >>> > that's needed in the future. I had a play and implemented it
    > using
    > > > >>> ANTLR
    > > > >>> > and it's not difficult to write a grammar and implement the
    > functions
    > > > >>> we've
    > > > >>> > already discussed and get decent error messages when the
    > expression
    > > > is
    > > > >>> > malformed. So on the one hand I quite like the idea. On the other
    > > > hand
    > > > >>> it
    > > > >>> > feels like overkill for the use cases that have actually been
    > > > >>> identified so
    > > > >>> > far.
    > > > >>> >
    > > > >>> > @Sönke what do you make of the expression language idea?
    > > > >>> >
    > > > >>> > Kind regards,
    > > > >>> >
    > > > >>> > Tom
    > > > >>> >
    > > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
    > > > >>> chrise@confluent.io>
    > > > >>> > wrote:
    > > > >>> >
    > > > >>> > > Hi Tom,
    > > > >>> > >
    > > > >>> > > This looks great and I'd love to see the out-of-the-box SMTs
    > become
    > > > >>> even
    > > > >>> > > more powerful with the improvements you've proposed! Got a few
    > > > >>> remarks and
    > > > >>> > > would be interested in your thoughts:
    > > > >>> > >
    > > > >>> > > 1. Instead of the new "ConfigDef config(Map<String, String>
    > props)"
    > > > >>> method,
    > > > >>> > > what would you think about adopting a similar approach as the
    > > > >>> framework
    > > > >>> > > uses with connectors, by adding a "Config validate(Map<String,
    > > > >>> String>
    > > > >>> > > props)" method that can perform custom validation outside of
    > what
    > > > >>> can be
    > > > >>> > > performed by the ConfigDef's single-property-at-a-time
    > validation?
    > > > >>> It may
    > > > >>> > > be a little heavyweight for use with this particular SMT, but
    > it'd
    > > > >>> provide
    > > > >>> > > more flexibility for other SMT implementations and would
    > mirror an
    > > > >>> API that
    > > > >>> > > developers targeting the framework are likely already familiar
    > > > with.
    > > > >>> > > 2. The possibility for adding the logical operators "and" and
    > "or"
    > > > is
    > > > >>> > > mentioned, but only as a potential future change and not one
    > > > >>> proposed by
    > > > >>> > > this KIP. Why not include those operators sooner rather than
    > later?
    > > > >>> > > 3. The syntax for named conditions that are then referenced in
    > > > >>> logical
    > > > >>> > > operators is... tricky. It took me a few attempts to grok the
    > > > example
    > > > >>> > > provided in the KIP after reading Sönke's question about the
    > > > example
    > > > >>> for
    > > > >>> > > negation. What about a more sophisticated but less verbose
    > syntax
    > > > >>> that
    > > > >>> > > supports a single configuration for the condition, even with
    > > > logical
    > > > >>> > > operators? I'm thinking something like
    > > > >>> > > "transforms.conditionalExtract.condition:
    > > > not(has-header:my-header)"
    > > > >>> > > instead of the "transforms.conditionalExtract.condition:
    > > > >>> not:hasMyHeader"
    > > > >>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > has-header:my-header" properties. If support for a logical
    > "and" is
    > > > >>> added,
    > > > >>> > > this could then be expanded to something like
    > > > >>> > > "transforms.conditionalExtract.condition:
    > > > and(has-header(my-header),
    > > > >>> > > not(topic-matches(my-prefix-.*)))". There would be additional
    > > > >>> complexity
    > > > >>> > > here with the need to escape parentheses and commas that are
    > > > >>> intended to be
    > > > >>> > > treated literally (as part of a header name, for example)
    > instead
    > > > of
    > > > >>> as
    > > > >>> > > part of the syntax for the condition itself, but a little
    > > > additional
    > > > >>> > > complexity for edge cases like that may be warranted if it
    > heavily
    > > > >>> reduces
    > > > >>> > > complexity for the common cases. The rationale for the proposed
    > > > >>> > > parentheses-based syntax here instead of what's mentioned in
    > the
    > > > KIP
    > > > >>> > > (something like "and: <condition1>, <condition2>") is to help
    > with
    > > > >>> > > readability; we probably wouldn't need that with the approach
    > of
    > > > >>> naming
    > > > >>> > > conditions via separate properties, but things may get a little
    > > > >>> nasty with
    > > > >>> > > literal conditions included there, especially with the
    > possibility
    > > > of
    > > > >>> > > nesting. Finally, the shift in syntax here should make it
    > easier to
    > > > >>> handle
    > > > >>> > > cases like the header value matching brought up by Sönke; it
    > might
    > > > >>> look
    > > > >>> > > something like "header-matches(header-name,
    > header-value-pattern)".
    > > > >>> > >
    > > > >>> > > Cheers,
    > > > >>> > >
    > > > >>> > > Chris
    > > > >>> > >
    > > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
    > tbentley@redhat.com>
    > > > >>> wrote:
    > > > >>> > >
    > > > >>> > > > Hi Sönke,
    > > > >>> > > >
    > > > >>> > > > Thanks for taking a look.
    > > > >>> > > >
    > > > >>> > > > Let me answer in reverse order, since I think it might make
    > more
    > > > >>> sense
    > > > >>> > > that
    > > > >>> > > > way...
    > > > >>> > > >
    > > > >>> > > > Also, while writing that, I noticed that you have a version
    > with
    > > > >>> and
    > > > >>> > > > > without "name" for your transformation in the KIP:
    > > > >>> > > > >
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > has-header:my-header
    > > > >>> > > > > and
    > > > >>> > > > > transforms.conditionalExtract.condition:
    > has-header:my-header
    > > > >>> > > > >
    > > > >>> > > >
    > > > >>> > > > The example
    > > > >>> > > >     transforms.conditionalExtract.condition:
    > has-header:my-header
    > > > >>> > > > is a "has-header" condition (the prefix of the config value),
    > > > >>> which will
    > > > >>> > > > match records with a "my-header" header (given in the
    > suffix).
    > > > >>> > > >
    > > > >>> > > > The other example given is:
    > > > >>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
    > > > >>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > > has-header:my-header
    > > > >>> > > > The root of the condition is a "not" condition (the prefix
    > of the
    > > > >>> value
    > > > >>> > > for
    > > > >>> > > > the transforms.conditionalExtract.condition key) of another
    > named
    > > > >>> > > condition
    > > > >>> > > > called "hasMyHeader" (the suffix). Any name could be used
    > for the
    > > > >>> other
    > > > >>> > > > condition. That other condition is configured at
    > > > >>> > > > "transforms.conditionalExtract.condition.<conditionName>".
    > That
    > > > >>> condition
    > > > >>> > > > is a "has-header" condition (the prefix), which will match
    > > > records
    > > > >>> with a
    > > > >>> > > > "my-header" header (given in the suffix). So the
    > "has-header:"
    > > > >>> condition
    > > > >>> > > > type would always require a suffix, as would the "not:"
    > condition
    > > > >>> type.
    > > > >>> > > > Hypothetically you could have a "true" condition type (which
    > > > would
    > > > >>> not
    > > > >>> > > > require a suffix), and the hypothetical binary conditions
    > "and:"
    > > > >>> and
    > > > >>> > > "or:"
    > > > >>> > > > would require a pair of other condition names.
    > > > >>> > > >
    > > > >>> > > > So what's proposed is a scheme for encoding conditions where
    > the
    > > > >>> > > condition
    > > > >>> > > > type is the prefix of the value of some "....condition"
    > config
    > > > >>> key, and
    > > > >>> > > the
    > > > >>> > > > optional suffix provides parameters for the condition. This
    > makes
    > > > >>> those
    > > > >>> > > > parameters a bit inflexible, but is relatively succinct.
    > > > >>> > > >
    > > > >>> > > > This leads on to your first point. You're right that use
    > cases
    > > > >>> might
    > > > >>> > > appear
    > > > >>> > > > which need other conditions, and we should make it flexible
    > > > enough
    > > > >>> to be
    > > > >>> > > > able to cope with future use cases. On the other hand, I was
    > > > >>> concerned
    > > > >>> > > that
    > > > >>> > > > we end up with something which is quite complicated to
    > configure.
    > > > >>> (There
    > > > >>> > > > comes a point where it might makes more sense for the user to
    > > > >>> write their
    > > > >>> > > > own SMT).
    > > > >>> > > >
    > > > >>> > > > Just of the top of my head it might look like:
    > > > >>> > > > >
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> type:has-header
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > > header-name:my-header
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > field-value:my-value
    > > > >>> > > > >
    > > > >>> > > >
    > > > >>> > > > That won't work because the format is basically a
    > > > >>> Properties/Map<String,
    > > > >>> > > > String> and what you've suggested has duplicate keys.
    > > > >>> > > >
    > > > >>> > > > One thing I did briefly consider what the ability to treat
    > > > >>> conditions as
    > > > >>> > > > Configurable objects in their own right (opening up the
    > > > >>> possibility of
    > > > >>> > > > people supplying their own Conditions, just like they can
    > supply
    > > > >>> their
    > > > >>> > > own
    > > > >>> > > > SMTs). That might be configured something like this:
    > > > >>> > > >
    > > > >>> > > >     transforms.conditionalExtract.condition: not
    > > > >>> > > >     transforms.conditionalExtract.condition.not.type: Not
    > > > >>> > > >     transforms.conditionalExtract.condition.not.negated: foo
    > > > >>> > > >     transforms.conditionalExtract.condition.foo.type:
    > > > >>> HasHeaderWithValue
    > > > >>> > > >     transforms.conditionalExtract.condition.foo.header:
    > my-header
    > > > >>> > > >     transforms.conditionalExtract.condition.foo.value:
    > my-value
    > > > >>> > > >
    > > > >>> > > > I didn't propose that I couldn't see the use cases to justify
    > > > this
    > > > >>> kind
    > > > >>> > > of
    > > > >>> > > > complexity, especially as the common case would surely be
    > > > matching
    > > > >>> > > against
    > > > >>> > > > topic name (to be honest I wasn't completely convinced by the
    > > > need
    > > > >>> for
    > > > >>> > > > "has-header"). In the current version of the KIP that's just
    > > > >>> > > >
    > > > >>> > > >     transforms.conditionalExtract.condition: topic-matches:
    > > > >>> my-prefix-.*
    > > > >>> > > >
    > > > >>> > > > but using the more flexible scheme that would require
    > something
    > > > >>> more like
    > > > >>> > > > this:
    > > > >>> > > >
    > > > >>> > > >     transforms.conditionalExtract.condition: bar
    > > > >>> > > >     transforms.conditionalExtract.condition.bar.type:
    > TopicMatch
    > > > >>> > > >     transforms.conditionalExtract.condition.bar.pattern:
    > > > >>> my-prefix-.*
    > > > >>> > > >
    > > > >>> > > > If people know of use cases which would justify more
    > > > >>> sophistication, I'm
    > > > >>> > > > happy to reconsider.
    > > > >>> > > >
    > > > >>> > > > Thanks again for taking a look!
    > > > >>> > > >
    > > > >>> > > > Tom
    > > > >>> > > >
    > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
    > > > >>> > > > <so...@opencore.com.invalid> wrote:
    > > > >>> > > >
    > > > >>> > > > > Hi Tom,
    > > > >>> > > > >
    > > > >>> > > > > sounds useful to me, thanks for the KIP!
    > > > >>> > > > > The only thought that I had while reading was that this
    > will
    > > > >>> probably
    > > > >>> > > > raise
    > > > >>> > > > > questions about more involved conditions fairly quickly.
    > For
    > > > >>> example
    > > > >>> > > the
    > > > >>> > > > > "has-header" will cause an appetite for conditions like
    > > > >>> > > > > "this-header-has-that-value".
    > > > >>> > > > > This would necessitate two parameters to be passed into the
    > > > >>> condition,
    > > > >>> > > > > which I think is not currently included in the KIP. I am
    > not
    > > > >>> saying add
    > > > >>> > > > > this now, but might it make sense to discuss a concept of
    > how
    > > > >>> that
    > > > >>> > > might
    > > > >>> > > > > look now, to avoid potential changes later on.
    > > > >>> > > > >
    > > > >>> > > > > Just of the top of my head it might look like:
    > > > >>> > > > >
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> type:has-header
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > > header-name:my-header
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > field-value:my-value
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > > Also, while writing that, I noticed that you have a version
    > > > with
    > > > >>> and
    > > > >>> > > > > without "name" for your transformation in the KIP:
    > > > >>> > > > >
    > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
    > > > >>> > > has-header:my-header
    > > > >>> > > > > and
    > > > >>> > > > > transforms.conditionalExtract.condition:
    > has-header:my-header
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > > Is this intentional and the name is optional?
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > > Best regards,
    > > > >>> > > > >
    > > > >>> > > > > Sönke
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <
    > tbentley@redhat.com>
    > > > >>> wrote:
    > > > >>> > > > > >
    > > > >>> > > > > > Does anyone have any comments, feedback or thoughts about
    > > > this?
    > > > >>> > > > > >
    > > > >>> > > > > > Thanks,
    > > > >>> > > > > >
    > > > >>> > > > > > Tom
    > > > >>> > > > > >
    > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
    > > > >>> tbentley@redhat.com>
    > > > >>> > > > > wrote:
    > > > >>> > > > > >
    > > > >>> > > > > > > Hi,
    > > > >>> > > > > > >
    > > > >>> > > > > > > I've opened KIP-585 which is intended to provide a
    > > > mechanism
    > > > >>> to
    > > > >>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be
    > grateful
    > > > >>> for any
    > > > >>> > > > > > > feedback:
    > > > >>> > > > > > >
    > > > >>> > > > >
    > > > >>> > > > >
    > > > >>> > > >
    > > > >>> > >
    > > > >>>
    > > >
    > https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
    > > > >>> > > > > > >
    > > > >>> > > > > > > Many thanks,
    > > > >>> > > > > > >
    > > > >>> > > > > > > Tom
    > > > >>> > > > > > >
    > > > >>> > > > >
    > > > >>> > > >
    > > > >>> > >
    > > > >>>
    > > > >>>
    > > >
    > >
    >


Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Christopher Egerton <ch...@confluent.io>.
Hi Andrew,

I know a DSL seems like overkill, and I'm not attached to it by any means,
but I do think it serves a vital purpose in that it allows people who don't
know how or have the time to write Java code to act on data being processed
by connectors.

Are you proposing that the "RecordPredicate" be another pluggable interface
that users could then implement on their own? Or, would this be purely a
syntactic expansion to SMT configuration with some subset of hard-coded
predicates provided by the framework? I think there's value in the latter,
but the former doesn't seem like it'd bring much to the table as far as
users are concerned, and for developers, it is an improvement, but not a
major one.

Cheers,

Chris

On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
schofieldandrewj@gmail.com> wrote:

> I wonder whether we're getting a bit overcomplicated with this. I think all
> that's required here is to add an optional guard predicate for a
> Transformation.
> The predicate cannot end the Transformation chain, but it can allow or
> prevent a particular
> Transformation from running.
>
> How about this as syntax?
> transforms: extractInt
> transforms.extractInt.?type:
> org.apache.kafka.connect.predicates.TopicMatches
> transforms.extractInt.?regex: my-prefix-.*
> transforms.extractInt.type:
> org.apache.kafka.connect.transforms.ExtractField$Key
> transforms.extractInt.field: c1
>
> The idea is that a Transformation can have an optional RecordPredicate.
> The RecordPredicate
> can have configuration items, in a similar way to a Transformation. The
> syntax of
> using a '?' prefix to separate configuration for the RecordPredicate from
> the configuration
> for the Transformation could conceivably clash with an existing
> Transformation
> but the chances are minimal.
>
> This syntax doesn't allow for an 'else', but if the KIP offers say a
> TopicMatches predicate
> then that can be configured to return FALSE if the predicate matches.
>
> I feel that a DSL for SMTs is overkill. If you need something that
> complex, it's
> perhaps too complex for a transformation chain and it's really a streaming
> application.
>
> Andrew Schofield
> IBM Event Streams
>
> On 2020/04/08 21:39:31, Christopher Egerton <ch...@confluent.io> wrote:
> > Hi Tom,
> >
> > With regards to the potential Transformation::validate method, I don't
> > quite follow your objections. The AbstractHerder class, ConnectorConfig
> > class, and embedding of ConfigDefs that happens with the two is all
> > internal logic and we're allowed to modify it however we want, as long as
> > it doesn't alter any public-facing APIs (and if it does, we just have to
> > document those changes in a KIP). We don't have to embed the ConfigDef
> for
> > a transformation chain inside another ConfigDef if the API we want to
> > present to our users doesn't play well with that part of the code base.
> > Additionally, beyond aligning with the existing API provided for
> > connectors, another advantage is that it becomes possible to validate
> > properties for a configuration in the context of all other properties, so
> > to be clear, it's not just about preserving what may be perceived as a
> > superficial similarity and comes with real functional benefits that can't
> > be provided (at least, not as easily) by dynamic construction of a
> > ConfigDef object.
> >
> > As far as the new proposal goes, I hate to say it, but I think we're
> stuck
> > with the worst of both worlds here. Adding the new RecordPredicate
> > interface seems like it defeats the whole purpose of SMTs, which is to
> > allow manipulation of a connector's event stream by users who don't
> > necessarily know how or have the time to write Java code of their own.
> This
> > is also why I'm in favor of adding a lightweight DSL for the condition;
> > emphasizing readability for people who aren't very familiar with Connect
> > and just want to get something going quickly should be a priority for
> SMTs.
> > But if that's not going to happen with this KIP, I'd still choose the
> > simpler, less-flexible approach initially outlined, in order to keep
> things
> > simple for people creating connectors and try to let them accomplish what
> > they want via configuration, not code.
> >
> > With regards to the question about where the line should be drawn and how
> > much is too much and comparisons to other stream processing frameworks, I
> > think the nature of SMTs draws the line quite nicely: you can only
> process
> > one message at a time. There's plenty of use cases out there for
> > heavier-duty processing frameworks like Kafka Streams, with aggregate
> > operations, joining of streams, expanding a single message into multiple
> > messages, etc. With SMTs, none of this is possible; the general use case
> is
> > to filter and clean a stream of data. If any of the heavier-weight logic
> > provided by, e.g., Streams, isn't required for a project, it should be
> > possible to get along with just a collection of sink connectors, source
> > connectors, a converter or two, and SMTs that smooth over any differences
> > in data format between what source connectors produce and what sink
> > connectors expect. This is why I'm comfortable suggesting heavy expansion
> > of the out-of-the-box SMTs that we provide with Connect; as long as
> they're
> > reasonable to configure, they can greatly reduce the operational burden
> on
> > anyone running and/or using a Connect cluster since they might entirely
> > replace additional services that would otherwise be required.
> >
> > All that said--this is a huge ask of someone who just wants to support
> the
> > SMT equivalent of an "if" statement, and it's totally understandable if
> > that's too much to ask. The one concern I have left is that if we expand
> > the SMT in the future, there become compatibility concerns since SMTs are
> > pretty tightly-coupled with the worker on which they run (although
> > technically, with some classpath/plugin path shuffling, you can run
> > different versions of the out-of-the-box SMTs from the Connect worker on
> > which they're run). Someone might write a connector config with the
> > If/Conditional/Whatever SMT with a condition type that works on one
> worker,
> > but that doesn't work on another worker that's running an earlier version
> > of Connect. This is why I'm in favor of adding extra predicates now
> instead
> > of later; if we're going to implement what has the potential to be a bit
> of
> > a Swiss army knife SMT with room for future expansion of configuration,
> and
> > we can think of a reasonable way to add that functionality now, it seems
> > better for users and administrators of Connect to try to do that now
> > instead of later.
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <tb...@redhat.com> wrote:
> >
> > > Since no one objected I've updated the KIP with the revised way to
> > > configure this transformation.
> > >
> > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tb...@redhat.com>
> wrote:
> > >
> > > > To come back about a point Chris made:
> > > >
> > > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
> > > method,
> > > >> what would you think about adopting a similar approach as the
> framework
> > > >> uses with connectors, by adding a "Config validate(Map<String,
> String>
> > > >> props)" method that can perform custom validation outside of what
> can be
> > > >> performed by the ConfigDef's single-property-at-a-time validation?
> It
> > > may
> > > >> be a little heavyweight for use with this particular SMT, but it'd
> > > provide
> > > >> more flexibility for other SMT implementations and would mirror an
> API
> > > >> that
> > > >> developers targeting the framework are likely already familiar with.
> > > >>
> > > >
> > > > The validate() + config() approach taken for Connectors doesn't quite
> > > work
> > > > for Transformations.
> > > >
> > > > The default Connector.validate() basically just calls
> > > > `config().validate(connectorConfigs)` and returns a Config.
> Separately
> > > the
> > > > AbstractHeader also calls `config()` and uses the Config and
> ConfigDef to
> > > > build the ConfigInfos. The contract is not really defined well
> enough in
> > > > the javadoc, but this means a connector can use validate() to build
> the
> > > > ConfigDef which it will return from config().
> > > >
> > > > For Transformations we don't really want validate() to perform
> validation
> > > > at all since ultimately the transformations config will be embedded
> in
> > > the
> > > > connector's and will be validated by the connector itself. It
> wouldn't be
> > > > harmful if it _did_ perform validation, just unnecessary. But having
> a
> > > > validate() method that ought not to validate() seems like a recipe
> for
> > > > confusion.
> > > >
> > > > Also problematic is that there's no use for the Config returned from
> > > > Transformations.validate().
> > > >
> > > > So I'm not convinced that the superficial similarity really gains
> > > anything.
> > > >
> > > > Kind regards,
> > > >
> > > > Tom
> > > >
> > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tb...@redhat.com>
> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Hi all,
> > > >>
> > > >> Thanks for the discussion so far.
> > > >>
> > > >> It seems a bit weird to me that when configuring the Conditional SMT
> > > with
> > > >> a DSL you would use a concise, intuitive DSL for expressing the
> > > condition,
> > > >> but not for the transforms that it's guarding. It also seems
> natural, if
> > > >> you support this for conditionally applying SMTs, that you'd soon
> want
> > > to
> > > >> support the same thing for a generic filter transformer. Then, if
> you
> > > can
> > > >> configure a filter transformation using this DSL, it becomes odd
> that
> > > you
> > > >> can't do this for mapping transformations. I think it would be a
> > > mistake to
> > > >> go specifying an expression language for conditions when really that
> > > might
> > > >> just be part of a language for transformations.
> > > >>
> > > >> I think it would be possible today to write an SMT which allowed
> you to
> > > >> express the transformation in a DSL. Concretely, it's possible to
> > > imagine a
> > > >> single transformation taking a DSL something like this:
> > > >>
> > > >>   compose(
> > > >>     Flatten.Key(delimiter: ':'),
> > > >>     If(condition: TopicMatches(pattern: "blah-.*"),
> > > >>        then: Flatten.Value(delimiter: '/')))
> > > >>
> > > >> All I've really done here, beyond what was already proposed, is
> rename
> > > >> Conditional to If, imagine a couple more bits of syntax
> > > >> (SMTs being constructed by class name and named argument invocation
> > > >> syntax to support SMT constructors) and add a higher level compose()
> > > >> function for chaining SMTs (which could easily be replaced with
> brace
> > > >> delimited blocks).
> > > >>
> > > >> That may be a discussion we should have, but I think in _this_ KIP
> we
> > > >> should focus on the conditional part, since the above example
> hopefully
> > > >> shows that it would be possible to reuse it in a DSL if there was
> > > appetite
> > > >> for that.
> > > >>
> > > >> With that in mind, and since my original suggestion for an
> abbreviated
> > > >> config syntax didn't appeal to people, I propose that we stick with
> the
> > > >> existing norms for configuring this.
> > > >> The previous example would look like this:
> > > >>
> > > >> transformation: flattenKey,if
> > > >> transformation.flattenKey.type: Flatten.Key
> > > >> transformation.flattenKey.delimiter: :
> > > >> transformation.if.type: If
> > > >> transformation.if.condition.type: TopicMatches
> > > >> transformation.if.condition.pattern: blah-.*
> > > >> transformation.if.then: flattenValue
> > > >> transformation.if.then.flattenValue.type: Flatten.Value
> > > >> transformation.if.then.flattenValue.delimiter: /
> > > >>
> > > >> Also, I'm inclined to think we should stick to just supporting
> > > >> TopicMatches and Not, since we've not identified an actual need for
> > > >> 'has-header', 'or' and 'and'.
> > > >>
> > > >> An example of the usage of Not would look like this:
> > > >>
> > > >> transformation: flattenKey,if
> > > >> transformation.flattenKey.type: Flatten.Key
> > > >> transformation.flattenKey.delimiter: :
> > > >> transformation.if.type: If
> > > >> transformation.if.condition.type: Not
> > > >> transformation.if.condition.operand: startsWithBlah
> > > >> transformation.if.condition.operand.startsWithBlah.type:
> TopicMatches
> > > >> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
> > > >> transformation.if.then: flattenValue
> > > >> transformation.if.then.flattenValue.type: Flatten.Value
> > > >> transformation.if.then.flattenValue.delimiter: /
> > > >>
> > > >> Hopefully we can agree that this allows the conditional SMT part to
> make
> > > >> progress without getting bogged down.
> > > >>
> > > >> Thoughts? If this seems broadly reasonable, I'll update the KIP.
> > > >>
> > > >> Kind regards,
> > > >>
> > > >> Tom
> > > >>
> > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gunnar@hibernate.org
> >
> > > >> wrote:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> Thanks a lot for this initiative, Tom!
> > > >>>
> > > >>> To shed some light, the use case where this first came up, were
> issues
> > > >>> we saw with SMTs being applied to the different topics produced by
> the
> > > >>> Debezium change data capture connectors. There are different kinds
> of
> > > >>> topics (for change data, schema history, heartbeats etc.) and the
> > > >>> record structure to expect may vary between those. Hence we saw
> issues
> > > >>> with SMTs like ExtractField, which for instance only should be
> applied
> > > >>> to all change data topics but not the other ones.
> > > >>>
> > > >>> I like the overall approach; for Debezium's purposes, the simple
> topic
> > > >>> matching and negation operators would be sufficient already. I
> agree
> > > >>> with Chris and would prefer one single condition attribute, which
> > > >>> contains a single condition or potentially a logical expression
> with
> > > >>> not, and, etc. I think it's less ambiguous, in particular when it
> > > >>> comes to ordering of the different conditions and determining their
> > > >>> precedence.
> > > >>>
> > > >>> Would love to see this feature in one or another way in Connect.
> > > >>>
> > > >>> Best,
> > > >>>
> > > >>> --Gunnar
> > > >>>
> > > >>>
> > > >>>
> > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
> > > >>> tbentley@redhat.com>:
> > > >>> >
> > > >>> > Hi Chris and Sönke,
> > > >>> >
> > > >>> > Using the numbering from Chris's email...
> > > >>> >
> > > >>> > 1. That's a good point, I'll see what is needed to make that
> work.
> > > >>> >
> > > >>> > 2. I'm happy enough to add support for "and" and "or" as part of
> this
> > > >>> KIP
> > > >>> > if people can see a need for it.
> > > >>> >
> > > >>> > In a similar vein, I was wondering about whether it would be
> > > worthwhile
> > > >>> > having the equivalent of an "else" clause (what's in the KIP is
> > > >>> basically
> > > >>> > an "if" statement). Without support for "else" I think people
> would
> > > >>> often
> > > >>> > need two conditionals, with the condition of one being the
> negation
> > > of
> > > >>> the
> > > >>> > condition of another.
> > > >>> >
> > > >>> > 3. I can see the attraction of an expression language. The pros
> > > include
> > > >>> > being terse and familiar to programmers and potentially very
> flexible
> > > >>> if
> > > >>> > that's needed in the future. I had a play and implemented it
> using
> > > >>> ANTLR
> > > >>> > and it's not difficult to write a grammar and implement the
> functions
> > > >>> we've
> > > >>> > already discussed and get decent error messages when the
> expression
> > > is
> > > >>> > malformed. So on the one hand I quite like the idea. On the other
> > > hand
> > > >>> it
> > > >>> > feels like overkill for the use cases that have actually been
> > > >>> identified so
> > > >>> > far.
> > > >>> >
> > > >>> > @Sönke what do you make of the expression language idea?
> > > >>> >
> > > >>> > Kind regards,
> > > >>> >
> > > >>> > Tom
> > > >>> >
> > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
> > > >>> chrise@confluent.io>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Hi Tom,
> > > >>> > >
> > > >>> > > This looks great and I'd love to see the out-of-the-box SMTs
> become
> > > >>> even
> > > >>> > > more powerful with the improvements you've proposed! Got a few
> > > >>> remarks and
> > > >>> > > would be interested in your thoughts:
> > > >>> > >
> > > >>> > > 1. Instead of the new "ConfigDef config(Map<String, String>
> props)"
> > > >>> method,
> > > >>> > > what would you think about adopting a similar approach as the
> > > >>> framework
> > > >>> > > uses with connectors, by adding a "Config validate(Map<String,
> > > >>> String>
> > > >>> > > props)" method that can perform custom validation outside of
> what
> > > >>> can be
> > > >>> > > performed by the ConfigDef's single-property-at-a-time
> validation?
> > > >>> It may
> > > >>> > > be a little heavyweight for use with this particular SMT, but
> it'd
> > > >>> provide
> > > >>> > > more flexibility for other SMT implementations and would
> mirror an
> > > >>> API that
> > > >>> > > developers targeting the framework are likely already familiar
> > > with.
> > > >>> > > 2. The possibility for adding the logical operators "and" and
> "or"
> > > is
> > > >>> > > mentioned, but only as a potential future change and not one
> > > >>> proposed by
> > > >>> > > this KIP. Why not include those operators sooner rather than
> later?
> > > >>> > > 3. The syntax for named conditions that are then referenced in
> > > >>> logical
> > > >>> > > operators is... tricky. It took me a few attempts to grok the
> > > example
> > > >>> > > provided in the KIP after reading Sönke's question about the
> > > example
> > > >>> for
> > > >>> > > negation. What about a more sophisticated but less verbose
> syntax
> > > >>> that
> > > >>> > > supports a single configuration for the condition, even with
> > > logical
> > > >>> > > operators? I'm thinking something like
> > > >>> > > "transforms.conditionalExtract.condition:
> > > not(has-header:my-header)"
> > > >>> > > instead of the "transforms.conditionalExtract.condition:
> > > >>> not:hasMyHeader"
> > > >>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> > > has-header:my-header" properties. If support for a logical
> "and" is
> > > >>> added,
> > > >>> > > this could then be expanded to something like
> > > >>> > > "transforms.conditionalExtract.condition:
> > > and(has-header(my-header),
> > > >>> > > not(topic-matches(my-prefix-.*)))". There would be additional
> > > >>> complexity
> > > >>> > > here with the need to escape parentheses and commas that are
> > > >>> intended to be
> > > >>> > > treated literally (as part of a header name, for example)
> instead
> > > of
> > > >>> as
> > > >>> > > part of the syntax for the condition itself, but a little
> > > additional
> > > >>> > > complexity for edge cases like that may be warranted if it
> heavily
> > > >>> reduces
> > > >>> > > complexity for the common cases. The rationale for the proposed
> > > >>> > > parentheses-based syntax here instead of what's mentioned in
> the
> > > KIP
> > > >>> > > (something like "and: <condition1>, <condition2>") is to help
> with
> > > >>> > > readability; we probably wouldn't need that with the approach
> of
> > > >>> naming
> > > >>> > > conditions via separate properties, but things may get a little
> > > >>> nasty with
> > > >>> > > literal conditions included there, especially with the
> possibility
> > > of
> > > >>> > > nesting. Finally, the shift in syntax here should make it
> easier to
> > > >>> handle
> > > >>> > > cases like the header value matching brought up by Sönke; it
> might
> > > >>> look
> > > >>> > > something like "header-matches(header-name,
> header-value-pattern)".
> > > >>> > >
> > > >>> > > Cheers,
> > > >>> > >
> > > >>> > > Chris
> > > >>> > >
> > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
> tbentley@redhat.com>
> > > >>> wrote:
> > > >>> > >
> > > >>> > > > Hi Sönke,
> > > >>> > > >
> > > >>> > > > Thanks for taking a look.
> > > >>> > > >
> > > >>> > > > Let me answer in reverse order, since I think it might make
> more
> > > >>> sense
> > > >>> > > that
> > > >>> > > > way...
> > > >>> > > >
> > > >>> > > > Also, while writing that, I noticed that you have a version
> with
> > > >>> and
> > > >>> > > > > without "name" for your transformation in the KIP:
> > > >>> > > > >
> > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> > > has-header:my-header
> > > >>> > > > > and
> > > >>> > > > > transforms.conditionalExtract.condition:
> has-header:my-header
> > > >>> > > > >
> > > >>> > > >
> > > >>> > > > The example
> > > >>> > > >     transforms.conditionalExtract.condition:
> has-header:my-header
> > > >>> > > > is a "has-header" condition (the prefix of the config value),
> > > >>> which will
> > > >>> > > > match records with a "my-header" header (given in the
> suffix).
> > > >>> > > >
> > > >>> > > > The other example given is:
> > > >>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
> > > >>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> > > > has-header:my-header
> > > >>> > > > The root of the condition is a "not" condition (the prefix
> of the
> > > >>> value
> > > >>> > > for
> > > >>> > > > the transforms.conditionalExtract.condition key) of another
> named
> > > >>> > > condition
> > > >>> > > > called "hasMyHeader" (the suffix). Any name could be used
> for the
> > > >>> other
> > > >>> > > > condition. That other condition is configured at
> > > >>> > > > "transforms.conditionalExtract.condition.<conditionName>".
> That
> > > >>> condition
> > > >>> > > > is a "has-header" condition (the prefix), which will match
> > > records
> > > >>> with a
> > > >>> > > > "my-header" header (given in the suffix). So the
> "has-header:"
> > > >>> condition
> > > >>> > > > type would always require a suffix, as would the "not:"
> condition
> > > >>> type.
> > > >>> > > > Hypothetically you could have a "true" condition type (which
> > > would
> > > >>> not
> > > >>> > > > require a suffix), and the hypothetical binary conditions
> "and:"
> > > >>> and
> > > >>> > > "or:"
> > > >>> > > > would require a pair of other condition names.
> > > >>> > > >
> > > >>> > > > So what's proposed is a scheme for encoding conditions where
> the
> > > >>> > > condition
> > > >>> > > > type is the prefix of the value of some "....condition"
> config
> > > >>> key, and
> > > >>> > > the
> > > >>> > > > optional suffix provides parameters for the condition. This
> makes
> > > >>> those
> > > >>> > > > parameters a bit inflexible, but is relatively succinct.
> > > >>> > > >
> > > >>> > > > This leads on to your first point. You're right that use
> cases
> > > >>> might
> > > >>> > > appear
> > > >>> > > > which need other conditions, and we should make it flexible
> > > enough
> > > >>> to be
> > > >>> > > > able to cope with future use cases. On the other hand, I was
> > > >>> concerned
> > > >>> > > that
> > > >>> > > > we end up with something which is quite complicated to
> configure.
> > > >>> (There
> > > >>> > > > comes a point where it might makes more sense for the user to
> > > >>> write their
> > > >>> > > > own SMT).
> > > >>> > > >
> > > >>> > > > Just of the top of my head it might look like:
> > > >>> > > > >
> > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> type:has-header
> > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> > > > header-name:my-header
> > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> > > field-value:my-value
> > > >>> > > > >
> > > >>> > > >
> > > >>> > > > That won't work because the format is basically a
> > > >>> Properties/Map<String,
> > > >>> > > > String> and what you've suggested has duplicate keys.
> > > >>> > > >
> > > >>> > > > One thing I did briefly consider what the ability to treat
> > > >>> conditions as
> > > >>> > > > Configurable objects in their own right (opening up the
> > > >>> possibility of
> > > >>> > > > people supplying their own Conditions, just like they can
> supply
> > > >>> their
> > > >>> > > own
> > > >>> > > > SMTs). That might be configured something like this:
> > > >>> > > >
> > > >>> > > >     transforms.conditionalExtract.condition: not
> > > >>> > > >     transforms.conditionalExtract.condition.not.type: Not
> > > >>> > > >     transforms.conditionalExtract.condition.not.negated: foo
> > > >>> > > >     transforms.conditionalExtract.condition.foo.type:
> > > >>> HasHeaderWithValue
> > > >>> > > >     transforms.conditionalExtract.condition.foo.header:
> my-header
> > > >>> > > >     transforms.conditionalExtract.condition.foo.value:
> my-value
> > > >>> > > >
> > > >>> > > > I didn't propose that I couldn't see the use cases to justify
> > > this
> > > >>> kind
> > > >>> > > of
> > > >>> > > > complexity, especially as the common case would surely be
> > > matching
> > > >>> > > against
> > > >>> > > > topic name (to be honest I wasn't completely convinced by the
> > > need
> > > >>> for
> > > >>> > > > "has-header"). In the current version of the KIP that's just
> > > >>> > > >
> > > >>> > > >     transforms.conditionalExtract.condition: topic-matches:
> > > >>> my-prefix-.*
> > > >>> > > >
> > > >>> > > > but using the more flexible scheme that would require
> something
> > > >>> more like
> > > >>> > > > this:
> > > >>> > > >
> > > >>> > > >     transforms.conditionalExtract.condition: bar
> > > >>> > > >     transforms.conditionalExtract.condition.bar.type:
> TopicMatch
> > > >>> > > >     transforms.conditionalExtract.condition.bar.pattern:
> > > >>> my-prefix-.*
> > > >>> > > >
> > > >>> > > > If people know of use cases which would justify more
> > > >>> sophistication, I'm
> > > >>> > > > happy to reconsider.
> > > >>> > > >
> > > >>> > > > Thanks again for taking a look!
> > > >>> > > >
> > > >>> > > > Tom
> > > >>> > > >
> > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> > > >>> > > > <so...@opencore.com.invalid> wrote:
> > > >>> > > >
> > > >>> > > > > Hi Tom,
> > > >>> > > > >
> > > >>> > > > > sounds useful to me, thanks for the KIP!
> > > >>> > > > > The only thought that I had while reading was that this
> will
> > > >>> probably
> > > >>> > > > raise
> > > >>> > > > > questions about more involved conditions fairly quickly.
> For
> > > >>> example
> > > >>> > > the
> > > >>> > > > > "has-header" will cause an appetite for conditions like
> > > >>> > > > > "this-header-has-that-value".
> > > >>> > > > > This would necessitate two parameters to be passed into the
> > > >>> condition,
> > > >>> > > > > which I think is not currently included in the KIP. I am
> not
> > > >>> saying add
> > > >>> > > > > this now, but might it make sense to discuss a concept of
> how
> > > >>> that
> > > >>> > > might
> > > >>> > > > > look now, to avoid potential changes later on.
> > > >>> > > > >
> > > >>> > > > > Just of the top of my head it might look like:
> > > >>> > > > >
> > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> type:has-header
> > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> > > > header-name:my-header
> > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> > > field-value:my-value
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Also, while writing that, I noticed that you have a version
> > > with
> > > >>> and
> > > >>> > > > > without "name" for your transformation in the KIP:
> > > >>> > > > >
> > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > >>> > > has-header:my-header
> > > >>> > > > > and
> > > >>> > > > > transforms.conditionalExtract.condition:
> has-header:my-header
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Is this intentional and the name is optional?
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Best regards,
> > > >>> > > > >
> > > >>> > > > > Sönke
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <
> tbentley@redhat.com>
> > > >>> wrote:
> > > >>> > > > > >
> > > >>> > > > > > Does anyone have any comments, feedback or thoughts about
> > > this?
> > > >>> > > > > >
> > > >>> > > > > > Thanks,
> > > >>> > > > > >
> > > >>> > > > > > Tom
> > > >>> > > > > >
> > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
> > > >>> tbentley@redhat.com>
> > > >>> > > > > wrote:
> > > >>> > > > > >
> > > >>> > > > > > > Hi,
> > > >>> > > > > > >
> > > >>> > > > > > > I've opened KIP-585 which is intended to provide a
> > > mechanism
> > > >>> to
> > > >>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be
> grateful
> > > >>> for any
> > > >>> > > > > > > feedback:
> > > >>> > > > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > >>> > > > > > >
> > > >>> > > > > > > Many thanks,
> > > >>> > > > > > >
> > > >>> > > > > > > Tom
> > > >>> > > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>>
> > > >>>
> > >
> >
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Andrew Schofield <sc...@gmail.com>.
I wonder whether we're getting a bit overcomplicated with this. I think all
that's required here is to add an optional guard predicate for a Transformation.
The predicate cannot end the Transformation chain, but it can allow or prevent a particular
Transformation from running.

How about this as syntax?
transforms: extractInt
transforms.extractInt.?type: org.apache.kafka.connect.predicates.TopicMatches
transforms.extractInt.?regex: my-prefix-.*
transforms.extractInt.type: org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extractInt.field: c1

The idea is that a Transformation can have an optional RecordPredicate. The RecordPredicate
can have configuration items, in a similar way to a Transformation. The syntax of
using a '?' prefix to separate configuration for the RecordPredicate from the configuration
for the Transformation could conceivably clash with an existing Transformation
but the chances are minimal.

This syntax doesn't allow for an 'else', but if the KIP offers say a TopicMatches predicate
then that can be configured to return FALSE if the predicate matches.

I feel that a DSL for SMTs is overkill. If you need something that complex, it's
perhaps too complex for a transformation chain and it's really a streaming application.

Andrew Schofield
IBM Event Streams

On 2020/04/08 21:39:31, Christopher Egerton <ch...@confluent.io> wrote: 
> Hi Tom,
> 
> With regards to the potential Transformation::validate method, I don't
> quite follow your objections. The AbstractHerder class, ConnectorConfig
> class, and embedding of ConfigDefs that happens with the two is all
> internal logic and we're allowed to modify it however we want, as long as
> it doesn't alter any public-facing APIs (and if it does, we just have to
> document those changes in a KIP). We don't have to embed the ConfigDef for
> a transformation chain inside another ConfigDef if the API we want to
> present to our users doesn't play well with that part of the code base.
> Additionally, beyond aligning with the existing API provided for
> connectors, another advantage is that it becomes possible to validate
> properties for a configuration in the context of all other properties, so
> to be clear, it's not just about preserving what may be perceived as a
> superficial similarity and comes with real functional benefits that can't
> be provided (at least, not as easily) by dynamic construction of a
> ConfigDef object.
> 
> As far as the new proposal goes, I hate to say it, but I think we're stuck
> with the worst of both worlds here. Adding the new RecordPredicate
> interface seems like it defeats the whole purpose of SMTs, which is to
> allow manipulation of a connector's event stream by users who don't
> necessarily know how or have the time to write Java code of their own. This
> is also why I'm in favor of adding a lightweight DSL for the condition;
> emphasizing readability for people who aren't very familiar with Connect
> and just want to get something going quickly should be a priority for SMTs.
> But if that's not going to happen with this KIP, I'd still choose the
> simpler, less-flexible approach initially outlined, in order to keep things
> simple for people creating connectors and try to let them accomplish what
> they want via configuration, not code.
> 
> With regards to the question about where the line should be drawn and how
> much is too much and comparisons to other stream processing frameworks, I
> think the nature of SMTs draws the line quite nicely: you can only process
> one message at a time. There's plenty of use cases out there for
> heavier-duty processing frameworks like Kafka Streams, with aggregate
> operations, joining of streams, expanding a single message into multiple
> messages, etc. With SMTs, none of this is possible; the general use case is
> to filter and clean a stream of data. If any of the heavier-weight logic
> provided by, e.g., Streams, isn't required for a project, it should be
> possible to get along with just a collection of sink connectors, source
> connectors, a converter or two, and SMTs that smooth over any differences
> in data format between what source connectors produce and what sink
> connectors expect. This is why I'm comfortable suggesting heavy expansion
> of the out-of-the-box SMTs that we provide with Connect; as long as they're
> reasonable to configure, they can greatly reduce the operational burden on
> anyone running and/or using a Connect cluster since they might entirely
> replace additional services that would otherwise be required.
> 
> All that said--this is a huge ask of someone who just wants to support the
> SMT equivalent of an "if" statement, and it's totally understandable if
> that's too much to ask. The one concern I have left is that if we expand
> the SMT in the future, there become compatibility concerns since SMTs are
> pretty tightly-coupled with the worker on which they run (although
> technically, with some classpath/plugin path shuffling, you can run
> different versions of the out-of-the-box SMTs from the Connect worker on
> which they're run). Someone might write a connector config with the
> If/Conditional/Whatever SMT with a condition type that works on one worker,
> but that doesn't work on another worker that's running an earlier version
> of Connect. This is why I'm in favor of adding extra predicates now instead
> of later; if we're going to implement what has the potential to be a bit of
> a Swiss army knife SMT with room for future expansion of configuration, and
> we can think of a reasonable way to add that functionality now, it seems
> better for users and administrators of Connect to try to do that now
> instead of later.
> 
> Cheers,
> 
> Chris
> 
> On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <tb...@redhat.com> wrote:
> 
> > Since no one objected I've updated the KIP with the revised way to
> > configure this transformation.
> >
> > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tb...@redhat.com> wrote:
> >
> > > To come back about a point Chris made:
> > >
> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
> > method,
> > >> what would you think about adopting a similar approach as the framework
> > >> uses with connectors, by adding a "Config validate(Map<String, String>
> > >> props)" method that can perform custom validation outside of what can be
> > >> performed by the ConfigDef's single-property-at-a-time validation? It
> > may
> > >> be a little heavyweight for use with this particular SMT, but it'd
> > provide
> > >> more flexibility for other SMT implementations and would mirror an API
> > >> that
> > >> developers targeting the framework are likely already familiar with.
> > >>
> > >
> > > The validate() + config() approach taken for Connectors doesn't quite
> > work
> > > for Transformations.
> > >
> > > The default Connector.validate() basically just calls
> > > `config().validate(connectorConfigs)` and returns a Config. Separately
> > the
> > > AbstractHeader also calls `config()` and uses the Config and ConfigDef to
> > > build the ConfigInfos. The contract is not really defined well enough in
> > > the javadoc, but this means a connector can use validate() to build the
> > > ConfigDef which it will return from config().
> > >
> > > For Transformations we don't really want validate() to perform validation
> > > at all since ultimately the transformations config will be embedded in
> > the
> > > connector's and will be validated by the connector itself. It wouldn't be
> > > harmful if it _did_ perform validation, just unnecessary. But having a
> > > validate() method that ought not to validate() seems like a recipe for
> > > confusion.
> > >
> > > Also problematic is that there's no use for the Config returned from
> > > Transformations.validate().
> > >
> > > So I'm not convinced that the superficial similarity really gains
> > anything.
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tb...@redhat.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Hi all,
> > >>
> > >> Thanks for the discussion so far.
> > >>
> > >> It seems a bit weird to me that when configuring the Conditional SMT
> > with
> > >> a DSL you would use a concise, intuitive DSL for expressing the
> > condition,
> > >> but not for the transforms that it's guarding. It also seems natural, if
> > >> you support this for conditionally applying SMTs, that you'd soon want
> > to
> > >> support the same thing for a generic filter transformer. Then, if you
> > can
> > >> configure a filter transformation using this DSL, it becomes odd that
> > you
> > >> can't do this for mapping transformations. I think it would be a
> > mistake to
> > >> go specifying an expression language for conditions when really that
> > might
> > >> just be part of a language for transformations.
> > >>
> > >> I think it would be possible today to write an SMT which allowed you to
> > >> express the transformation in a DSL. Concretely, it's possible to
> > imagine a
> > >> single transformation taking a DSL something like this:
> > >>
> > >>   compose(
> > >>     Flatten.Key(delimiter: ':'),
> > >>     If(condition: TopicMatches(pattern: "blah-.*"),
> > >>        then: Flatten.Value(delimiter: '/')))
> > >>
> > >> All I've really done here, beyond what was already proposed, is rename
> > >> Conditional to If, imagine a couple more bits of syntax
> > >> (SMTs being constructed by class name and named argument invocation
> > >> syntax to support SMT constructors) and add a higher level compose()
> > >> function for chaining SMTs (which could easily be replaced with brace
> > >> delimited blocks).
> > >>
> > >> That may be a discussion we should have, but I think in _this_ KIP we
> > >> should focus on the conditional part, since the above example hopefully
> > >> shows that it would be possible to reuse it in a DSL if there was
> > appetite
> > >> for that.
> > >>
> > >> With that in mind, and since my original suggestion for an abbreviated
> > >> config syntax didn't appeal to people, I propose that we stick with the
> > >> existing norms for configuring this.
> > >> The previous example would look like this:
> > >>
> > >> transformation: flattenKey,if
> > >> transformation.flattenKey.type: Flatten.Key
> > >> transformation.flattenKey.delimiter: :
> > >> transformation.if.type: If
> > >> transformation.if.condition.type: TopicMatches
> > >> transformation.if.condition.pattern: blah-.*
> > >> transformation.if.then: flattenValue
> > >> transformation.if.then.flattenValue.type: Flatten.Value
> > >> transformation.if.then.flattenValue.delimiter: /
> > >>
> > >> Also, I'm inclined to think we should stick to just supporting
> > >> TopicMatches and Not, since we've not identified an actual need for
> > >> 'has-header', 'or' and 'and'.
> > >>
> > >> An example of the usage of Not would look like this:
> > >>
> > >> transformation: flattenKey,if
> > >> transformation.flattenKey.type: Flatten.Key
> > >> transformation.flattenKey.delimiter: :
> > >> transformation.if.type: If
> > >> transformation.if.condition.type: Not
> > >> transformation.if.condition.operand: startsWithBlah
> > >> transformation.if.condition.operand.startsWithBlah.type: TopicMatches
> > >> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
> > >> transformation.if.then: flattenValue
> > >> transformation.if.then.flattenValue.type: Flatten.Value
> > >> transformation.if.then.flattenValue.delimiter: /
> > >>
> > >> Hopefully we can agree that this allows the conditional SMT part to make
> > >> progress without getting bogged down.
> > >>
> > >> Thoughts? If this seems broadly reasonable, I'll update the KIP.
> > >>
> > >> Kind regards,
> > >>
> > >> Tom
> > >>
> > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gu...@hibernate.org>
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> Thanks a lot for this initiative, Tom!
> > >>>
> > >>> To shed some light, the use case where this first came up, were issues
> > >>> we saw with SMTs being applied to the different topics produced by the
> > >>> Debezium change data capture connectors. There are different kinds of
> > >>> topics (for change data, schema history, heartbeats etc.) and the
> > >>> record structure to expect may vary between those. Hence we saw issues
> > >>> with SMTs like ExtractField, which for instance only should be applied
> > >>> to all change data topics but not the other ones.
> > >>>
> > >>> I like the overall approach; for Debezium's purposes, the simple topic
> > >>> matching and negation operators would be sufficient already. I agree
> > >>> with Chris and would prefer one single condition attribute, which
> > >>> contains a single condition or potentially a logical expression with
> > >>> not, and, etc. I think it's less ambiguous, in particular when it
> > >>> comes to ordering of the different conditions and determining their
> > >>> precedence.
> > >>>
> > >>> Would love to see this feature in one or another way in Connect.
> > >>>
> > >>> Best,
> > >>>
> > >>> --Gunnar
> > >>>
> > >>>
> > >>>
> > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
> > >>> tbentley@redhat.com>:
> > >>> >
> > >>> > Hi Chris and Sönke,
> > >>> >
> > >>> > Using the numbering from Chris's email...
> > >>> >
> > >>> > 1. That's a good point, I'll see what is needed to make that work.
> > >>> >
> > >>> > 2. I'm happy enough to add support for "and" and "or" as part of this
> > >>> KIP
> > >>> > if people can see a need for it.
> > >>> >
> > >>> > In a similar vein, I was wondering about whether it would be
> > worthwhile
> > >>> > having the equivalent of an "else" clause (what's in the KIP is
> > >>> basically
> > >>> > an "if" statement). Without support for "else" I think people would
> > >>> often
> > >>> > need two conditionals, with the condition of one being the negation
> > of
> > >>> the
> > >>> > condition of another.
> > >>> >
> > >>> > 3. I can see the attraction of an expression language. The pros
> > include
> > >>> > being terse and familiar to programmers and potentially very flexible
> > >>> if
> > >>> > that's needed in the future. I had a play and implemented it using
> > >>> ANTLR
> > >>> > and it's not difficult to write a grammar and implement the functions
> > >>> we've
> > >>> > already discussed and get decent error messages when the expression
> > is
> > >>> > malformed. So on the one hand I quite like the idea. On the other
> > hand
> > >>> it
> > >>> > feels like overkill for the use cases that have actually been
> > >>> identified so
> > >>> > far.
> > >>> >
> > >>> > @Sönke what do you make of the expression language idea?
> > >>> >
> > >>> > Kind regards,
> > >>> >
> > >>> > Tom
> > >>> >
> > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
> > >>> chrise@confluent.io>
> > >>> > wrote:
> > >>> >
> > >>> > > Hi Tom,
> > >>> > >
> > >>> > > This looks great and I'd love to see the out-of-the-box SMTs become
> > >>> even
> > >>> > > more powerful with the improvements you've proposed! Got a few
> > >>> remarks and
> > >>> > > would be interested in your thoughts:
> > >>> > >
> > >>> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
> > >>> method,
> > >>> > > what would you think about adopting a similar approach as the
> > >>> framework
> > >>> > > uses with connectors, by adding a "Config validate(Map<String,
> > >>> String>
> > >>> > > props)" method that can perform custom validation outside of what
> > >>> can be
> > >>> > > performed by the ConfigDef's single-property-at-a-time validation?
> > >>> It may
> > >>> > > be a little heavyweight for use with this particular SMT, but it'd
> > >>> provide
> > >>> > > more flexibility for other SMT implementations and would mirror an
> > >>> API that
> > >>> > > developers targeting the framework are likely already familiar
> > with.
> > >>> > > 2. The possibility for adding the logical operators "and" and "or"
> > is
> > >>> > > mentioned, but only as a potential future change and not one
> > >>> proposed by
> > >>> > > this KIP. Why not include those operators sooner rather than later?
> > >>> > > 3. The syntax for named conditions that are then referenced in
> > >>> logical
> > >>> > > operators is... tricky. It took me a few attempts to grok the
> > example
> > >>> > > provided in the KIP after reading Sönke's question about the
> > example
> > >>> for
> > >>> > > negation. What about a more sophisticated but less verbose syntax
> > >>> that
> > >>> > > supports a single configuration for the condition, even with
> > logical
> > >>> > > operators? I'm thinking something like
> > >>> > > "transforms.conditionalExtract.condition:
> > not(has-header:my-header)"
> > >>> > > instead of the "transforms.conditionalExtract.condition:
> > >>> not:hasMyHeader"
> > >>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
> > >>> > > has-header:my-header" properties. If support for a logical "and" is
> > >>> added,
> > >>> > > this could then be expanded to something like
> > >>> > > "transforms.conditionalExtract.condition:
> > and(has-header(my-header),
> > >>> > > not(topic-matches(my-prefix-.*)))". There would be additional
> > >>> complexity
> > >>> > > here with the need to escape parentheses and commas that are
> > >>> intended to be
> > >>> > > treated literally (as part of a header name, for example) instead
> > of
> > >>> as
> > >>> > > part of the syntax for the condition itself, but a little
> > additional
> > >>> > > complexity for edge cases like that may be warranted if it heavily
> > >>> reduces
> > >>> > > complexity for the common cases. The rationale for the proposed
> > >>> > > parentheses-based syntax here instead of what's mentioned in the
> > KIP
> > >>> > > (something like "and: <condition1>, <condition2>") is to help with
> > >>> > > readability; we probably wouldn't need that with the approach of
> > >>> naming
> > >>> > > conditions via separate properties, but things may get a little
> > >>> nasty with
> > >>> > > literal conditions included there, especially with the possibility
> > of
> > >>> > > nesting. Finally, the shift in syntax here should make it easier to
> > >>> handle
> > >>> > > cases like the header value matching brought up by Sönke; it might
> > >>> look
> > >>> > > something like "header-matches(header-name, header-value-pattern)".
> > >>> > >
> > >>> > > Cheers,
> > >>> > >
> > >>> > > Chris
> > >>> > >
> > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com>
> > >>> wrote:
> > >>> > >
> > >>> > > > Hi Sönke,
> > >>> > > >
> > >>> > > > Thanks for taking a look.
> > >>> > > >
> > >>> > > > Let me answer in reverse order, since I think it might make more
> > >>> sense
> > >>> > > that
> > >>> > > > way...
> > >>> > > >
> > >>> > > > Also, while writing that, I noticed that you have a version with
> > >>> and
> > >>> > > > > without "name" for your transformation in the KIP:
> > >>> > > > >
> > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > >>> > > has-header:my-header
> > >>> > > > > and
> > >>> > > > > transforms.conditionalExtract.condition: has-header:my-header
> > >>> > > > >
> > >>> > > >
> > >>> > > > The example
> > >>> > > >     transforms.conditionalExtract.condition: has-header:my-header
> > >>> > > > is a "has-header" condition (the prefix of the config value),
> > >>> which will
> > >>> > > > match records with a "my-header" header (given in the suffix).
> > >>> > > >
> > >>> > > > The other example given is:
> > >>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
> > >>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
> > >>> > > > has-header:my-header
> > >>> > > > The root of the condition is a "not" condition (the prefix of the
> > >>> value
> > >>> > > for
> > >>> > > > the transforms.conditionalExtract.condition key) of another named
> > >>> > > condition
> > >>> > > > called "hasMyHeader" (the suffix). Any name could be used for the
> > >>> other
> > >>> > > > condition. That other condition is configured at
> > >>> > > > "transforms.conditionalExtract.condition.<conditionName>". That
> > >>> condition
> > >>> > > > is a "has-header" condition (the prefix), which will match
> > records
> > >>> with a
> > >>> > > > "my-header" header (given in the suffix). So the "has-header:"
> > >>> condition
> > >>> > > > type would always require a suffix, as would the "not:" condition
> > >>> type.
> > >>> > > > Hypothetically you could have a "true" condition type (which
> > would
> > >>> not
> > >>> > > > require a suffix), and the hypothetical binary conditions "and:"
> > >>> and
> > >>> > > "or:"
> > >>> > > > would require a pair of other condition names.
> > >>> > > >
> > >>> > > > So what's proposed is a scheme for encoding conditions where the
> > >>> > > condition
> > >>> > > > type is the prefix of the value of some "....condition" config
> > >>> key, and
> > >>> > > the
> > >>> > > > optional suffix provides parameters for the condition. This makes
> > >>> those
> > >>> > > > parameters a bit inflexible, but is relatively succinct.
> > >>> > > >
> > >>> > > > This leads on to your first point. You're right that use cases
> > >>> might
> > >>> > > appear
> > >>> > > > which need other conditions, and we should make it flexible
> > enough
> > >>> to be
> > >>> > > > able to cope with future use cases. On the other hand, I was
> > >>> concerned
> > >>> > > that
> > >>> > > > we end up with something which is quite complicated to configure.
> > >>> (There
> > >>> > > > comes a point where it might makes more sense for the user to
> > >>> write their
> > >>> > > > own SMT).
> > >>> > > >
> > >>> > > > Just of the top of my head it might look like:
> > >>> > > > >
> > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > >>> type:has-header
> > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > >>> > > > header-name:my-header
> > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > >>> > > field-value:my-value
> > >>> > > > >
> > >>> > > >
> > >>> > > > That won't work because the format is basically a
> > >>> Properties/Map<String,
> > >>> > > > String> and what you've suggested has duplicate keys.
> > >>> > > >
> > >>> > > > One thing I did briefly consider what the ability to treat
> > >>> conditions as
> > >>> > > > Configurable objects in their own right (opening up the
> > >>> possibility of
> > >>> > > > people supplying their own Conditions, just like they can supply
> > >>> their
> > >>> > > own
> > >>> > > > SMTs). That might be configured something like this:
> > >>> > > >
> > >>> > > >     transforms.conditionalExtract.condition: not
> > >>> > > >     transforms.conditionalExtract.condition.not.type: Not
> > >>> > > >     transforms.conditionalExtract.condition.not.negated: foo
> > >>> > > >     transforms.conditionalExtract.condition.foo.type:
> > >>> HasHeaderWithValue
> > >>> > > >     transforms.conditionalExtract.condition.foo.header: my-header
> > >>> > > >     transforms.conditionalExtract.condition.foo.value: my-value
> > >>> > > >
> > >>> > > > I didn't propose that I couldn't see the use cases to justify
> > this
> > >>> kind
> > >>> > > of
> > >>> > > > complexity, especially as the common case would surely be
> > matching
> > >>> > > against
> > >>> > > > topic name (to be honest I wasn't completely convinced by the
> > need
> > >>> for
> > >>> > > > "has-header"). In the current version of the KIP that's just
> > >>> > > >
> > >>> > > >     transforms.conditionalExtract.condition: topic-matches:
> > >>> my-prefix-.*
> > >>> > > >
> > >>> > > > but using the more flexible scheme that would require something
> > >>> more like
> > >>> > > > this:
> > >>> > > >
> > >>> > > >     transforms.conditionalExtract.condition: bar
> > >>> > > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
> > >>> > > >     transforms.conditionalExtract.condition.bar.pattern:
> > >>> my-prefix-.*
> > >>> > > >
> > >>> > > > If people know of use cases which would justify more
> > >>> sophistication, I'm
> > >>> > > > happy to reconsider.
> > >>> > > >
> > >>> > > > Thanks again for taking a look!
> > >>> > > >
> > >>> > > > Tom
> > >>> > > >
> > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> > >>> > > > <so...@opencore.com.invalid> wrote:
> > >>> > > >
> > >>> > > > > Hi Tom,
> > >>> > > > >
> > >>> > > > > sounds useful to me, thanks for the KIP!
> > >>> > > > > The only thought that I had while reading was that this will
> > >>> probably
> > >>> > > > raise
> > >>> > > > > questions about more involved conditions fairly quickly. For
> > >>> example
> > >>> > > the
> > >>> > > > > "has-header" will cause an appetite for conditions like
> > >>> > > > > "this-header-has-that-value".
> > >>> > > > > This would necessitate two parameters to be passed into the
> > >>> condition,
> > >>> > > > > which I think is not currently included in the KIP. I am not
> > >>> saying add
> > >>> > > > > this now, but might it make sense to discuss a concept of how
> > >>> that
> > >>> > > might
> > >>> > > > > look now, to avoid potential changes later on.
> > >>> > > > >
> > >>> > > > > Just of the top of my head it might look like:
> > >>> > > > >
> > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > >>> type:has-header
> > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > >>> > > > header-name:my-header
> > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > >>> > > field-value:my-value
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > Also, while writing that, I noticed that you have a version
> > with
> > >>> and
> > >>> > > > > without "name" for your transformation in the KIP:
> > >>> > > > >
> > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > >>> > > has-header:my-header
> > >>> > > > > and
> > >>> > > > > transforms.conditionalExtract.condition: has-header:my-header
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > Is this intentional and the name is optional?
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > Best regards,
> > >>> > > > >
> > >>> > > > > Sönke
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com>
> > >>> wrote:
> > >>> > > > > >
> > >>> > > > > > Does anyone have any comments, feedback or thoughts about
> > this?
> > >>> > > > > >
> > >>> > > > > > Thanks,
> > >>> > > > > >
> > >>> > > > > > Tom
> > >>> > > > > >
> > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
> > >>> tbentley@redhat.com>
> > >>> > > > > wrote:
> > >>> > > > > >
> > >>> > > > > > > Hi,
> > >>> > > > > > >
> > >>> > > > > > > I've opened KIP-585 which is intended to provide a
> > mechanism
> > >>> to
> > >>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful
> > >>> for any
> > >>> > > > > > > feedback:
> > >>> > > > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > >>> > > > > > >
> > >>> > > > > > > Many thanks,
> > >>> > > > > > >
> > >>> > > > > > > Tom
> > >>> > > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>>
> > >>>
> >
> 

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Christopher Egerton <ch...@confluent.io>.
Hi Tom,

With regards to the potential Transformation::validate method, I don't
quite follow your objections. The AbstractHerder class, ConnectorConfig
class, and embedding of ConfigDefs that happens with the two is all
internal logic and we're allowed to modify it however we want, as long as
it doesn't alter any public-facing APIs (and if it does, we just have to
document those changes in a KIP). We don't have to embed the ConfigDef for
a transformation chain inside another ConfigDef if the API we want to
present to our users doesn't play well with that part of the code base.
Additionally, beyond aligning with the existing API provided for
connectors, another advantage is that it becomes possible to validate
properties for a configuration in the context of all other properties, so
to be clear, it's not just about preserving what may be perceived as a
superficial similarity and comes with real functional benefits that can't
be provided (at least, not as easily) by dynamic construction of a
ConfigDef object.

As far as the new proposal goes, I hate to say it, but I think we're stuck
with the worst of both worlds here. Adding the new RecordPredicate
interface seems like it defeats the whole purpose of SMTs, which is to
allow manipulation of a connector's event stream by users who don't
necessarily know how or have the time to write Java code of their own. This
is also why I'm in favor of adding a lightweight DSL for the condition;
emphasizing readability for people who aren't very familiar with Connect
and just want to get something going quickly should be a priority for SMTs.
But if that's not going to happen with this KIP, I'd still choose the
simpler, less-flexible approach initially outlined, in order to keep things
simple for people creating connectors and try to let them accomplish what
they want via configuration, not code.

With regards to the question about where the line should be drawn and how
much is too much and comparisons to other stream processing frameworks, I
think the nature of SMTs draws the line quite nicely: you can only process
one message at a time. There's plenty of use cases out there for
heavier-duty processing frameworks like Kafka Streams, with aggregate
operations, joining of streams, expanding a single message into multiple
messages, etc. With SMTs, none of this is possible; the general use case is
to filter and clean a stream of data. If any of the heavier-weight logic
provided by, e.g., Streams, isn't required for a project, it should be
possible to get along with just a collection of sink connectors, source
connectors, a converter or two, and SMTs that smooth over any differences
in data format between what source connectors produce and what sink
connectors expect. This is why I'm comfortable suggesting heavy expansion
of the out-of-the-box SMTs that we provide with Connect; as long as they're
reasonable to configure, they can greatly reduce the operational burden on
anyone running and/or using a Connect cluster since they might entirely
replace additional services that would otherwise be required.

All that said--this is a huge ask of someone who just wants to support the
SMT equivalent of an "if" statement, and it's totally understandable if
that's too much to ask. The one concern I have left is that if we expand
the SMT in the future, there become compatibility concerns since SMTs are
pretty tightly-coupled with the worker on which they run (although
technically, with some classpath/plugin path shuffling, you can run
different versions of the out-of-the-box SMTs from the Connect worker on
which they're run). Someone might write a connector config with the
If/Conditional/Whatever SMT with a condition type that works on one worker,
but that doesn't work on another worker that's running an earlier version
of Connect. This is why I'm in favor of adding extra predicates now instead
of later; if we're going to implement what has the potential to be a bit of
a Swiss army knife SMT with room for future expansion of configuration, and
we can think of a reasonable way to add that functionality now, it seems
better for users and administrators of Connect to try to do that now
instead of later.

Cheers,

Chris

On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <tb...@redhat.com> wrote:

> Since no one objected I've updated the KIP with the revised way to
> configure this transformation.
>
> On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tb...@redhat.com> wrote:
>
> > To come back about a point Chris made:
> >
> > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
> method,
> >> what would you think about adopting a similar approach as the framework
> >> uses with connectors, by adding a "Config validate(Map<String, String>
> >> props)" method that can perform custom validation outside of what can be
> >> performed by the ConfigDef's single-property-at-a-time validation? It
> may
> >> be a little heavyweight for use with this particular SMT, but it'd
> provide
> >> more flexibility for other SMT implementations and would mirror an API
> >> that
> >> developers targeting the framework are likely already familiar with.
> >>
> >
> > The validate() + config() approach taken for Connectors doesn't quite
> work
> > for Transformations.
> >
> > The default Connector.validate() basically just calls
> > `config().validate(connectorConfigs)` and returns a Config. Separately
> the
> > AbstractHeader also calls `config()` and uses the Config and ConfigDef to
> > build the ConfigInfos. The contract is not really defined well enough in
> > the javadoc, but this means a connector can use validate() to build the
> > ConfigDef which it will return from config().
> >
> > For Transformations we don't really want validate() to perform validation
> > at all since ultimately the transformations config will be embedded in
> the
> > connector's and will be validated by the connector itself. It wouldn't be
> > harmful if it _did_ perform validation, just unnecessary. But having a
> > validate() method that ought not to validate() seems like a recipe for
> > confusion.
> >
> > Also problematic is that there's no use for the Config returned from
> > Transformations.validate().
> >
> > So I'm not convinced that the superficial similarity really gains
> anything.
> >
> > Kind regards,
> >
> > Tom
> >
> > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tb...@redhat.com> wrote:
> >
> >> Hi,
> >>
> >> Hi all,
> >>
> >> Thanks for the discussion so far.
> >>
> >> It seems a bit weird to me that when configuring the Conditional SMT
> with
> >> a DSL you would use a concise, intuitive DSL for expressing the
> condition,
> >> but not for the transforms that it's guarding. It also seems natural, if
> >> you support this for conditionally applying SMTs, that you'd soon want
> to
> >> support the same thing for a generic filter transformer. Then, if you
> can
> >> configure a filter transformation using this DSL, it becomes odd that
> you
> >> can't do this for mapping transformations. I think it would be a
> mistake to
> >> go specifying an expression language for conditions when really that
> might
> >> just be part of a language for transformations.
> >>
> >> I think it would be possible today to write an SMT which allowed you to
> >> express the transformation in a DSL. Concretely, it's possible to
> imagine a
> >> single transformation taking a DSL something like this:
> >>
> >>   compose(
> >>     Flatten.Key(delimiter: ':'),
> >>     If(condition: TopicMatches(pattern: "blah-.*"),
> >>        then: Flatten.Value(delimiter: '/')))
> >>
> >> All I've really done here, beyond what was already proposed, is rename
> >> Conditional to If, imagine a couple more bits of syntax
> >> (SMTs being constructed by class name and named argument invocation
> >> syntax to support SMT constructors) and add a higher level compose()
> >> function for chaining SMTs (which could easily be replaced with brace
> >> delimited blocks).
> >>
> >> That may be a discussion we should have, but I think in _this_ KIP we
> >> should focus on the conditional part, since the above example hopefully
> >> shows that it would be possible to reuse it in a DSL if there was
> appetite
> >> for that.
> >>
> >> With that in mind, and since my original suggestion for an abbreviated
> >> config syntax didn't appeal to people, I propose that we stick with the
> >> existing norms for configuring this.
> >> The previous example would look like this:
> >>
> >> transformation: flattenKey,if
> >> transformation.flattenKey.type: Flatten.Key
> >> transformation.flattenKey.delimiter: :
> >> transformation.if.type: If
> >> transformation.if.condition.type: TopicMatches
> >> transformation.if.condition.pattern: blah-.*
> >> transformation.if.then: flattenValue
> >> transformation.if.then.flattenValue.type: Flatten.Value
> >> transformation.if.then.flattenValue.delimiter: /
> >>
> >> Also, I'm inclined to think we should stick to just supporting
> >> TopicMatches and Not, since we've not identified an actual need for
> >> 'has-header', 'or' and 'and'.
> >>
> >> An example of the usage of Not would look like this:
> >>
> >> transformation: flattenKey,if
> >> transformation.flattenKey.type: Flatten.Key
> >> transformation.flattenKey.delimiter: :
> >> transformation.if.type: If
> >> transformation.if.condition.type: Not
> >> transformation.if.condition.operand: startsWithBlah
> >> transformation.if.condition.operand.startsWithBlah.type: TopicMatches
> >> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
> >> transformation.if.then: flattenValue
> >> transformation.if.then.flattenValue.type: Flatten.Value
> >> transformation.if.then.flattenValue.delimiter: /
> >>
> >> Hopefully we can agree that this allows the conditional SMT part to make
> >> progress without getting bogged down.
> >>
> >> Thoughts? If this seems broadly reasonable, I'll update the KIP.
> >>
> >> Kind regards,
> >>
> >> Tom
> >>
> >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gu...@hibernate.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Thanks a lot for this initiative, Tom!
> >>>
> >>> To shed some light, the use case where this first came up, were issues
> >>> we saw with SMTs being applied to the different topics produced by the
> >>> Debezium change data capture connectors. There are different kinds of
> >>> topics (for change data, schema history, heartbeats etc.) and the
> >>> record structure to expect may vary between those. Hence we saw issues
> >>> with SMTs like ExtractField, which for instance only should be applied
> >>> to all change data topics but not the other ones.
> >>>
> >>> I like the overall approach; for Debezium's purposes, the simple topic
> >>> matching and negation operators would be sufficient already. I agree
> >>> with Chris and would prefer one single condition attribute, which
> >>> contains a single condition or potentially a logical expression with
> >>> not, and, etc. I think it's less ambiguous, in particular when it
> >>> comes to ordering of the different conditions and determining their
> >>> precedence.
> >>>
> >>> Would love to see this feature in one or another way in Connect.
> >>>
> >>> Best,
> >>>
> >>> --Gunnar
> >>>
> >>>
> >>>
> >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
> >>> tbentley@redhat.com>:
> >>> >
> >>> > Hi Chris and Sönke,
> >>> >
> >>> > Using the numbering from Chris's email...
> >>> >
> >>> > 1. That's a good point, I'll see what is needed to make that work.
> >>> >
> >>> > 2. I'm happy enough to add support for "and" and "or" as part of this
> >>> KIP
> >>> > if people can see a need for it.
> >>> >
> >>> > In a similar vein, I was wondering about whether it would be
> worthwhile
> >>> > having the equivalent of an "else" clause (what's in the KIP is
> >>> basically
> >>> > an "if" statement). Without support for "else" I think people would
> >>> often
> >>> > need two conditionals, with the condition of one being the negation
> of
> >>> the
> >>> > condition of another.
> >>> >
> >>> > 3. I can see the attraction of an expression language. The pros
> include
> >>> > being terse and familiar to programmers and potentially very flexible
> >>> if
> >>> > that's needed in the future. I had a play and implemented it using
> >>> ANTLR
> >>> > and it's not difficult to write a grammar and implement the functions
> >>> we've
> >>> > already discussed and get decent error messages when the expression
> is
> >>> > malformed. So on the one hand I quite like the idea. On the other
> hand
> >>> it
> >>> > feels like overkill for the use cases that have actually been
> >>> identified so
> >>> > far.
> >>> >
> >>> > @Sönke what do you make of the expression language idea?
> >>> >
> >>> > Kind regards,
> >>> >
> >>> > Tom
> >>> >
> >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
> >>> chrise@confluent.io>
> >>> > wrote:
> >>> >
> >>> > > Hi Tom,
> >>> > >
> >>> > > This looks great and I'd love to see the out-of-the-box SMTs become
> >>> even
> >>> > > more powerful with the improvements you've proposed! Got a few
> >>> remarks and
> >>> > > would be interested in your thoughts:
> >>> > >
> >>> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
> >>> method,
> >>> > > what would you think about adopting a similar approach as the
> >>> framework
> >>> > > uses with connectors, by adding a "Config validate(Map<String,
> >>> String>
> >>> > > props)" method that can perform custom validation outside of what
> >>> can be
> >>> > > performed by the ConfigDef's single-property-at-a-time validation?
> >>> It may
> >>> > > be a little heavyweight for use with this particular SMT, but it'd
> >>> provide
> >>> > > more flexibility for other SMT implementations and would mirror an
> >>> API that
> >>> > > developers targeting the framework are likely already familiar
> with.
> >>> > > 2. The possibility for adding the logical operators "and" and "or"
> is
> >>> > > mentioned, but only as a potential future change and not one
> >>> proposed by
> >>> > > this KIP. Why not include those operators sooner rather than later?
> >>> > > 3. The syntax for named conditions that are then referenced in
> >>> logical
> >>> > > operators is... tricky. It took me a few attempts to grok the
> example
> >>> > > provided in the KIP after reading Sönke's question about the
> example
> >>> for
> >>> > > negation. What about a more sophisticated but less verbose syntax
> >>> that
> >>> > > supports a single configuration for the condition, even with
> logical
> >>> > > operators? I'm thinking something like
> >>> > > "transforms.conditionalExtract.condition:
> not(has-header:my-header)"
> >>> > > instead of the "transforms.conditionalExtract.condition:
> >>> not:hasMyHeader"
> >>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
> >>> > > has-header:my-header" properties. If support for a logical "and" is
> >>> added,
> >>> > > this could then be expanded to something like
> >>> > > "transforms.conditionalExtract.condition:
> and(has-header(my-header),
> >>> > > not(topic-matches(my-prefix-.*)))". There would be additional
> >>> complexity
> >>> > > here with the need to escape parentheses and commas that are
> >>> intended to be
> >>> > > treated literally (as part of a header name, for example) instead
> of
> >>> as
> >>> > > part of the syntax for the condition itself, but a little
> additional
> >>> > > complexity for edge cases like that may be warranted if it heavily
> >>> reduces
> >>> > > complexity for the common cases. The rationale for the proposed
> >>> > > parentheses-based syntax here instead of what's mentioned in the
> KIP
> >>> > > (something like "and: <condition1>, <condition2>") is to help with
> >>> > > readability; we probably wouldn't need that with the approach of
> >>> naming
> >>> > > conditions via separate properties, but things may get a little
> >>> nasty with
> >>> > > literal conditions included there, especially with the possibility
> of
> >>> > > nesting. Finally, the shift in syntax here should make it easier to
> >>> handle
> >>> > > cases like the header value matching brought up by Sönke; it might
> >>> look
> >>> > > something like "header-matches(header-name, header-value-pattern)".
> >>> > >
> >>> > > Cheers,
> >>> > >
> >>> > > Chris
> >>> > >
> >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com>
> >>> wrote:
> >>> > >
> >>> > > > Hi Sönke,
> >>> > > >
> >>> > > > Thanks for taking a look.
> >>> > > >
> >>> > > > Let me answer in reverse order, since I think it might make more
> >>> sense
> >>> > > that
> >>> > > > way...
> >>> > > >
> >>> > > > Also, while writing that, I noticed that you have a version with
> >>> and
> >>> > > > > without "name" for your transformation in the KIP:
> >>> > > > >
> >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> >>> > > has-header:my-header
> >>> > > > > and
> >>> > > > > transforms.conditionalExtract.condition: has-header:my-header
> >>> > > > >
> >>> > > >
> >>> > > > The example
> >>> > > >     transforms.conditionalExtract.condition: has-header:my-header
> >>> > > > is a "has-header" condition (the prefix of the config value),
> >>> which will
> >>> > > > match records with a "my-header" header (given in the suffix).
> >>> > > >
> >>> > > > The other example given is:
> >>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
> >>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
> >>> > > > has-header:my-header
> >>> > > > The root of the condition is a "not" condition (the prefix of the
> >>> value
> >>> > > for
> >>> > > > the transforms.conditionalExtract.condition key) of another named
> >>> > > condition
> >>> > > > called "hasMyHeader" (the suffix). Any name could be used for the
> >>> other
> >>> > > > condition. That other condition is configured at
> >>> > > > "transforms.conditionalExtract.condition.<conditionName>". That
> >>> condition
> >>> > > > is a "has-header" condition (the prefix), which will match
> records
> >>> with a
> >>> > > > "my-header" header (given in the suffix). So the "has-header:"
> >>> condition
> >>> > > > type would always require a suffix, as would the "not:" condition
> >>> type.
> >>> > > > Hypothetically you could have a "true" condition type (which
> would
> >>> not
> >>> > > > require a suffix), and the hypothetical binary conditions "and:"
> >>> and
> >>> > > "or:"
> >>> > > > would require a pair of other condition names.
> >>> > > >
> >>> > > > So what's proposed is a scheme for encoding conditions where the
> >>> > > condition
> >>> > > > type is the prefix of the value of some "....condition" config
> >>> key, and
> >>> > > the
> >>> > > > optional suffix provides parameters for the condition. This makes
> >>> those
> >>> > > > parameters a bit inflexible, but is relatively succinct.
> >>> > > >
> >>> > > > This leads on to your first point. You're right that use cases
> >>> might
> >>> > > appear
> >>> > > > which need other conditions, and we should make it flexible
> enough
> >>> to be
> >>> > > > able to cope with future use cases. On the other hand, I was
> >>> concerned
> >>> > > that
> >>> > > > we end up with something which is quite complicated to configure.
> >>> (There
> >>> > > > comes a point where it might makes more sense for the user to
> >>> write their
> >>> > > > own SMT).
> >>> > > >
> >>> > > > Just of the top of my head it might look like:
> >>> > > > >
> >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> >>> type:has-header
> >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> >>> > > > header-name:my-header
> >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> >>> > > field-value:my-value
> >>> > > > >
> >>> > > >
> >>> > > > That won't work because the format is basically a
> >>> Properties/Map<String,
> >>> > > > String> and what you've suggested has duplicate keys.
> >>> > > >
> >>> > > > One thing I did briefly consider what the ability to treat
> >>> conditions as
> >>> > > > Configurable objects in their own right (opening up the
> >>> possibility of
> >>> > > > people supplying their own Conditions, just like they can supply
> >>> their
> >>> > > own
> >>> > > > SMTs). That might be configured something like this:
> >>> > > >
> >>> > > >     transforms.conditionalExtract.condition: not
> >>> > > >     transforms.conditionalExtract.condition.not.type: Not
> >>> > > >     transforms.conditionalExtract.condition.not.negated: foo
> >>> > > >     transforms.conditionalExtract.condition.foo.type:
> >>> HasHeaderWithValue
> >>> > > >     transforms.conditionalExtract.condition.foo.header: my-header
> >>> > > >     transforms.conditionalExtract.condition.foo.value: my-value
> >>> > > >
> >>> > > > I didn't propose that I couldn't see the use cases to justify
> this
> >>> kind
> >>> > > of
> >>> > > > complexity, especially as the common case would surely be
> matching
> >>> > > against
> >>> > > > topic name (to be honest I wasn't completely convinced by the
> need
> >>> for
> >>> > > > "has-header"). In the current version of the KIP that's just
> >>> > > >
> >>> > > >     transforms.conditionalExtract.condition: topic-matches:
> >>> my-prefix-.*
> >>> > > >
> >>> > > > but using the more flexible scheme that would require something
> >>> more like
> >>> > > > this:
> >>> > > >
> >>> > > >     transforms.conditionalExtract.condition: bar
> >>> > > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
> >>> > > >     transforms.conditionalExtract.condition.bar.pattern:
> >>> my-prefix-.*
> >>> > > >
> >>> > > > If people know of use cases which would justify more
> >>> sophistication, I'm
> >>> > > > happy to reconsider.
> >>> > > >
> >>> > > > Thanks again for taking a look!
> >>> > > >
> >>> > > > Tom
> >>> > > >
> >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> >>> > > > <so...@opencore.com.invalid> wrote:
> >>> > > >
> >>> > > > > Hi Tom,
> >>> > > > >
> >>> > > > > sounds useful to me, thanks for the KIP!
> >>> > > > > The only thought that I had while reading was that this will
> >>> probably
> >>> > > > raise
> >>> > > > > questions about more involved conditions fairly quickly. For
> >>> example
> >>> > > the
> >>> > > > > "has-header" will cause an appetite for conditions like
> >>> > > > > "this-header-has-that-value".
> >>> > > > > This would necessitate two parameters to be passed into the
> >>> condition,
> >>> > > > > which I think is not currently included in the KIP. I am not
> >>> saying add
> >>> > > > > this now, but might it make sense to discuss a concept of how
> >>> that
> >>> > > might
> >>> > > > > look now, to avoid potential changes later on.
> >>> > > > >
> >>> > > > > Just of the top of my head it might look like:
> >>> > > > >
> >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> >>> type:has-header
> >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> >>> > > > header-name:my-header
> >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> >>> > > field-value:my-value
> >>> > > > >
> >>> > > > >
> >>> > > > > Also, while writing that, I noticed that you have a version
> with
> >>> and
> >>> > > > > without "name" for your transformation in the KIP:
> >>> > > > >
> >>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> >>> > > has-header:my-header
> >>> > > > > and
> >>> > > > > transforms.conditionalExtract.condition: has-header:my-header
> >>> > > > >
> >>> > > > >
> >>> > > > > Is this intentional and the name is optional?
> >>> > > > >
> >>> > > > >
> >>> > > > > Best regards,
> >>> > > > >
> >>> > > > > Sönke
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com>
> >>> wrote:
> >>> > > > > >
> >>> > > > > > Does anyone have any comments, feedback or thoughts about
> this?
> >>> > > > > >
> >>> > > > > > Thanks,
> >>> > > > > >
> >>> > > > > > Tom
> >>> > > > > >
> >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
> >>> tbentley@redhat.com>
> >>> > > > > wrote:
> >>> > > > > >
> >>> > > > > > > Hi,
> >>> > > > > > >
> >>> > > > > > > I've opened KIP-585 which is intended to provide a
> mechanism
> >>> to
> >>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful
> >>> for any
> >>> > > > > > > feedback:
> >>> > > > > > >
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> >>> > > > > > >
> >>> > > > > > > Many thanks,
> >>> > > > > > >
> >>> > > > > > > Tom
> >>> > > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>>
> >>>
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Since no one objected I've updated the KIP with the revised way to
configure this transformation.

On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tb...@redhat.com> wrote:

> To come back about a point Chris made:
>
> 1. Instead of the new "ConfigDef config(Map<String, String> props)" method,
>> what would you think about adopting a similar approach as the framework
>> uses with connectors, by adding a "Config validate(Map<String, String>
>> props)" method that can perform custom validation outside of what can be
>> performed by the ConfigDef's single-property-at-a-time validation? It may
>> be a little heavyweight for use with this particular SMT, but it'd provide
>> more flexibility for other SMT implementations and would mirror an API
>> that
>> developers targeting the framework are likely already familiar with.
>>
>
> The validate() + config() approach taken for Connectors doesn't quite work
> for Transformations.
>
> The default Connector.validate() basically just calls
> `config().validate(connectorConfigs)` and returns a Config. Separately the
> AbstractHeader also calls `config()` and uses the Config and ConfigDef to
> build the ConfigInfos. The contract is not really defined well enough in
> the javadoc, but this means a connector can use validate() to build the
> ConfigDef which it will return from config().
>
> For Transformations we don't really want validate() to perform validation
> at all since ultimately the transformations config will be embedded in the
> connector's and will be validated by the connector itself. It wouldn't be
> harmful if it _did_ perform validation, just unnecessary. But having a
> validate() method that ought not to validate() seems like a recipe for
> confusion.
>
> Also problematic is that there's no use for the Config returned from
> Transformations.validate().
>
> So I'm not convinced that the superficial similarity really gains anything.
>
> Kind regards,
>
> Tom
>
> On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tb...@redhat.com> wrote:
>
>> Hi,
>>
>> Hi all,
>>
>> Thanks for the discussion so far.
>>
>> It seems a bit weird to me that when configuring the Conditional SMT with
>> a DSL you would use a concise, intuitive DSL for expressing the condition,
>> but not for the transforms that it's guarding. It also seems natural, if
>> you support this for conditionally applying SMTs, that you'd soon want to
>> support the same thing for a generic filter transformer. Then, if you can
>> configure a filter transformation using this DSL, it becomes odd that you
>> can't do this for mapping transformations. I think it would be a mistake to
>> go specifying an expression language for conditions when really that might
>> just be part of a language for transformations.
>>
>> I think it would be possible today to write an SMT which allowed you to
>> express the transformation in a DSL. Concretely, it's possible to imagine a
>> single transformation taking a DSL something like this:
>>
>>   compose(
>>     Flatten.Key(delimiter: ':'),
>>     If(condition: TopicMatches(pattern: "blah-.*"),
>>        then: Flatten.Value(delimiter: '/')))
>>
>> All I've really done here, beyond what was already proposed, is rename
>> Conditional to If, imagine a couple more bits of syntax
>> (SMTs being constructed by class name and named argument invocation
>> syntax to support SMT constructors) and add a higher level compose()
>> function for chaining SMTs (which could easily be replaced with brace
>> delimited blocks).
>>
>> That may be a discussion we should have, but I think in _this_ KIP we
>> should focus on the conditional part, since the above example hopefully
>> shows that it would be possible to reuse it in a DSL if there was appetite
>> for that.
>>
>> With that in mind, and since my original suggestion for an abbreviated
>> config syntax didn't appeal to people, I propose that we stick with the
>> existing norms for configuring this.
>> The previous example would look like this:
>>
>> transformation: flattenKey,if
>> transformation.flattenKey.type: Flatten.Key
>> transformation.flattenKey.delimiter: :
>> transformation.if.type: If
>> transformation.if.condition.type: TopicMatches
>> transformation.if.condition.pattern: blah-.*
>> transformation.if.then: flattenValue
>> transformation.if.then.flattenValue.type: Flatten.Value
>> transformation.if.then.flattenValue.delimiter: /
>>
>> Also, I'm inclined to think we should stick to just supporting
>> TopicMatches and Not, since we've not identified an actual need for
>> 'has-header', 'or' and 'and'.
>>
>> An example of the usage of Not would look like this:
>>
>> transformation: flattenKey,if
>> transformation.flattenKey.type: Flatten.Key
>> transformation.flattenKey.delimiter: :
>> transformation.if.type: If
>> transformation.if.condition.type: Not
>> transformation.if.condition.operand: startsWithBlah
>> transformation.if.condition.operand.startsWithBlah.type: TopicMatches
>> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
>> transformation.if.then: flattenValue
>> transformation.if.then.flattenValue.type: Flatten.Value
>> transformation.if.then.flattenValue.delimiter: /
>>
>> Hopefully we can agree that this allows the conditional SMT part to make
>> progress without getting bogged down.
>>
>> Thoughts? If this seems broadly reasonable, I'll update the KIP.
>>
>> Kind regards,
>>
>> Tom
>>
>> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gu...@hibernate.org>
>> wrote:
>>
>>> Hi all,
>>>
>>> Thanks a lot for this initiative, Tom!
>>>
>>> To shed some light, the use case where this first came up, were issues
>>> we saw with SMTs being applied to the different topics produced by the
>>> Debezium change data capture connectors. There are different kinds of
>>> topics (for change data, schema history, heartbeats etc.) and the
>>> record structure to expect may vary between those. Hence we saw issues
>>> with SMTs like ExtractField, which for instance only should be applied
>>> to all change data topics but not the other ones.
>>>
>>> I like the overall approach; for Debezium's purposes, the simple topic
>>> matching and negation operators would be sufficient already. I agree
>>> with Chris and would prefer one single condition attribute, which
>>> contains a single condition or potentially a logical expression with
>>> not, and, etc. I think it's less ambiguous, in particular when it
>>> comes to ordering of the different conditions and determining their
>>> precedence.
>>>
>>> Would love to see this feature in one or another way in Connect.
>>>
>>> Best,
>>>
>>> --Gunnar
>>>
>>>
>>>
>>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
>>> tbentley@redhat.com>:
>>> >
>>> > Hi Chris and Sönke,
>>> >
>>> > Using the numbering from Chris's email...
>>> >
>>> > 1. That's a good point, I'll see what is needed to make that work.
>>> >
>>> > 2. I'm happy enough to add support for "and" and "or" as part of this
>>> KIP
>>> > if people can see a need for it.
>>> >
>>> > In a similar vein, I was wondering about whether it would be worthwhile
>>> > having the equivalent of an "else" clause (what's in the KIP is
>>> basically
>>> > an "if" statement). Without support for "else" I think people would
>>> often
>>> > need two conditionals, with the condition of one being the negation of
>>> the
>>> > condition of another.
>>> >
>>> > 3. I can see the attraction of an expression language. The pros include
>>> > being terse and familiar to programmers and potentially very flexible
>>> if
>>> > that's needed in the future. I had a play and implemented it using
>>> ANTLR
>>> > and it's not difficult to write a grammar and implement the functions
>>> we've
>>> > already discussed and get decent error messages when the expression is
>>> > malformed. So on the one hand I quite like the idea. On the other hand
>>> it
>>> > feels like overkill for the use cases that have actually been
>>> identified so
>>> > far.
>>> >
>>> > @Sönke what do you make of the expression language idea?
>>> >
>>> > Kind regards,
>>> >
>>> > Tom
>>> >
>>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
>>> chrise@confluent.io>
>>> > wrote:
>>> >
>>> > > Hi Tom,
>>> > >
>>> > > This looks great and I'd love to see the out-of-the-box SMTs become
>>> even
>>> > > more powerful with the improvements you've proposed! Got a few
>>> remarks and
>>> > > would be interested in your thoughts:
>>> > >
>>> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
>>> method,
>>> > > what would you think about adopting a similar approach as the
>>> framework
>>> > > uses with connectors, by adding a "Config validate(Map<String,
>>> String>
>>> > > props)" method that can perform custom validation outside of what
>>> can be
>>> > > performed by the ConfigDef's single-property-at-a-time validation?
>>> It may
>>> > > be a little heavyweight for use with this particular SMT, but it'd
>>> provide
>>> > > more flexibility for other SMT implementations and would mirror an
>>> API that
>>> > > developers targeting the framework are likely already familiar with.
>>> > > 2. The possibility for adding the logical operators "and" and "or" is
>>> > > mentioned, but only as a potential future change and not one
>>> proposed by
>>> > > this KIP. Why not include those operators sooner rather than later?
>>> > > 3. The syntax for named conditions that are then referenced in
>>> logical
>>> > > operators is... tricky. It took me a few attempts to grok the example
>>> > > provided in the KIP after reading Sönke's question about the example
>>> for
>>> > > negation. What about a more sophisticated but less verbose syntax
>>> that
>>> > > supports a single configuration for the condition, even with logical
>>> > > operators? I'm thinking something like
>>> > > "transforms.conditionalExtract.condition: not(has-header:my-header)"
>>> > > instead of the "transforms.conditionalExtract.condition:
>>> not:hasMyHeader"
>>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
>>> > > has-header:my-header" properties. If support for a logical "and" is
>>> added,
>>> > > this could then be expanded to something like
>>> > > "transforms.conditionalExtract.condition: and(has-header(my-header),
>>> > > not(topic-matches(my-prefix-.*)))". There would be additional
>>> complexity
>>> > > here with the need to escape parentheses and commas that are
>>> intended to be
>>> > > treated literally (as part of a header name, for example) instead of
>>> as
>>> > > part of the syntax for the condition itself, but a little additional
>>> > > complexity for edge cases like that may be warranted if it heavily
>>> reduces
>>> > > complexity for the common cases. The rationale for the proposed
>>> > > parentheses-based syntax here instead of what's mentioned in the KIP
>>> > > (something like "and: <condition1>, <condition2>") is to help with
>>> > > readability; we probably wouldn't need that with the approach of
>>> naming
>>> > > conditions via separate properties, but things may get a little
>>> nasty with
>>> > > literal conditions included there, especially with the possibility of
>>> > > nesting. Finally, the shift in syntax here should make it easier to
>>> handle
>>> > > cases like the header value matching brought up by Sönke; it might
>>> look
>>> > > something like "header-matches(header-name, header-value-pattern)".
>>> > >
>>> > > Cheers,
>>> > >
>>> > > Chris
>>> > >
>>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com>
>>> wrote:
>>> > >
>>> > > > Hi Sönke,
>>> > > >
>>> > > > Thanks for taking a look.
>>> > > >
>>> > > > Let me answer in reverse order, since I think it might make more
>>> sense
>>> > > that
>>> > > > way...
>>> > > >
>>> > > > Also, while writing that, I noticed that you have a version with
>>> and
>>> > > > > without "name" for your transformation in the KIP:
>>> > > > >
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > has-header:my-header
>>> > > > > and
>>> > > > > transforms.conditionalExtract.condition: has-header:my-header
>>> > > > >
>>> > > >
>>> > > > The example
>>> > > >     transforms.conditionalExtract.condition: has-header:my-header
>>> > > > is a "has-header" condition (the prefix of the config value),
>>> which will
>>> > > > match records with a "my-header" header (given in the suffix).
>>> > > >
>>> > > > The other example given is:
>>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
>>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
>>> > > > has-header:my-header
>>> > > > The root of the condition is a "not" condition (the prefix of the
>>> value
>>> > > for
>>> > > > the transforms.conditionalExtract.condition key) of another named
>>> > > condition
>>> > > > called "hasMyHeader" (the suffix). Any name could be used for the
>>> other
>>> > > > condition. That other condition is configured at
>>> > > > "transforms.conditionalExtract.condition.<conditionName>". That
>>> condition
>>> > > > is a "has-header" condition (the prefix), which will match records
>>> with a
>>> > > > "my-header" header (given in the suffix). So the "has-header:"
>>> condition
>>> > > > type would always require a suffix, as would the "not:" condition
>>> type.
>>> > > > Hypothetically you could have a "true" condition type (which would
>>> not
>>> > > > require a suffix), and the hypothetical binary conditions "and:"
>>> and
>>> > > "or:"
>>> > > > would require a pair of other condition names.
>>> > > >
>>> > > > So what's proposed is a scheme for encoding conditions where the
>>> > > condition
>>> > > > type is the prefix of the value of some "....condition" config
>>> key, and
>>> > > the
>>> > > > optional suffix provides parameters for the condition. This makes
>>> those
>>> > > > parameters a bit inflexible, but is relatively succinct.
>>> > > >
>>> > > > This leads on to your first point. You're right that use cases
>>> might
>>> > > appear
>>> > > > which need other conditions, and we should make it flexible enough
>>> to be
>>> > > > able to cope with future use cases. On the other hand, I was
>>> concerned
>>> > > that
>>> > > > we end up with something which is quite complicated to configure.
>>> (There
>>> > > > comes a point where it might makes more sense for the user to
>>> write their
>>> > > > own SMT).
>>> > > >
>>> > > > Just of the top of my head it might look like:
>>> > > > >
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> type:has-header
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > > header-name:my-header
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > field-value:my-value
>>> > > > >
>>> > > >
>>> > > > That won't work because the format is basically a
>>> Properties/Map<String,
>>> > > > String> and what you've suggested has duplicate keys.
>>> > > >
>>> > > > One thing I did briefly consider what the ability to treat
>>> conditions as
>>> > > > Configurable objects in their own right (opening up the
>>> possibility of
>>> > > > people supplying their own Conditions, just like they can supply
>>> their
>>> > > own
>>> > > > SMTs). That might be configured something like this:
>>> > > >
>>> > > >     transforms.conditionalExtract.condition: not
>>> > > >     transforms.conditionalExtract.condition.not.type: Not
>>> > > >     transforms.conditionalExtract.condition.not.negated: foo
>>> > > >     transforms.conditionalExtract.condition.foo.type:
>>> HasHeaderWithValue
>>> > > >     transforms.conditionalExtract.condition.foo.header: my-header
>>> > > >     transforms.conditionalExtract.condition.foo.value: my-value
>>> > > >
>>> > > > I didn't propose that I couldn't see the use cases to justify this
>>> kind
>>> > > of
>>> > > > complexity, especially as the common case would surely be matching
>>> > > against
>>> > > > topic name (to be honest I wasn't completely convinced by the need
>>> for
>>> > > > "has-header"). In the current version of the KIP that's just
>>> > > >
>>> > > >     transforms.conditionalExtract.condition: topic-matches:
>>> my-prefix-.*
>>> > > >
>>> > > > but using the more flexible scheme that would require something
>>> more like
>>> > > > this:
>>> > > >
>>> > > >     transforms.conditionalExtract.condition: bar
>>> > > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
>>> > > >     transforms.conditionalExtract.condition.bar.pattern:
>>> my-prefix-.*
>>> > > >
>>> > > > If people know of use cases which would justify more
>>> sophistication, I'm
>>> > > > happy to reconsider.
>>> > > >
>>> > > > Thanks again for taking a look!
>>> > > >
>>> > > > Tom
>>> > > >
>>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>>> > > > <so...@opencore.com.invalid> wrote:
>>> > > >
>>> > > > > Hi Tom,
>>> > > > >
>>> > > > > sounds useful to me, thanks for the KIP!
>>> > > > > The only thought that I had while reading was that this will
>>> probably
>>> > > > raise
>>> > > > > questions about more involved conditions fairly quickly. For
>>> example
>>> > > the
>>> > > > > "has-header" will cause an appetite for conditions like
>>> > > > > "this-header-has-that-value".
>>> > > > > This would necessitate two parameters to be passed into the
>>> condition,
>>> > > > > which I think is not currently included in the KIP. I am not
>>> saying add
>>> > > > > this now, but might it make sense to discuss a concept of how
>>> that
>>> > > might
>>> > > > > look now, to avoid potential changes later on.
>>> > > > >
>>> > > > > Just of the top of my head it might look like:
>>> > > > >
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> type:has-header
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > > header-name:my-header
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > field-value:my-value
>>> > > > >
>>> > > > >
>>> > > > > Also, while writing that, I noticed that you have a version with
>>> and
>>> > > > > without "name" for your transformation in the KIP:
>>> > > > >
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > has-header:my-header
>>> > > > > and
>>> > > > > transforms.conditionalExtract.condition: has-header:my-header
>>> > > > >
>>> > > > >
>>> > > > > Is this intentional and the name is optional?
>>> > > > >
>>> > > > >
>>> > > > > Best regards,
>>> > > > >
>>> > > > > Sönke
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com>
>>> wrote:
>>> > > > > >
>>> > > > > > Does anyone have any comments, feedback or thoughts about this?
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > > Tom
>>> > > > > >
>>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
>>> tbentley@redhat.com>
>>> > > > > wrote:
>>> > > > > >
>>> > > > > > > Hi,
>>> > > > > > >
>>> > > > > > > I've opened KIP-585 which is intended to provide a mechanism
>>> to
>>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful
>>> for any
>>> > > > > > > feedback:
>>> > > > > > >
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>>> > > > > > >
>>> > > > > > > Many thanks,
>>> > > > > > >
>>> > > > > > > Tom
>>> > > > > > >
>>> > > > >
>>> > > >
>>> > >
>>>
>>>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
To come back about a point Chris made:

1. Instead of the new "ConfigDef config(Map<String, String> props)" method,
> what would you think about adopting a similar approach as the framework
> uses with connectors, by adding a "Config validate(Map<String, String>
> props)" method that can perform custom validation outside of what can be
> performed by the ConfigDef's single-property-at-a-time validation? It may
> be a little heavyweight for use with this particular SMT, but it'd provide
> more flexibility for other SMT implementations and would mirror an API that
> developers targeting the framework are likely already familiar with.
>

The validate() + config() approach taken for Connectors doesn't quite work
for Transformations.

The default Connector.validate() basically just calls
`config().validate(connectorConfigs)` and returns a Config. Separately the
AbstractHeader also calls `config()` and uses the Config and ConfigDef to
build the ConfigInfos. The contract is not really defined well enough in
the javadoc, but this means a connector can use validate() to build the
ConfigDef which it will return from config().

For Transformations we don't really want validate() to perform validation
at all since ultimately the transformations config will be embedded in the
connector's and will be validated by the connector itself. It wouldn't be
harmful if it _did_ perform validation, just unnecessary. But having a
validate() method that ought not to validate() seems like a recipe for
confusion.

Also problematic is that there's no use for the Config returned from
Transformations.validate().

So I'm not convinced that the superficial similarity really gains anything.

Kind regards,

Tom

On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tb...@redhat.com> wrote:

> Hi,
>
> Hi all,
>
> Thanks for the discussion so far.
>
> It seems a bit weird to me that when configuring the Conditional SMT with
> a DSL you would use a concise, intuitive DSL for expressing the condition,
> but not for the transforms that it's guarding. It also seems natural, if
> you support this for conditionally applying SMTs, that you'd soon want to
> support the same thing for a generic filter transformer. Then, if you can
> configure a filter transformation using this DSL, it becomes odd that you
> can't do this for mapping transformations. I think it would be a mistake to
> go specifying an expression language for conditions when really that might
> just be part of a language for transformations.
>
> I think it would be possible today to write an SMT which allowed you to
> express the transformation in a DSL. Concretely, it's possible to imagine a
> single transformation taking a DSL something like this:
>
>   compose(
>     Flatten.Key(delimiter: ':'),
>     If(condition: TopicMatches(pattern: "blah-.*"),
>        then: Flatten.Value(delimiter: '/')))
>
> All I've really done here, beyond what was already proposed, is rename
> Conditional to If, imagine a couple more bits of syntax
> (SMTs being constructed by class name and named argument invocation syntax
> to support SMT constructors) and add a higher level compose() function for
> chaining SMTs (which could easily be replaced with brace delimited blocks).
>
> That may be a discussion we should have, but I think in _this_ KIP we
> should focus on the conditional part, since the above example hopefully
> shows that it would be possible to reuse it in a DSL if there was appetite
> for that.
>
> With that in mind, and since my original suggestion for an abbreviated
> config syntax didn't appeal to people, I propose that we stick with the
> existing norms for configuring this.
> The previous example would look like this:
>
> transformation: flattenKey,if
> transformation.flattenKey.type: Flatten.Key
> transformation.flattenKey.delimiter: :
> transformation.if.type: If
> transformation.if.condition.type: TopicMatches
> transformation.if.condition.pattern: blah-.*
> transformation.if.then: flattenValue
> transformation.if.then.flattenValue.type: Flatten.Value
> transformation.if.then.flattenValue.delimiter: /
>
> Also, I'm inclined to think we should stick to just supporting
> TopicMatches and Not, since we've not identified an actual need for
> 'has-header', 'or' and 'and'.
>
> An example of the usage of Not would look like this:
>
> transformation: flattenKey,if
> transformation.flattenKey.type: Flatten.Key
> transformation.flattenKey.delimiter: :
> transformation.if.type: If
> transformation.if.condition.type: Not
> transformation.if.condition.operand: startsWithBlah
> transformation.if.condition.operand.startsWithBlah.type: TopicMatches
> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
> transformation.if.then: flattenValue
> transformation.if.then.flattenValue.type: Flatten.Value
> transformation.if.then.flattenValue.delimiter: /
>
> Hopefully we can agree that this allows the conditional SMT part to make
> progress without getting bogged down.
>
> Thoughts? If this seems broadly reasonable, I'll update the KIP.
>
> Kind regards,
>
> Tom
>
> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gu...@hibernate.org>
> wrote:
>
>> Hi all,
>>
>> Thanks a lot for this initiative, Tom!
>>
>> To shed some light, the use case where this first came up, were issues
>> we saw with SMTs being applied to the different topics produced by the
>> Debezium change data capture connectors. There are different kinds of
>> topics (for change data, schema history, heartbeats etc.) and the
>> record structure to expect may vary between those. Hence we saw issues
>> with SMTs like ExtractField, which for instance only should be applied
>> to all change data topics but not the other ones.
>>
>> I like the overall approach; for Debezium's purposes, the simple topic
>> matching and negation operators would be sufficient already. I agree
>> with Chris and would prefer one single condition attribute, which
>> contains a single condition or potentially a logical expression with
>> not, and, etc. I think it's less ambiguous, in particular when it
>> comes to ordering of the different conditions and determining their
>> precedence.
>>
>> Would love to see this feature in one or another way in Connect.
>>
>> Best,
>>
>> --Gunnar
>>
>>
>>
>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
>> tbentley@redhat.com>:
>> >
>> > Hi Chris and Sönke,
>> >
>> > Using the numbering from Chris's email...
>> >
>> > 1. That's a good point, I'll see what is needed to make that work.
>> >
>> > 2. I'm happy enough to add support for "and" and "or" as part of this
>> KIP
>> > if people can see a need for it.
>> >
>> > In a similar vein, I was wondering about whether it would be worthwhile
>> > having the equivalent of an "else" clause (what's in the KIP is
>> basically
>> > an "if" statement). Without support for "else" I think people would
>> often
>> > need two conditionals, with the condition of one being the negation of
>> the
>> > condition of another.
>> >
>> > 3. I can see the attraction of an expression language. The pros include
>> > being terse and familiar to programmers and potentially very flexible if
>> > that's needed in the future. I had a play and implemented it using ANTLR
>> > and it's not difficult to write a grammar and implement the functions
>> we've
>> > already discussed and get decent error messages when the expression is
>> > malformed. So on the one hand I quite like the idea. On the other hand
>> it
>> > feels like overkill for the use cases that have actually been
>> identified so
>> > far.
>> >
>> > @Sönke what do you make of the expression language idea?
>> >
>> > Kind regards,
>> >
>> > Tom
>> >
>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <chrise@confluent.io
>> >
>> > wrote:
>> >
>> > > Hi Tom,
>> > >
>> > > This looks great and I'd love to see the out-of-the-box SMTs become
>> even
>> > > more powerful with the improvements you've proposed! Got a few
>> remarks and
>> > > would be interested in your thoughts:
>> > >
>> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
>> method,
>> > > what would you think about adopting a similar approach as the
>> framework
>> > > uses with connectors, by adding a "Config validate(Map<String, String>
>> > > props)" method that can perform custom validation outside of what can
>> be
>> > > performed by the ConfigDef's single-property-at-a-time validation? It
>> may
>> > > be a little heavyweight for use with this particular SMT, but it'd
>> provide
>> > > more flexibility for other SMT implementations and would mirror an
>> API that
>> > > developers targeting the framework are likely already familiar with.
>> > > 2. The possibility for adding the logical operators "and" and "or" is
>> > > mentioned, but only as a potential future change and not one proposed
>> by
>> > > this KIP. Why not include those operators sooner rather than later?
>> > > 3. The syntax for named conditions that are then referenced in logical
>> > > operators is... tricky. It took me a few attempts to grok the example
>> > > provided in the KIP after reading Sönke's question about the example
>> for
>> > > negation. What about a more sophisticated but less verbose syntax that
>> > > supports a single configuration for the condition, even with logical
>> > > operators? I'm thinking something like
>> > > "transforms.conditionalExtract.condition: not(has-header:my-header)"
>> > > instead of the "transforms.conditionalExtract.condition:
>> not:hasMyHeader"
>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
>> > > has-header:my-header" properties. If support for a logical "and" is
>> added,
>> > > this could then be expanded to something like
>> > > "transforms.conditionalExtract.condition: and(has-header(my-header),
>> > > not(topic-matches(my-prefix-.*)))". There would be additional
>> complexity
>> > > here with the need to escape parentheses and commas that are intended
>> to be
>> > > treated literally (as part of a header name, for example) instead of
>> as
>> > > part of the syntax for the condition itself, but a little additional
>> > > complexity for edge cases like that may be warranted if it heavily
>> reduces
>> > > complexity for the common cases. The rationale for the proposed
>> > > parentheses-based syntax here instead of what's mentioned in the KIP
>> > > (something like "and: <condition1>, <condition2>") is to help with
>> > > readability; we probably wouldn't need that with the approach of
>> naming
>> > > conditions via separate properties, but things may get a little nasty
>> with
>> > > literal conditions included there, especially with the possibility of
>> > > nesting. Finally, the shift in syntax here should make it easier to
>> handle
>> > > cases like the header value matching brought up by Sönke; it might
>> look
>> > > something like "header-matches(header-name, header-value-pattern)".
>> > >
>> > > Cheers,
>> > >
>> > > Chris
>> > >
>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com>
>> wrote:
>> > >
>> > > > Hi Sönke,
>> > > >
>> > > > Thanks for taking a look.
>> > > >
>> > > > Let me answer in reverse order, since I think it might make more
>> sense
>> > > that
>> > > > way...
>> > > >
>> > > > Also, while writing that, I noticed that you have a version with and
>> > > > > without "name" for your transformation in the KIP:
>> > > > >
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > has-header:my-header
>> > > > > and
>> > > > > transforms.conditionalExtract.condition: has-header:my-header
>> > > > >
>> > > >
>> > > > The example
>> > > >     transforms.conditionalExtract.condition: has-header:my-header
>> > > > is a "has-header" condition (the prefix of the config value), which
>> will
>> > > > match records with a "my-header" header (given in the suffix).
>> > > >
>> > > > The other example given is:
>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
>> > > > has-header:my-header
>> > > > The root of the condition is a "not" condition (the prefix of the
>> value
>> > > for
>> > > > the transforms.conditionalExtract.condition key) of another named
>> > > condition
>> > > > called "hasMyHeader" (the suffix). Any name could be used for the
>> other
>> > > > condition. That other condition is configured at
>> > > > "transforms.conditionalExtract.condition.<conditionName>". That
>> condition
>> > > > is a "has-header" condition (the prefix), which will match records
>> with a
>> > > > "my-header" header (given in the suffix). So the "has-header:"
>> condition
>> > > > type would always require a suffix, as would the "not:" condition
>> type.
>> > > > Hypothetically you could have a "true" condition type (which would
>> not
>> > > > require a suffix), and the hypothetical binary conditions "and:" and
>> > > "or:"
>> > > > would require a pair of other condition names.
>> > > >
>> > > > So what's proposed is a scheme for encoding conditions where the
>> > > condition
>> > > > type is the prefix of the value of some "....condition" config key,
>> and
>> > > the
>> > > > optional suffix provides parameters for the condition. This makes
>> those
>> > > > parameters a bit inflexible, but is relatively succinct.
>> > > >
>> > > > This leads on to your first point. You're right that use cases might
>> > > appear
>> > > > which need other conditions, and we should make it flexible enough
>> to be
>> > > > able to cope with future use cases. On the other hand, I was
>> concerned
>> > > that
>> > > > we end up with something which is quite complicated to configure.
>> (There
>> > > > comes a point where it might makes more sense for the user to write
>> their
>> > > > own SMT).
>> > > >
>> > > > Just of the top of my head it might look like:
>> > > > >
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> type:has-header
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > > header-name:my-header
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > field-value:my-value
>> > > > >
>> > > >
>> > > > That won't work because the format is basically a
>> Properties/Map<String,
>> > > > String> and what you've suggested has duplicate keys.
>> > > >
>> > > > One thing I did briefly consider what the ability to treat
>> conditions as
>> > > > Configurable objects in their own right (opening up the possibility
>> of
>> > > > people supplying their own Conditions, just like they can supply
>> their
>> > > own
>> > > > SMTs). That might be configured something like this:
>> > > >
>> > > >     transforms.conditionalExtract.condition: not
>> > > >     transforms.conditionalExtract.condition.not.type: Not
>> > > >     transforms.conditionalExtract.condition.not.negated: foo
>> > > >     transforms.conditionalExtract.condition.foo.type:
>> HasHeaderWithValue
>> > > >     transforms.conditionalExtract.condition.foo.header: my-header
>> > > >     transforms.conditionalExtract.condition.foo.value: my-value
>> > > >
>> > > > I didn't propose that I couldn't see the use cases to justify this
>> kind
>> > > of
>> > > > complexity, especially as the common case would surely be matching
>> > > against
>> > > > topic name (to be honest I wasn't completely convinced by the need
>> for
>> > > > "has-header"). In the current version of the KIP that's just
>> > > >
>> > > >     transforms.conditionalExtract.condition: topic-matches:
>> my-prefix-.*
>> > > >
>> > > > but using the more flexible scheme that would require something
>> more like
>> > > > this:
>> > > >
>> > > >     transforms.conditionalExtract.condition: bar
>> > > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
>> > > >     transforms.conditionalExtract.condition.bar.pattern:
>> my-prefix-.*
>> > > >
>> > > > If people know of use cases which would justify more
>> sophistication, I'm
>> > > > happy to reconsider.
>> > > >
>> > > > Thanks again for taking a look!
>> > > >
>> > > > Tom
>> > > >
>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>> > > > <so...@opencore.com.invalid> wrote:
>> > > >
>> > > > > Hi Tom,
>> > > > >
>> > > > > sounds useful to me, thanks for the KIP!
>> > > > > The only thought that I had while reading was that this will
>> probably
>> > > > raise
>> > > > > questions about more involved conditions fairly quickly. For
>> example
>> > > the
>> > > > > "has-header" will cause an appetite for conditions like
>> > > > > "this-header-has-that-value".
>> > > > > This would necessitate two parameters to be passed into the
>> condition,
>> > > > > which I think is not currently included in the KIP. I am not
>> saying add
>> > > > > this now, but might it make sense to discuss a concept of how that
>> > > might
>> > > > > look now, to avoid potential changes later on.
>> > > > >
>> > > > > Just of the top of my head it might look like:
>> > > > >
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> type:has-header
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > > header-name:my-header
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > field-value:my-value
>> > > > >
>> > > > >
>> > > > > Also, while writing that, I noticed that you have a version with
>> and
>> > > > > without "name" for your transformation in the KIP:
>> > > > >
>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>> > > has-header:my-header
>> > > > > and
>> > > > > transforms.conditionalExtract.condition: has-header:my-header
>> > > > >
>> > > > >
>> > > > > Is this intentional and the name is optional?
>> > > > >
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Sönke
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com>
>> wrote:
>> > > > > >
>> > > > > > Does anyone have any comments, feedback or thoughts about this?
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Tom
>> > > > > >
>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
>> tbentley@redhat.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > I've opened KIP-585 which is intended to provide a mechanism
>> to
>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful
>> for any
>> > > > > > > feedback:
>> > > > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>> > > > > > >
>> > > > > > > Many thanks,
>> > > > > > >
>> > > > > > > Tom
>> > > > > > >
>> > > > >
>> > > >
>> > >
>>
>>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi,

Hi all,

Thanks for the discussion so far.

It seems a bit weird to me that when configuring the Conditional SMT with a
DSL you would use a concise, intuitive DSL for expressing the condition,
but not for the transforms that it's guarding. It also seems natural, if
you support this for conditionally applying SMTs, that you'd soon want to
support the same thing for a generic filter transformer. Then, if you can
configure a filter transformation using this DSL, it becomes odd that you
can't do this for mapping transformations. I think it would be a mistake to
go specifying an expression language for conditions when really that might
just be part of a language for transformations.

I think it would be possible today to write an SMT which allowed you to
express the transformation in a DSL. Concretely, it's possible to imagine a
single transformation taking a DSL something like this:

  compose(
    Flatten.Key(delimiter: ':'),
    If(condition: TopicMatches(pattern: "blah-.*"),
       then: Flatten.Value(delimiter: '/')))

All I've really done here, beyond what was already proposed, is rename
Conditional to If, imagine a couple more bits of syntax
(SMTs being constructed by class name and named argument invocation syntax
to support SMT constructors) and add a higher level compose() function for
chaining SMTs (which could easily be replaced with brace delimited blocks).

That may be a discussion we should have, but I think in _this_ KIP we
should focus on the conditional part, since the above example hopefully
shows that it would be possible to reuse it in a DSL if there was appetite
for that.

With that in mind, and since my original suggestion for an abbreviated
config syntax didn't appeal to people, I propose that we stick with the
existing norms for configuring this.
The previous example would look like this:

transformation: flattenKey,if
transformation.flattenKey.type: Flatten.Key
transformation.flattenKey.delimiter: :
transformation.if.type: If
transformation.if.condition.type: TopicMatches
transformation.if.condition.pattern: blah-.*
transformation.if.then: flattenValue
transformation.if.then.flattenValue.type: Flatten.Value
transformation.if.then.flattenValue.delimiter: /

Also, I'm inclined to think we should stick to just supporting TopicMatches
and Not, since we've not identified an actual need for 'has-header', 'or'
and 'and'.

An example of the usage of Not would look like this:

transformation: flattenKey,if
transformation.flattenKey.type: Flatten.Key
transformation.flattenKey.delimiter: :
transformation.if.type: If
transformation.if.condition.type: Not
transformation.if.condition.operand: startsWithBlah
transformation.if.condition.operand.startsWithBlah.type: TopicMatches
transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
transformation.if.then: flattenValue
transformation.if.then.flattenValue.type: Flatten.Value
transformation.if.then.flattenValue.delimiter: /

Hopefully we can agree that this allows the conditional SMT part to make
progress without getting bogged down.

Thoughts? If this seems broadly reasonable, I'll update the KIP.

Kind regards,

Tom

On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gu...@hibernate.org> wrote:

> Hi all,
>
> Thanks a lot for this initiative, Tom!
>
> To shed some light, the use case where this first came up, were issues
> we saw with SMTs being applied to the different topics produced by the
> Debezium change data capture connectors. There are different kinds of
> topics (for change data, schema history, heartbeats etc.) and the
> record structure to expect may vary between those. Hence we saw issues
> with SMTs like ExtractField, which for instance only should be applied
> to all change data topics but not the other ones.
>
> I like the overall approach; for Debezium's purposes, the simple topic
> matching and negation operators would be sufficient already. I agree
> with Chris and would prefer one single condition attribute, which
> contains a single condition or potentially a logical expression with
> not, and, etc. I think it's less ambiguous, in particular when it
> comes to ordering of the different conditions and determining their
> precedence.
>
> Would love to see this feature in one or another way in Connect.
>
> Best,
>
> --Gunnar
>
>
>
> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <tbentley@redhat.com
> >:
> >
> > Hi Chris and Sönke,
> >
> > Using the numbering from Chris's email...
> >
> > 1. That's a good point, I'll see what is needed to make that work.
> >
> > 2. I'm happy enough to add support for "and" and "or" as part of this KIP
> > if people can see a need for it.
> >
> > In a similar vein, I was wondering about whether it would be worthwhile
> > having the equivalent of an "else" clause (what's in the KIP is basically
> > an "if" statement). Without support for "else" I think people would often
> > need two conditionals, with the condition of one being the negation of
> the
> > condition of another.
> >
> > 3. I can see the attraction of an expression language. The pros include
> > being terse and familiar to programmers and potentially very flexible if
> > that's needed in the future. I had a play and implemented it using ANTLR
> > and it's not difficult to write a grammar and implement the functions
> we've
> > already discussed and get decent error messages when the expression is
> > malformed. So on the one hand I quite like the idea. On the other hand it
> > feels like overkill for the use cases that have actually been identified
> so
> > far.
> >
> > @Sönke what do you make of the expression language idea?
> >
> > Kind regards,
> >
> > Tom
> >
> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <ch...@confluent.io>
> > wrote:
> >
> > > Hi Tom,
> > >
> > > This looks great and I'd love to see the out-of-the-box SMTs become
> even
> > > more powerful with the improvements you've proposed! Got a few remarks
> and
> > > would be interested in your thoughts:
> > >
> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
> method,
> > > what would you think about adopting a similar approach as the framework
> > > uses with connectors, by adding a "Config validate(Map<String, String>
> > > props)" method that can perform custom validation outside of what can
> be
> > > performed by the ConfigDef's single-property-at-a-time validation? It
> may
> > > be a little heavyweight for use with this particular SMT, but it'd
> provide
> > > more flexibility for other SMT implementations and would mirror an API
> that
> > > developers targeting the framework are likely already familiar with.
> > > 2. The possibility for adding the logical operators "and" and "or" is
> > > mentioned, but only as a potential future change and not one proposed
> by
> > > this KIP. Why not include those operators sooner rather than later?
> > > 3. The syntax for named conditions that are then referenced in logical
> > > operators is... tricky. It took me a few attempts to grok the example
> > > provided in the KIP after reading Sönke's question about the example
> for
> > > negation. What about a more sophisticated but less verbose syntax that
> > > supports a single configuration for the condition, even with logical
> > > operators? I'm thinking something like
> > > "transforms.conditionalExtract.condition: not(has-header:my-header)"
> > > instead of the "transforms.conditionalExtract.condition:
> not:hasMyHeader"
> > > and "transforms.conditionalExtract.condition.hasMyHeader:
> > > has-header:my-header" properties. If support for a logical "and" is
> added,
> > > this could then be expanded to something like
> > > "transforms.conditionalExtract.condition: and(has-header(my-header),
> > > not(topic-matches(my-prefix-.*)))". There would be additional
> complexity
> > > here with the need to escape parentheses and commas that are intended
> to be
> > > treated literally (as part of a header name, for example) instead of as
> > > part of the syntax for the condition itself, but a little additional
> > > complexity for edge cases like that may be warranted if it heavily
> reduces
> > > complexity for the common cases. The rationale for the proposed
> > > parentheses-based syntax here instead of what's mentioned in the KIP
> > > (something like "and: <condition1>, <condition2>") is to help with
> > > readability; we probably wouldn't need that with the approach of naming
> > > conditions via separate properties, but things may get a little nasty
> with
> > > literal conditions included there, especially with the possibility of
> > > nesting. Finally, the shift in syntax here should make it easier to
> handle
> > > cases like the header value matching brought up by Sönke; it might look
> > > something like "header-matches(header-name, header-value-pattern)".
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com>
> wrote:
> > >
> > > > Hi Sönke,
> > > >
> > > > Thanks for taking a look.
> > > >
> > > > Let me answer in reverse order, since I think it might make more
> sense
> > > that
> > > > way...
> > > >
> > > > Also, while writing that, I noticed that you have a version with and
> > > > > without "name" for your transformation in the KIP:
> > > > >
> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > has-header:my-header
> > > > > and
> > > > > transforms.conditionalExtract.condition: has-header:my-header
> > > > >
> > > >
> > > > The example
> > > >     transforms.conditionalExtract.condition: has-header:my-header
> > > > is a "has-header" condition (the prefix of the config value), which
> will
> > > > match records with a "my-header" header (given in the suffix).
> > > >
> > > > The other example given is:
> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
> > > >     transforms.conditionalExtract.condition.hasMyHeader:
> > > > has-header:my-header
> > > > The root of the condition is a "not" condition (the prefix of the
> value
> > > for
> > > > the transforms.conditionalExtract.condition key) of another named
> > > condition
> > > > called "hasMyHeader" (the suffix). Any name could be used for the
> other
> > > > condition. That other condition is configured at
> > > > "transforms.conditionalExtract.condition.<conditionName>". That
> condition
> > > > is a "has-header" condition (the prefix), which will match records
> with a
> > > > "my-header" header (given in the suffix). So the "has-header:"
> condition
> > > > type would always require a suffix, as would the "not:" condition
> type.
> > > > Hypothetically you could have a "true" condition type (which would
> not
> > > > require a suffix), and the hypothetical binary conditions "and:" and
> > > "or:"
> > > > would require a pair of other condition names.
> > > >
> > > > So what's proposed is a scheme for encoding conditions where the
> > > condition
> > > > type is the prefix of the value of some "....condition" config key,
> and
> > > the
> > > > optional suffix provides parameters for the condition. This makes
> those
> > > > parameters a bit inflexible, but is relatively succinct.
> > > >
> > > > This leads on to your first point. You're right that use cases might
> > > appear
> > > > which need other conditions, and we should make it flexible enough
> to be
> > > > able to cope with future use cases. On the other hand, I was
> concerned
> > > that
> > > > we end up with something which is quite complicated to configure.
> (There
> > > > comes a point where it might makes more sense for the user to write
> their
> > > > own SMT).
> > > >
> > > > Just of the top of my head it might look like:
> > > > >
> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> type:has-header
> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > > header-name:my-header
> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > field-value:my-value
> > > > >
> > > >
> > > > That won't work because the format is basically a
> Properties/Map<String,
> > > > String> and what you've suggested has duplicate keys.
> > > >
> > > > One thing I did briefly consider what the ability to treat
> conditions as
> > > > Configurable objects in their own right (opening up the possibility
> of
> > > > people supplying their own Conditions, just like they can supply
> their
> > > own
> > > > SMTs). That might be configured something like this:
> > > >
> > > >     transforms.conditionalExtract.condition: not
> > > >     transforms.conditionalExtract.condition.not.type: Not
> > > >     transforms.conditionalExtract.condition.not.negated: foo
> > > >     transforms.conditionalExtract.condition.foo.type:
> HasHeaderWithValue
> > > >     transforms.conditionalExtract.condition.foo.header: my-header
> > > >     transforms.conditionalExtract.condition.foo.value: my-value
> > > >
> > > > I didn't propose that I couldn't see the use cases to justify this
> kind
> > > of
> > > > complexity, especially as the common case would surely be matching
> > > against
> > > > topic name (to be honest I wasn't completely convinced by the need
> for
> > > > "has-header"). In the current version of the KIP that's just
> > > >
> > > >     transforms.conditionalExtract.condition: topic-matches:
> my-prefix-.*
> > > >
> > > > but using the more flexible scheme that would require something more
> like
> > > > this:
> > > >
> > > >     transforms.conditionalExtract.condition: bar
> > > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
> > > >     transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
> > > >
> > > > If people know of use cases which would justify more sophistication,
> I'm
> > > > happy to reconsider.
> > > >
> > > > Thanks again for taking a look!
> > > >
> > > > Tom
> > > >
> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> > > > <so...@opencore.com.invalid> wrote:
> > > >
> > > > > Hi Tom,
> > > > >
> > > > > sounds useful to me, thanks for the KIP!
> > > > > The only thought that I had while reading was that this will
> probably
> > > > raise
> > > > > questions about more involved conditions fairly quickly. For
> example
> > > the
> > > > > "has-header" will cause an appetite for conditions like
> > > > > "this-header-has-that-value".
> > > > > This would necessitate two parameters to be passed into the
> condition,
> > > > > which I think is not currently included in the KIP. I am not
> saying add
> > > > > this now, but might it make sense to discuss a concept of how that
> > > might
> > > > > look now, to avoid potential changes later on.
> > > > >
> > > > > Just of the top of my head it might look like:
> > > > >
> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> type:has-header
> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > > header-name:my-header
> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > field-value:my-value
> > > > >
> > > > >
> > > > > Also, while writing that, I noticed that you have a version with
> and
> > > > > without "name" for your transformation in the KIP:
> > > > >
> > > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > has-header:my-header
> > > > > and
> > > > > transforms.conditionalExtract.condition: has-header:my-header
> > > > >
> > > > >
> > > > > Is this intentional and the name is optional?
> > > > >
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Sönke
> > > > >
> > > > >
> > > > >
> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com>
> wrote:
> > > > > >
> > > > > > Does anyone have any comments, feedback or thoughts about this?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
> tbentley@redhat.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for
> any
> > > > > > > feedback:
> > > > > > >
> > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > > > > >
> > > > > > > Many thanks,
> > > > > > >
> > > > > > > Tom
> > > > > > >
> > > > >
> > > >
> > >
>
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Gunnar Morling <gu...@hibernate.org>.
Hi all,

Thanks a lot for this initiative, Tom!

To shed some light, the use case where this first came up, were issues
we saw with SMTs being applied to the different topics produced by the
Debezium change data capture connectors. There are different kinds of
topics (for change data, schema history, heartbeats etc.) and the
record structure to expect may vary between those. Hence we saw issues
with SMTs like ExtractField, which for instance only should be applied
to all change data topics but not the other ones.

I like the overall approach; for Debezium's purposes, the simple topic
matching and negation operators would be sufficient already. I agree
with Chris and would prefer one single condition attribute, which
contains a single condition or potentially a logical expression with
not, and, etc. I think it's less ambiguous, in particular when it
comes to ordering of the different conditions and determining their
precedence.

Would love to see this feature in one or another way in Connect.

Best,

--Gunnar



Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <tb...@redhat.com>:
>
> Hi Chris and Sönke,
>
> Using the numbering from Chris's email...
>
> 1. That's a good point, I'll see what is needed to make that work.
>
> 2. I'm happy enough to add support for "and" and "or" as part of this KIP
> if people can see a need for it.
>
> In a similar vein, I was wondering about whether it would be worthwhile
> having the equivalent of an "else" clause (what's in the KIP is basically
> an "if" statement). Without support for "else" I think people would often
> need two conditionals, with the condition of one being the negation of the
> condition of another.
>
> 3. I can see the attraction of an expression language. The pros include
> being terse and familiar to programmers and potentially very flexible if
> that's needed in the future. I had a play and implemented it using ANTLR
> and it's not difficult to write a grammar and implement the functions we've
> already discussed and get decent error messages when the expression is
> malformed. So on the one hand I quite like the idea. On the other hand it
> feels like overkill for the use cases that have actually been identified so
> far.
>
> @Sönke what do you make of the expression language idea?
>
> Kind regards,
>
> Tom
>
> On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <ch...@confluent.io>
> wrote:
>
> > Hi Tom,
> >
> > This looks great and I'd love to see the out-of-the-box SMTs become even
> > more powerful with the improvements you've proposed! Got a few remarks and
> > would be interested in your thoughts:
> >
> > 1. Instead of the new "ConfigDef config(Map<String, String> props)" method,
> > what would you think about adopting a similar approach as the framework
> > uses with connectors, by adding a "Config validate(Map<String, String>
> > props)" method that can perform custom validation outside of what can be
> > performed by the ConfigDef's single-property-at-a-time validation? It may
> > be a little heavyweight for use with this particular SMT, but it'd provide
> > more flexibility for other SMT implementations and would mirror an API that
> > developers targeting the framework are likely already familiar with.
> > 2. The possibility for adding the logical operators "and" and "or" is
> > mentioned, but only as a potential future change and not one proposed by
> > this KIP. Why not include those operators sooner rather than later?
> > 3. The syntax for named conditions that are then referenced in logical
> > operators is... tricky. It took me a few attempts to grok the example
> > provided in the KIP after reading Sönke's question about the example for
> > negation. What about a more sophisticated but less verbose syntax that
> > supports a single configuration for the condition, even with logical
> > operators? I'm thinking something like
> > "transforms.conditionalExtract.condition: not(has-header:my-header)"
> > instead of the "transforms.conditionalExtract.condition: not:hasMyHeader"
> > and "transforms.conditionalExtract.condition.hasMyHeader:
> > has-header:my-header" properties. If support for a logical "and" is added,
> > this could then be expanded to something like
> > "transforms.conditionalExtract.condition: and(has-header(my-header),
> > not(topic-matches(my-prefix-.*)))". There would be additional complexity
> > here with the need to escape parentheses and commas that are intended to be
> > treated literally (as part of a header name, for example) instead of as
> > part of the syntax for the condition itself, but a little additional
> > complexity for edge cases like that may be warranted if it heavily reduces
> > complexity for the common cases. The rationale for the proposed
> > parentheses-based syntax here instead of what's mentioned in the KIP
> > (something like "and: <condition1>, <condition2>") is to help with
> > readability; we probably wouldn't need that with the approach of naming
> > conditions via separate properties, but things may get a little nasty with
> > literal conditions included there, especially with the possibility of
> > nesting. Finally, the shift in syntax here should make it easier to handle
> > cases like the header value matching brought up by Sönke; it might look
> > something like "header-matches(header-name, header-value-pattern)".
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com> wrote:
> >
> > > Hi Sönke,
> > >
> > > Thanks for taking a look.
> > >
> > > Let me answer in reverse order, since I think it might make more sense
> > that
> > > way...
> > >
> > > Also, while writing that, I noticed that you have a version with and
> > > > without "name" for your transformation in the KIP:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > has-header:my-header
> > > > and
> > > > transforms.conditionalExtract.condition: has-header:my-header
> > > >
> > >
> > > The example
> > >     transforms.conditionalExtract.condition: has-header:my-header
> > > is a "has-header" condition (the prefix of the config value), which will
> > > match records with a "my-header" header (given in the suffix).
> > >
> > > The other example given is:
> > >     transforms.conditionalExtract.condition: not:hasMyHeader
> > >     transforms.conditionalExtract.condition.hasMyHeader:
> > > has-header:my-header
> > > The root of the condition is a "not" condition (the prefix of the value
> > for
> > > the transforms.conditionalExtract.condition key) of another named
> > condition
> > > called "hasMyHeader" (the suffix). Any name could be used for the other
> > > condition. That other condition is configured at
> > > "transforms.conditionalExtract.condition.<conditionName>". That condition
> > > is a "has-header" condition (the prefix), which will match records with a
> > > "my-header" header (given in the suffix). So the "has-header:" condition
> > > type would always require a suffix, as would the "not:" condition type.
> > > Hypothetically you could have a "true" condition type (which would not
> > > require a suffix), and the hypothetical binary conditions "and:" and
> > "or:"
> > > would require a pair of other condition names.
> > >
> > > So what's proposed is a scheme for encoding conditions where the
> > condition
> > > type is the prefix of the value of some "....condition" config key, and
> > the
> > > optional suffix provides parameters for the condition. This makes those
> > > parameters a bit inflexible, but is relatively succinct.
> > >
> > > This leads on to your first point. You're right that use cases might
> > appear
> > > which need other conditions, and we should make it flexible enough to be
> > > able to cope with future use cases. On the other hand, I was concerned
> > that
> > > we end up with something which is quite complicated to configure. (There
> > > comes a point where it might makes more sense for the user to write their
> > > own SMT).
> > >
> > > Just of the top of my head it might look like:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > header-name:my-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > field-value:my-value
> > > >
> > >
> > > That won't work because the format is basically a Properties/Map<String,
> > > String> and what you've suggested has duplicate keys.
> > >
> > > One thing I did briefly consider what the ability to treat conditions as
> > > Configurable objects in their own right (opening up the possibility of
> > > people supplying their own Conditions, just like they can supply their
> > own
> > > SMTs). That might be configured something like this:
> > >
> > >     transforms.conditionalExtract.condition: not
> > >     transforms.conditionalExtract.condition.not.type: Not
> > >     transforms.conditionalExtract.condition.not.negated: foo
> > >     transforms.conditionalExtract.condition.foo.type: HasHeaderWithValue
> > >     transforms.conditionalExtract.condition.foo.header: my-header
> > >     transforms.conditionalExtract.condition.foo.value: my-value
> > >
> > > I didn't propose that I couldn't see the use cases to justify this kind
> > of
> > > complexity, especially as the common case would surely be matching
> > against
> > > topic name (to be honest I wasn't completely convinced by the need for
> > > "has-header"). In the current version of the KIP that's just
> > >
> > >     transforms.conditionalExtract.condition: topic-matches: my-prefix-.*
> > >
> > > but using the more flexible scheme that would require something more like
> > > this:
> > >
> > >     transforms.conditionalExtract.condition: bar
> > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
> > >     transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
> > >
> > > If people know of use cases which would justify more sophistication, I'm
> > > happy to reconsider.
> > >
> > > Thanks again for taking a look!
> > >
> > > Tom
> > >
> > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> > > <so...@opencore.com.invalid> wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > sounds useful to me, thanks for the KIP!
> > > > The only thought that I had while reading was that this will probably
> > > raise
> > > > questions about more involved conditions fairly quickly. For example
> > the
> > > > "has-header" will cause an appetite for conditions like
> > > > "this-header-has-that-value".
> > > > This would necessitate two parameters to be passed into the condition,
> > > > which I think is not currently included in the KIP. I am not saying add
> > > > this now, but might it make sense to discuss a concept of how that
> > might
> > > > look now, to avoid potential changes later on.
> > > >
> > > > Just of the top of my head it might look like:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > header-name:my-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > field-value:my-value
> > > >
> > > >
> > > > Also, while writing that, I noticed that you have a version with and
> > > > without "name" for your transformation in the KIP:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > has-header:my-header
> > > > and
> > > > transforms.conditionalExtract.condition: has-header:my-header
> > > >
> > > >
> > > > Is this intentional and the name is optional?
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Sönke
> > > >
> > > >
> > > >
> > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com> wrote:
> > > > >
> > > > > Does anyone have any comments, feedback or thoughts about this?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Tom
> > > > >
> > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tb...@redhat.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > > > > > feedback:
> > > > > >
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > > > >
> > > > > > Many thanks,
> > > > > >
> > > > > > Tom
> > > > > >
> > > >
> > >
> >

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi Chris and Sönke,

Using the numbering from Chris's email...

1. That's a good point, I'll see what is needed to make that work.

2. I'm happy enough to add support for "and" and "or" as part of this KIP
if people can see a need for it.

In a similar vein, I was wondering about whether it would be worthwhile
having the equivalent of an "else" clause (what's in the KIP is basically
an "if" statement). Without support for "else" I think people would often
need two conditionals, with the condition of one being the negation of the
condition of another.

3. I can see the attraction of an expression language. The pros include
being terse and familiar to programmers and potentially very flexible if
that's needed in the future. I had a play and implemented it using ANTLR
and it's not difficult to write a grammar and implement the functions we've
already discussed and get decent error messages when the expression is
malformed. So on the one hand I quite like the idea. On the other hand it
feels like overkill for the use cases that have actually been identified so
far.

@Sönke what do you make of the expression language idea?

Kind regards,

Tom

On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <ch...@confluent.io>
wrote:

> Hi Tom,
>
> This looks great and I'd love to see the out-of-the-box SMTs become even
> more powerful with the improvements you've proposed! Got a few remarks and
> would be interested in your thoughts:
>
> 1. Instead of the new "ConfigDef config(Map<String, String> props)" method,
> what would you think about adopting a similar approach as the framework
> uses with connectors, by adding a "Config validate(Map<String, String>
> props)" method that can perform custom validation outside of what can be
> performed by the ConfigDef's single-property-at-a-time validation? It may
> be a little heavyweight for use with this particular SMT, but it'd provide
> more flexibility for other SMT implementations and would mirror an API that
> developers targeting the framework are likely already familiar with.
> 2. The possibility for adding the logical operators "and" and "or" is
> mentioned, but only as a potential future change and not one proposed by
> this KIP. Why not include those operators sooner rather than later?
> 3. The syntax for named conditions that are then referenced in logical
> operators is... tricky. It took me a few attempts to grok the example
> provided in the KIP after reading Sönke's question about the example for
> negation. What about a more sophisticated but less verbose syntax that
> supports a single configuration for the condition, even with logical
> operators? I'm thinking something like
> "transforms.conditionalExtract.condition: not(has-header:my-header)"
> instead of the "transforms.conditionalExtract.condition: not:hasMyHeader"
> and "transforms.conditionalExtract.condition.hasMyHeader:
> has-header:my-header" properties. If support for a logical "and" is added,
> this could then be expanded to something like
> "transforms.conditionalExtract.condition: and(has-header(my-header),
> not(topic-matches(my-prefix-.*)))". There would be additional complexity
> here with the need to escape parentheses and commas that are intended to be
> treated literally (as part of a header name, for example) instead of as
> part of the syntax for the condition itself, but a little additional
> complexity for edge cases like that may be warranted if it heavily reduces
> complexity for the common cases. The rationale for the proposed
> parentheses-based syntax here instead of what's mentioned in the KIP
> (something like "and: <condition1>, <condition2>") is to help with
> readability; we probably wouldn't need that with the approach of naming
> conditions via separate properties, but things may get a little nasty with
> literal conditions included there, especially with the possibility of
> nesting. Finally, the shift in syntax here should make it easier to handle
> cases like the header value matching brought up by Sönke; it might look
> something like "header-matches(header-name, header-value-pattern)".
>
> Cheers,
>
> Chris
>
> On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com> wrote:
>
> > Hi Sönke,
> >
> > Thanks for taking a look.
> >
> > Let me answer in reverse order, since I think it might make more sense
> that
> > way...
> >
> > Also, while writing that, I noticed that you have a version with and
> > > without "name" for your transformation in the KIP:
> > >
> > > transforms.conditionalExtract.condition.hasMyHeader:
> has-header:my-header
> > > and
> > > transforms.conditionalExtract.condition: has-header:my-header
> > >
> >
> > The example
> >     transforms.conditionalExtract.condition: has-header:my-header
> > is a "has-header" condition (the prefix of the config value), which will
> > match records with a "my-header" header (given in the suffix).
> >
> > The other example given is:
> >     transforms.conditionalExtract.condition: not:hasMyHeader
> >     transforms.conditionalExtract.condition.hasMyHeader:
> > has-header:my-header
> > The root of the condition is a "not" condition (the prefix of the value
> for
> > the transforms.conditionalExtract.condition key) of another named
> condition
> > called "hasMyHeader" (the suffix). Any name could be used for the other
> > condition. That other condition is configured at
> > "transforms.conditionalExtract.condition.<conditionName>". That condition
> > is a "has-header" condition (the prefix), which will match records with a
> > "my-header" header (given in the suffix). So the "has-header:" condition
> > type would always require a suffix, as would the "not:" condition type.
> > Hypothetically you could have a "true" condition type (which would not
> > require a suffix), and the hypothetical binary conditions "and:" and
> "or:"
> > would require a pair of other condition names.
> >
> > So what's proposed is a scheme for encoding conditions where the
> condition
> > type is the prefix of the value of some "....condition" config key, and
> the
> > optional suffix provides parameters for the condition. This makes those
> > parameters a bit inflexible, but is relatively succinct.
> >
> > This leads on to your first point. You're right that use cases might
> appear
> > which need other conditions, and we should make it flexible enough to be
> > able to cope with future use cases. On the other hand, I was concerned
> that
> > we end up with something which is quite complicated to configure. (There
> > comes a point where it might makes more sense for the user to write their
> > own SMT).
> >
> > Just of the top of my head it might look like:
> > >
> > > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > > transforms.conditionalExtract.condition.hasMyHeader:
> > header-name:my-header
> > > transforms.conditionalExtract.condition.hasMyHeader:
> field-value:my-value
> > >
> >
> > That won't work because the format is basically a Properties/Map<String,
> > String> and what you've suggested has duplicate keys.
> >
> > One thing I did briefly consider what the ability to treat conditions as
> > Configurable objects in their own right (opening up the possibility of
> > people supplying their own Conditions, just like they can supply their
> own
> > SMTs). That might be configured something like this:
> >
> >     transforms.conditionalExtract.condition: not
> >     transforms.conditionalExtract.condition.not.type: Not
> >     transforms.conditionalExtract.condition.not.negated: foo
> >     transforms.conditionalExtract.condition.foo.type: HasHeaderWithValue
> >     transforms.conditionalExtract.condition.foo.header: my-header
> >     transforms.conditionalExtract.condition.foo.value: my-value
> >
> > I didn't propose that I couldn't see the use cases to justify this kind
> of
> > complexity, especially as the common case would surely be matching
> against
> > topic name (to be honest I wasn't completely convinced by the need for
> > "has-header"). In the current version of the KIP that's just
> >
> >     transforms.conditionalExtract.condition: topic-matches: my-prefix-.*
> >
> > but using the more flexible scheme that would require something more like
> > this:
> >
> >     transforms.conditionalExtract.condition: bar
> >     transforms.conditionalExtract.condition.bar.type: TopicMatch
> >     transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
> >
> > If people know of use cases which would justify more sophistication, I'm
> > happy to reconsider.
> >
> > Thanks again for taking a look!
> >
> > Tom
> >
> > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> > <so...@opencore.com.invalid> wrote:
> >
> > > Hi Tom,
> > >
> > > sounds useful to me, thanks for the KIP!
> > > The only thought that I had while reading was that this will probably
> > raise
> > > questions about more involved conditions fairly quickly. For example
> the
> > > "has-header" will cause an appetite for conditions like
> > > "this-header-has-that-value".
> > > This would necessitate two parameters to be passed into the condition,
> > > which I think is not currently included in the KIP. I am not saying add
> > > this now, but might it make sense to discuss a concept of how that
> might
> > > look now, to avoid potential changes later on.
> > >
> > > Just of the top of my head it might look like:
> > >
> > > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > > transforms.conditionalExtract.condition.hasMyHeader:
> > header-name:my-header
> > > transforms.conditionalExtract.condition.hasMyHeader:
> field-value:my-value
> > >
> > >
> > > Also, while writing that, I noticed that you have a version with and
> > > without "name" for your transformation in the KIP:
> > >
> > > transforms.conditionalExtract.condition.hasMyHeader:
> has-header:my-header
> > > and
> > > transforms.conditionalExtract.condition: has-header:my-header
> > >
> > >
> > > Is this intentional and the name is optional?
> > >
> > >
> > > Best regards,
> > >
> > > Sönke
> > >
> > >
> > >
> > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com> wrote:
> > > >
> > > > Does anyone have any comments, feedback or thoughts about this?
> > > >
> > > > Thanks,
> > > >
> > > > Tom
> > > >
> > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tb...@redhat.com>
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > > > > feedback:
> > > > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > > >
> > > > > Many thanks,
> > > > >
> > > > > Tom
> > > > >
> > >
> >
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Christopher Egerton <ch...@confluent.io>.
Hi Tom,

This looks great and I'd love to see the out-of-the-box SMTs become even
more powerful with the improvements you've proposed! Got a few remarks and
would be interested in your thoughts:

1. Instead of the new "ConfigDef config(Map<String, String> props)" method,
what would you think about adopting a similar approach as the framework
uses with connectors, by adding a "Config validate(Map<String, String>
props)" method that can perform custom validation outside of what can be
performed by the ConfigDef's single-property-at-a-time validation? It may
be a little heavyweight for use with this particular SMT, but it'd provide
more flexibility for other SMT implementations and would mirror an API that
developers targeting the framework are likely already familiar with.
2. The possibility for adding the logical operators "and" and "or" is
mentioned, but only as a potential future change and not one proposed by
this KIP. Why not include those operators sooner rather than later?
3. The syntax for named conditions that are then referenced in logical
operators is... tricky. It took me a few attempts to grok the example
provided in the KIP after reading Sönke's question about the example for
negation. What about a more sophisticated but less verbose syntax that
supports a single configuration for the condition, even with logical
operators? I'm thinking something like
"transforms.conditionalExtract.condition: not(has-header:my-header)"
instead of the "transforms.conditionalExtract.condition: not:hasMyHeader"
and "transforms.conditionalExtract.condition.hasMyHeader:
has-header:my-header" properties. If support for a logical "and" is added,
this could then be expanded to something like
"transforms.conditionalExtract.condition: and(has-header(my-header),
not(topic-matches(my-prefix-.*)))". There would be additional complexity
here with the need to escape parentheses and commas that are intended to be
treated literally (as part of a header name, for example) instead of as
part of the syntax for the condition itself, but a little additional
complexity for edge cases like that may be warranted if it heavily reduces
complexity for the common cases. The rationale for the proposed
parentheses-based syntax here instead of what's mentioned in the KIP
(something like "and: <condition1>, <condition2>") is to help with
readability; we probably wouldn't need that with the approach of naming
conditions via separate properties, but things may get a little nasty with
literal conditions included there, especially with the possibility of
nesting. Finally, the shift in syntax here should make it easier to handle
cases like the header value matching brought up by Sönke; it might look
something like "header-matches(header-name, header-value-pattern)".

Cheers,

Chris

On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi Sönke,
>
> Thanks for taking a look.
>
> Let me answer in reverse order, since I think it might make more sense that
> way...
>
> Also, while writing that, I noticed that you have a version with and
> > without "name" for your transformation in the KIP:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> > and
> > transforms.conditionalExtract.condition: has-header:my-header
> >
>
> The example
>     transforms.conditionalExtract.condition: has-header:my-header
> is a "has-header" condition (the prefix of the config value), which will
> match records with a "my-header" header (given in the suffix).
>
> The other example given is:
>     transforms.conditionalExtract.condition: not:hasMyHeader
>     transforms.conditionalExtract.condition.hasMyHeader:
> has-header:my-header
> The root of the condition is a "not" condition (the prefix of the value for
> the transforms.conditionalExtract.condition key) of another named condition
> called "hasMyHeader" (the suffix). Any name could be used for the other
> condition. That other condition is configured at
> "transforms.conditionalExtract.condition.<conditionName>". That condition
> is a "has-header" condition (the prefix), which will match records with a
> "my-header" header (given in the suffix). So the "has-header:" condition
> type would always require a suffix, as would the "not:" condition type.
> Hypothetically you could have a "true" condition type (which would not
> require a suffix), and the hypothetical binary conditions "and:" and "or:"
> would require a pair of other condition names.
>
> So what's proposed is a scheme for encoding conditions where the condition
> type is the prefix of the value of some "....condition" config key, and the
> optional suffix provides parameters for the condition. This makes those
> parameters a bit inflexible, but is relatively succinct.
>
> This leads on to your first point. You're right that use cases might appear
> which need other conditions, and we should make it flexible enough to be
> able to cope with future use cases. On the other hand, I was concerned that
> we end up with something which is quite complicated to configure. (There
> comes a point where it might makes more sense for the user to write their
> own SMT).
>
> Just of the top of my head it might look like:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > transforms.conditionalExtract.condition.hasMyHeader:
> header-name:my-header
> > transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
> >
>
> That won't work because the format is basically a Properties/Map<String,
> String> and what you've suggested has duplicate keys.
>
> One thing I did briefly consider what the ability to treat conditions as
> Configurable objects in their own right (opening up the possibility of
> people supplying their own Conditions, just like they can supply their own
> SMTs). That might be configured something like this:
>
>     transforms.conditionalExtract.condition: not
>     transforms.conditionalExtract.condition.not.type: Not
>     transforms.conditionalExtract.condition.not.negated: foo
>     transforms.conditionalExtract.condition.foo.type: HasHeaderWithValue
>     transforms.conditionalExtract.condition.foo.header: my-header
>     transforms.conditionalExtract.condition.foo.value: my-value
>
> I didn't propose that I couldn't see the use cases to justify this kind of
> complexity, especially as the common case would surely be matching against
> topic name (to be honest I wasn't completely convinced by the need for
> "has-header"). In the current version of the KIP that's just
>
>     transforms.conditionalExtract.condition: topic-matches: my-prefix-.*
>
> but using the more flexible scheme that would require something more like
> this:
>
>     transforms.conditionalExtract.condition: bar
>     transforms.conditionalExtract.condition.bar.type: TopicMatch
>     transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
>
> If people know of use cases which would justify more sophistication, I'm
> happy to reconsider.
>
> Thanks again for taking a look!
>
> Tom
>
> On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> <so...@opencore.com.invalid> wrote:
>
> > Hi Tom,
> >
> > sounds useful to me, thanks for the KIP!
> > The only thought that I had while reading was that this will probably
> raise
> > questions about more involved conditions fairly quickly. For example the
> > "has-header" will cause an appetite for conditions like
> > "this-header-has-that-value".
> > This would necessitate two parameters to be passed into the condition,
> > which I think is not currently included in the KIP. I am not saying add
> > this now, but might it make sense to discuss a concept of how that might
> > look now, to avoid potential changes later on.
> >
> > Just of the top of my head it might look like:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > transforms.conditionalExtract.condition.hasMyHeader:
> header-name:my-header
> > transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
> >
> >
> > Also, while writing that, I noticed that you have a version with and
> > without "name" for your transformation in the KIP:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> > and
> > transforms.conditionalExtract.condition: has-header:my-header
> >
> >
> > Is this intentional and the name is optional?
> >
> >
> > Best regards,
> >
> > Sönke
> >
> >
> >
> > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com> wrote:
> > >
> > > Does anyone have any comments, feedback or thoughts about this?
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tb...@redhat.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > > > feedback:
> > > >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > >
> > > > Many thanks,
> > > >
> > > > Tom
> > > >
> >
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Sönke Liebau <so...@opencore.com.INVALID>.
Hi Tom,

thanks for your detailed explanation!
I may have mis-phrased my mail a little bit regarding the definition of
conditions and caused you extra effort with that, sorry! I understood the
not condition and its relationship to the has-header condition, I was just
wondering whether the "<conditionName>" in
"transforms.conditionalExtract.condition.<conditionName>" is really
optional or if you just forgot to add it to the other example.

Regarding the topic of more complex transformations I am torn to be honest.
When I am with customers I usually tell them about Connect being a fairly
simple tool to move data from/to Kafka. In the next sentence I'll usually
tell them that they'll probably reach its limits very quickly if they start
using it for more involved jobs and that there are tools around that are
much better suited for that.
So unless we want to recreate something like Nifi, I think you are probably
right in keeping it simple. This would probably encourage the consideration
of switching to another tool earlier rather than later, before too much
effort has been sunk into an implementation.

Summing it up, I am happy for you to proceed in any way you deem useful and
I'll stay out of your hair :)

Best regards,
Sönke


On Wed, 1 Apr 2020 at 18:00, Tom Bentley <tb...@redhat.com> wrote:

> Hi Sönke,
>
> Thanks for taking a look.
>
> Let me answer in reverse order, since I think it might make more sense that
> way...
>
> Also, while writing that, I noticed that you have a version with and
> > without "name" for your transformation in the KIP:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> > and
> > transforms.conditionalExtract.condition: has-header:my-header
> >
>
> The example
>     transforms.conditionalExtract.condition: has-header:my-header
> is a "has-header" condition (the prefix of the config value), which will
> match records with a "my-header" header (given in the suffix).
>
> The other example given is:
>     transforms.conditionalExtract.condition: not:hasMyHeader
>     transforms.conditionalExtract.condition.hasMyHeader:
> has-header:my-header
> The root of the condition is a "not" condition (the prefix of the value for
> the transforms.conditionalExtract.condition key) of another named condition
> called "hasMyHeader" (the suffix). Any name could be used for the other
> condition. That other condition is configured at
> "transforms.conditionalExtract.condition.<conditionName>". That condition
> is a "has-header" condition (the prefix), which will match records with a
> "my-header" header (given in the suffix). So the "has-header:" condition
> type would always require a suffix, as would the "not:" condition type.
> Hypothetically you could have a "true" condition type (which would not
> require a suffix), and the hypothetical binary conditions "and:" and "or:"
> would require a pair of other condition names.
>
> So what's proposed is a scheme for encoding conditions where the condition
> type is the prefix of the value of some "....condition" config key, and the
> optional suffix provides parameters for the condition. This makes those
> parameters a bit inflexible, but is relatively succinct.
>
> This leads on to your first point. You're right that use cases might appear
> which need other conditions, and we should make it flexible enough to be
> able to cope with future use cases. On the other hand, I was concerned that
> we end up with something which is quite complicated to configure. (There
> comes a point where it might makes more sense for the user to write their
> own SMT).
>
> Just of the top of my head it might look like:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > transforms.conditionalExtract.condition.hasMyHeader:
> header-name:my-header
> > transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
> >
>
> That won't work because the format is basically a Properties/Map<String,
> String> and what you've suggested has duplicate keys.
>
> One thing I did briefly consider what the ability to treat conditions as
> Configurable objects in their own right (opening up the possibility of
> people supplying their own Conditions, just like they can supply their own
> SMTs). That might be configured something like this:
>
>     transforms.conditionalExtract.condition: not
>     transforms.conditionalExtract.condition.not.type: Not
>     transforms.conditionalExtract.condition.not.negated: foo
>     transforms.conditionalExtract.condition.foo.type: HasHeaderWithValue
>     transforms.conditionalExtract.condition.foo.header: my-header
>     transforms.conditionalExtract.condition.foo.value: my-value
>
> I didn't propose that I couldn't see the use cases to justify this kind of
> complexity, especially as the common case would surely be matching against
> topic name (to be honest I wasn't completely convinced by the need for
> "has-header"). In the current version of the KIP that's just
>
>     transforms.conditionalExtract.condition: topic-matches: my-prefix-.*
>
> but using the more flexible scheme that would require something more like
> this:
>
>     transforms.conditionalExtract.condition: bar
>     transforms.conditionalExtract.condition.bar.type: TopicMatch
>     transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
>
> If people know of use cases which would justify more sophistication, I'm
> happy to reconsider.
>
> Thanks again for taking a look!
>
> Tom
>
> On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> <so...@opencore.com.invalid> wrote:
>
> > Hi Tom,
> >
> > sounds useful to me, thanks for the KIP!
> > The only thought that I had while reading was that this will probably
> raise
> > questions about more involved conditions fairly quickly. For example the
> > "has-header" will cause an appetite for conditions like
> > "this-header-has-that-value".
> > This would necessitate two parameters to be passed into the condition,
> > which I think is not currently included in the KIP. I am not saying add
> > this now, but might it make sense to discuss a concept of how that might
> > look now, to avoid potential changes later on.
> >
> > Just of the top of my head it might look like:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > transforms.conditionalExtract.condition.hasMyHeader:
> header-name:my-header
> > transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
> >
> >
> > Also, while writing that, I noticed that you have a version with and
> > without "name" for your transformation in the KIP:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> > and
> > transforms.conditionalExtract.condition: has-header:my-header
> >
> >
> > Is this intentional and the name is optional?
> >
> >
> > Best regards,
> >
> > Sönke
> >
> >
> >
> > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com> wrote:
> > >
> > > Does anyone have any comments, feedback or thoughts about this?
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tb...@redhat.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > > > feedback:
> > > >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > >
> > > > Many thanks,
> > > >
> > > > Tom
> > > >
> >
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Hi Sönke,

Thanks for taking a look.

Let me answer in reverse order, since I think it might make more sense that
way...

Also, while writing that, I noticed that you have a version with and
> without "name" for your transformation in the KIP:
>
> transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> and
> transforms.conditionalExtract.condition: has-header:my-header
>

The example
    transforms.conditionalExtract.condition: has-header:my-header
is a "has-header" condition (the prefix of the config value), which will
match records with a "my-header" header (given in the suffix).

The other example given is:
    transforms.conditionalExtract.condition: not:hasMyHeader
    transforms.conditionalExtract.condition.hasMyHeader:
has-header:my-header
The root of the condition is a "not" condition (the prefix of the value for
the transforms.conditionalExtract.condition key) of another named condition
called "hasMyHeader" (the suffix). Any name could be used for the other
condition. That other condition is configured at
"transforms.conditionalExtract.condition.<conditionName>". That condition
is a "has-header" condition (the prefix), which will match records with a
"my-header" header (given in the suffix). So the "has-header:" condition
type would always require a suffix, as would the "not:" condition type.
Hypothetically you could have a "true" condition type (which would not
require a suffix), and the hypothetical binary conditions "and:" and "or:"
would require a pair of other condition names.

So what's proposed is a scheme for encoding conditions where the condition
type is the prefix of the value of some "....condition" config key, and the
optional suffix provides parameters for the condition. This makes those
parameters a bit inflexible, but is relatively succinct.

This leads on to your first point. You're right that use cases might appear
which need other conditions, and we should make it flexible enough to be
able to cope with future use cases. On the other hand, I was concerned that
we end up with something which is quite complicated to configure. (There
comes a point where it might makes more sense for the user to write their
own SMT).

Just of the top of my head it might look like:
>
> transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> transforms.conditionalExtract.condition.hasMyHeader: header-name:my-header
> transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
>

That won't work because the format is basically a Properties/Map<String,
String> and what you've suggested has duplicate keys.

One thing I did briefly consider what the ability to treat conditions as
Configurable objects in their own right (opening up the possibility of
people supplying their own Conditions, just like they can supply their own
SMTs). That might be configured something like this:

    transforms.conditionalExtract.condition: not
    transforms.conditionalExtract.condition.not.type: Not
    transforms.conditionalExtract.condition.not.negated: foo
    transforms.conditionalExtract.condition.foo.type: HasHeaderWithValue
    transforms.conditionalExtract.condition.foo.header: my-header
    transforms.conditionalExtract.condition.foo.value: my-value

I didn't propose that I couldn't see the use cases to justify this kind of
complexity, especially as the common case would surely be matching against
topic name (to be honest I wasn't completely convinced by the need for
"has-header"). In the current version of the KIP that's just

    transforms.conditionalExtract.condition: topic-matches: my-prefix-.*

but using the more flexible scheme that would require something more like
this:

    transforms.conditionalExtract.condition: bar
    transforms.conditionalExtract.condition.bar.type: TopicMatch
    transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*

If people know of use cases which would justify more sophistication, I'm
happy to reconsider.

Thanks again for taking a look!

Tom

On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
<so...@opencore.com.invalid> wrote:

> Hi Tom,
>
> sounds useful to me, thanks for the KIP!
> The only thought that I had while reading was that this will probably raise
> questions about more involved conditions fairly quickly. For example the
> "has-header" will cause an appetite for conditions like
> "this-header-has-that-value".
> This would necessitate two parameters to be passed into the condition,
> which I think is not currently included in the KIP. I am not saying add
> this now, but might it make sense to discuss a concept of how that might
> look now, to avoid potential changes later on.
>
> Just of the top of my head it might look like:
>
> transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> transforms.conditionalExtract.condition.hasMyHeader: header-name:my-header
> transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
>
>
> Also, while writing that, I noticed that you have a version with and
> without "name" for your transformation in the KIP:
>
> transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> and
> transforms.conditionalExtract.condition: has-header:my-header
>
>
> Is this intentional and the name is optional?
>
>
> Best regards,
>
> Sönke
>
>
>
> On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com> wrote:
> >
> > Does anyone have any comments, feedback or thoughts about this?
> >
> > Thanks,
> >
> > Tom
> >
> > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tb...@redhat.com>
> wrote:
> >
> > > Hi,
> > >
> > > I've opened KIP-585 which is intended to provide a mechanism to
> > > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > > feedback:
> > >
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > >
> > > Many thanks,
> > >
> > > Tom
> > >
>

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Sönke Liebau <so...@opencore.com.INVALID>.
Hi Tom,

sounds useful to me, thanks for the KIP!
The only thought that I had while reading was that this will probably raise
questions about more involved conditions fairly quickly. For example the
"has-header" will cause an appetite for conditions like
"this-header-has-that-value".
This would necessitate two parameters to be passed into the condition,
which I think is not currently included in the KIP. I am not saying add
this now, but might it make sense to discuss a concept of how that might
look now, to avoid potential changes later on.

Just of the top of my head it might look like:

transforms.conditionalExtract.condition.hasMyHeader: type:has-header
transforms.conditionalExtract.condition.hasMyHeader: header-name:my-header
transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value


Also, while writing that, I noticed that you have a version with and
without "name" for your transformation in the KIP:

transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
and
transforms.conditionalExtract.condition: has-header:my-header


Is this intentional and the name is optional?


Best regards,

Sönke



On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tb...@redhat.com> wrote:
>
> Does anyone have any comments, feedback or thoughts about this?
>
> Thanks,
>
> Tom
>
> On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tb...@redhat.com> wrote:
>
> > Hi,
> >
> > I've opened KIP-585 which is intended to provide a mechanism to
> > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > feedback:
> >
https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> >
> > Many thanks,
> >
> > Tom
> >

Re: [DISCUSS] KIP-585: Conditional SMT

Posted by Tom Bentley <tb...@redhat.com>.
Does anyone have any comments, feedback or thoughts about this?

Thanks,

Tom

On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi,
>
> I've opened KIP-585 which is intended to provide a mechanism to
> conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> feedback:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>
> Many thanks,
>
> Tom
>