You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Joshua Grisham <gr...@gmail.com> on 2020/11/07 16:11:05 UTC

[DISCUSS] KIP-682: Connect TimestampConverter support for multiple fields and multiple input formats

Hello everyone!
(Thanks to Tom Bentley for directing me in this direction!)

I have made a series of changes to some of the standard Connect transforms
to meet some of the challenges at my company to consume messages using
Connect, and have been running them for a few weeks now as custom SMTs.

I realized that several of these changes might actually be really good
features to be included in the standard transforms so I will open some KIPs
to go along with PRs which I already submitted (apologize for going a bit
backwards in the process as this was my first time working with the Kafka
project).

The first one I have created a KIP for is KIP-682 on the TimestampConverter
transform.

Basically the 2 limitations that I am proposing to remove are the ability
to only convert one field at a time, but instead convert multiple fields
using a common config, and that if any producer sends a slightly different
date/timestamp string format on any message then the transformation
fails... so to instead add the ability to support variations in the string
input format.

More info here:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats

I would appreciate any kind of discussion to make improvements and it would
be great to see changes like this being pushed up.

Thanks for now!

Best regards,
Joshua Grisham

Re: [DISCUSS] KIP-682: Connect TimestampConverter support for multiple fields and multiple input formats

Posted by Mickael Maison <mi...@gmail.com>.
Thanks Joshua for this KIP, it looks like a useful addition.

I have a few questions:

1. In Public Interfaces and Proposed Changes, the KIP mentions the
current "field" setting will be renamed to "fields". Do you instead
mean a new setting "fields" will be added and the existing setting
will be marked deprecated?

2. What happens if a user specifies both "field" and "fields"? Is one
setting taking precedence? Is it a bad configuration that will be
rejected? Same for the format and format.input/format.output settings.

3. Can you also add the proposed documentation for the new settings in the KIP?

Best,
Mickael

On Mon, Nov 22, 2021 at 11:37 AM Joshua Grisham <gr...@gmail.com> wrote:
>
> Hi Randall, thanks again for the great feedback! Sorry it took me a minute
> to get back on this one--I have been a bit full up over the last few
> months--but now I will try to brush it off and see if we can figure out if
> it makes sense to move forward and possibly close this one out.
>
> Regarding backward compatibility -- the intention of my proposed changes
> (including the candidate PR) was that it still is backward compatible, but
> I realize that what I wrote in the KIP was more theoretical about what
> could happen instead of what I actually did in the PR (that I actually made
> it backward-compatible).  So now I have updated the text in the KIP to
> hopefully more accurately reflect this and I think/hope that it, along with
> what is in the candidate PR, addresses everything you mentioned in your
> first and second paragraph.
>
> For clarity here is exactly what I have done in the PR:
>
> First I saw that the way the config properties for TimestampConverter
> lacked the same structure as it did for other SMTs (e.g. ReplaceField as
> one example), especially since I was adding several new properties, so I
> added the same ConfigName and ConfigDefault interfaces (instead of just
> having various string attributes on the TimestampConverter class itself).
> The existing public class  string attributes FIELD_CONFIG,
> TARGET_TYPE_CONFIG, and FORMAT_CONFIG all still exist and all still can be
> continued to used by anyone else, they will just get warnings at build time
> since I marked them with @Deprecated.  I did remove the private class
> strings since they were no longer used in the class but this should not
> have any impact on compatibility.
>
> Second I added the new properties to CONFIG_DEF but kept the existing
> properties there (so they still exist and can be used) and in the
> configure() method I have used ConfigUtils.translateDeprecatedConfigs() to
> alias the old "field" property to the new "fields" property automatically
> just like was done in ReplaceField from the change from KIP-629 (
> https://github.com/apache/kafka/commit/eab61cad2c418786ab5e58aa1267f689d82a61d1
> )
> In the end it means that a user can still have their connector use the
> "field" property but when the configure() method is called it will just put
> whatever they set in "field" to "fields" automatically.
>
> Also the property "format" still behaves exactly the same as before if it
> is used (it continues to function as a SimpleDateFormatter input and
> output) it is just that there is an additional capability added by
> voluntarily adopting the two new properties "format.input" and
> "format.output".
>
> As far as I can tell, in theory someone could have an existing connector,
> using TimestampConverter (or something else which uses the API even),
> upgrade to this version, and not change anything else, and it still behaves
> the same as before.  But perhaps it should not be marked as @Deprecated to
> avoid the warnings or otherwise is there a clearly defined deprecation
> policy that can be followed? I tried to search and look in the Kafka docs
> but could not find anything after several attempts and rabbit-holes
> unfortunately (closest I seem to find is just a list or documentation on
> what has been deprecated in each release).
>
> However please do say if there is anything else you think is missing either
> in the implementation or now in the updated KIP regarding this!
>
> Also hopefully with my updates to the KIP it has addressed your other
> comments or questions, but please feel free to give it another look when
> you have time and I would welcome any feedback that you have.
>
> I also noticed there was one small change where my PR now has a merge
> conflict but it should be pretty minimal to update as long as everything
> else is still looking ok.. I will try to take a look at this soon as well
> if I am able.
>
> Thanks again and have a great week!
>
> Best,
> Joshua
>
>
>
> Den ons 28 juli 2021 kl 16:05 skrev Randall Hauch <rh...@gmail.com>:
>
> > Thanks for the contribution, Joshua. Overall I think this is a really good
> > improvement to this connector, for all the reasons cited in the motivation
> > section. I do have some requests, though.
> >
> > The biggest issue I have is that the change is not backward compatible.
> > We've worked really hard to make sure that all Connect APIs (including
> > configuration properties) are always backward compatible because this makes
> > it trivial for users to upgrade from one Connect release to a newer one.
> > After all, any users that are already using this SMT in their connectors
> > should be able to upgrade Connect and still have the SMT behave exactly as
> > it did in earlier releases -- unless they opt in to changing their
> > configuration to use the new feature. So I would request that the KIP and
> > proposed implementation be changed to not *change* existing configuration
> > properties. We can of course introduce new config properties, but they
> > should have a default that results in exactly the same old behavior as in
> > earlier releases. A third option is to create a new SMT, though we should
> > try really hard to just improve the existing SMT since that's generally a
> > much better UX. Assuming we can make the existing transformation
> > work, the option to create a new SMT should be documented in the Rejected
> > Alternatives section, too.
> >
> > For example, there are (at least?) two options with regard to specifying
> > the field names. 1) just keep `field` as the name even though a
> > comma-separated list is allowed, or 2) add `fields` but keep (and
> > deprecate?) the existing `field` configuration, using `fields` if defined
> > or if `field` is not defined, and maybe even supporting comma separated
> > lists of names in both fields. The same goes for `format` and
> > `format.output`. In general, I think 2 *might* provide a better experience
> > at the cost of higher implementation complexity. Regardless of which you
> > choose, the KIP should explicitly propose one approach, and document the
> > other(s) in the rejected alternatives section. After all, the KIP should be
> > very clear about what will be proposed.
> >
> > In terms of deprecating the old configs in option 2, it'd be fine to do
> > this in the documentation (and code, if applicable) as part of this
> > feature. That means they could be removed at the earliest in the next major
> > release after the feature is merged, which would be AK 4.0. And it would be
> > good to mention this explicitly in the KIP so that we don't need another
> > KIP just to remove the deprecation; so something like "This configuration
> > property will be deprecated and documented as such, and eventually removed
> > in a subsequent major release."
> >
> > Supporting nested fields would also be a nice touch here, too. However, as
> > you mention, we'd likely want to make the same change to other SMTs, and as
> > such we might consider adding support for nested names on all SMTs with a
> > separate KIP. The two KIPs could be written such that they are independent.
> >
> > Finally, a KIP need not go into much detail about the implementation, and
> > should instead focus on the API and behavioral changes. For example, it's
> > probably not necessary for the KIP to mention that the implementation would
> > use DateTimeFormatter. However, it would be worth mentioning explicitly
> > that the `format.input` configuration property will take a regular
> > expression that is compatible with the JDK's DateTimeFormatter.ofPattern
> > method
> > -- this very clearly states the constraint on the input values.
> >
> > Again, very nice job on the KIP so far!
> >
> > Best regards,
> >
> > Randall
> >
> > On Sat, May 15, 2021 at 6:50 AM Joshua Grisham <gr...@gmail.com>
> > wrote:
> >
> > > Hello again Kafka Developers community!
> > >
> > > I thought it has been some time and I would try to dust KIP-682 off and
> > see
> > > if anyone wanted to take a discussion on them, if it still makes sense or
> > > if this should go in a different direction.
> > >
> > > Anyone have thoughts on this?  That Connect's TimestampConverter could be
> > > updated to support converting multiple fields in one pass of the
> > > transformation (instead of chaining multiple instances together if you
> > have
> > > multiple timestamps to convert) and that it could support a regex-like
> > > pattern for matching string input instead of only allowing one string
> > > format and expect that all producers are sending in 100% the same format?
> > >
> > > Since some time has gone there are a few conflicts in my proposed PR but
> > > they seem quite trivial (some error message and formatting of a test
> > result
> > > from a quick glance) so it should not be hard to resolve them.
> > >
> > > Thanks!
> > >
> > > Best,
> > > Joshua
> > >
> > >
> > > On Sat, Nov 7, 2020 at 9:24 PM Joshua Grisham <gr...@gmail.com>
> > > wrote:
> > >
> > > > Hi Brandon,
> > > >
> > > > I have added what i called "recursive" support to find child fields on
> > > any
> > > > level to the Cast and ReplaceField transforms as well, the PR is here
> > > > https://github.com/apache/kafka/pull/9493
> > > >
> > > > I plan to try and write up a KIP for that one as well maybe in the next
> > > > day or two. It is not exactly the same as specifically targeting one
> > > nested
> > > > field based on its full path, but instead will look for the field name
> > at
> > > > any level of the structure and cast/replace all of them if they match.
> > > >
> > > > The same could be added in theory to almost all of the transforms i
> > guess
> > > > but maybe that is for another discussion 😊
> > > >
> > > > But for now with this TimestampConverter the only thing which I have
> > > > proposed are the two mentioned before - multiple fields and support for
> > > > pattern like matching of string timestamps.
> > > >
> > > > Joshua
> > > >
> > > > Den lör 7 nov. 2020 18:42Brandon Brown <br...@bbrownsound.com>
> > skrev:
> > > >
> > > >> I love this idea! I have a current KIP for a hash transform and am
> > > >> working adding multi field/nested support to that one. I can think of
> > a
> > > few
> > > >> times we’ve needed functionality like that. I’d add that adding
> > support
> > > for
> > > >> transforming nested fields would be a great feature for this.
> > > >>
> > > >> -Brandon
> > > >>
> > > >> Brandon Brown
> > > >> > On Nov 7, 2020, at 11:11 AM, Joshua Grisham <gr...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > Hello everyone!
> > > >> > (Thanks to Tom Bentley for directing me in this direction!)
> > > >> >
> > > >> > I have made a series of changes to some of the standard Connect
> > > >> transforms
> > > >> > to meet some of the challenges at my company to consume messages
> > using
> > > >> > Connect, and have been running them for a few weeks now as custom
> > > SMTs.
> > > >> >
> > > >> > I realized that several of these changes might actually be really
> > good
> > > >> > features to be included in the standard transforms so I will open
> > some
> > > >> KIPs
> > > >> > to go along with PRs which I already submitted (apologize for going
> > a
> > > >> bit
> > > >> > backwards in the process as this was my first time working with the
> > > >> Kafka
> > > >> > project).
> > > >> >
> > > >> > The first one I have created a KIP for is KIP-682 on the
> > > >> TimestampConverter
> > > >> > transform.
> > > >> >
> > > >> > Basically the 2 limitations that I am proposing to remove are the
> > > >> ability
> > > >> > to only convert one field at a time, but instead convert multiple
> > > fields
> > > >> > using a common config, and that if any producer sends a slightly
> > > >> different
> > > >> > date/timestamp string format on any message then the transformation
> > > >> > fails... so to instead add the ability to support variations in the
> > > >> string
> > > >> > input format.
> > > >> >
> > > >> > More info here:
> > > >> >
> > > >> >
> > > >>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats
> > > >> >
> > > >> > I would appreciate any kind of discussion to make improvements and
> > it
> > > >> would
> > > >> > be great to see changes like this being pushed up.
> > > >> >
> > > >> > Thanks for now!
> > > >> >
> > > >> > Best regards,
> > > >> > Joshua Grisham
> > > >>
> > > >
> > >
> >

Re: [DISCUSS] KIP-682: Connect TimestampConverter support for multiple fields and multiple input formats

Posted by Joshua Grisham <gr...@gmail.com>.
Hi Randall, thanks again for the great feedback! Sorry it took me a minute
to get back on this one--I have been a bit full up over the last few
months--but now I will try to brush it off and see if we can figure out if
it makes sense to move forward and possibly close this one out.

Regarding backward compatibility -- the intention of my proposed changes
(including the candidate PR) was that it still is backward compatible, but
I realize that what I wrote in the KIP was more theoretical about what
could happen instead of what I actually did in the PR (that I actually made
it backward-compatible).  So now I have updated the text in the KIP to
hopefully more accurately reflect this and I think/hope that it, along with
what is in the candidate PR, addresses everything you mentioned in your
first and second paragraph.

For clarity here is exactly what I have done in the PR:

First I saw that the way the config properties for TimestampConverter
lacked the same structure as it did for other SMTs (e.g. ReplaceField as
one example), especially since I was adding several new properties, so I
added the same ConfigName and ConfigDefault interfaces (instead of just
having various string attributes on the TimestampConverter class itself).
The existing public class  string attributes FIELD_CONFIG,
TARGET_TYPE_CONFIG, and FORMAT_CONFIG all still exist and all still can be
continued to used by anyone else, they will just get warnings at build time
since I marked them with @Deprecated.  I did remove the private class
strings since they were no longer used in the class but this should not
have any impact on compatibility.

Second I added the new properties to CONFIG_DEF but kept the existing
properties there (so they still exist and can be used) and in the
configure() method I have used ConfigUtils.translateDeprecatedConfigs() to
alias the old "field" property to the new "fields" property automatically
just like was done in ReplaceField from the change from KIP-629 (
https://github.com/apache/kafka/commit/eab61cad2c418786ab5e58aa1267f689d82a61d1
)
In the end it means that a user can still have their connector use the
"field" property but when the configure() method is called it will just put
whatever they set in "field" to "fields" automatically.

Also the property "format" still behaves exactly the same as before if it
is used (it continues to function as a SimpleDateFormatter input and
output) it is just that there is an additional capability added by
voluntarily adopting the two new properties "format.input" and
"format.output".

As far as I can tell, in theory someone could have an existing connector,
using TimestampConverter (or something else which uses the API even),
upgrade to this version, and not change anything else, and it still behaves
the same as before.  But perhaps it should not be marked as @Deprecated to
avoid the warnings or otherwise is there a clearly defined deprecation
policy that can be followed? I tried to search and look in the Kafka docs
but could not find anything after several attempts and rabbit-holes
unfortunately (closest I seem to find is just a list or documentation on
what has been deprecated in each release).

However please do say if there is anything else you think is missing either
in the implementation or now in the updated KIP regarding this!

Also hopefully with my updates to the KIP it has addressed your other
comments or questions, but please feel free to give it another look when
you have time and I would welcome any feedback that you have.

I also noticed there was one small change where my PR now has a merge
conflict but it should be pretty minimal to update as long as everything
else is still looking ok.. I will try to take a look at this soon as well
if I am able.

Thanks again and have a great week!

Best,
Joshua



Den ons 28 juli 2021 kl 16:05 skrev Randall Hauch <rh...@gmail.com>:

> Thanks for the contribution, Joshua. Overall I think this is a really good
> improvement to this connector, for all the reasons cited in the motivation
> section. I do have some requests, though.
>
> The biggest issue I have is that the change is not backward compatible.
> We've worked really hard to make sure that all Connect APIs (including
> configuration properties) are always backward compatible because this makes
> it trivial for users to upgrade from one Connect release to a newer one.
> After all, any users that are already using this SMT in their connectors
> should be able to upgrade Connect and still have the SMT behave exactly as
> it did in earlier releases -- unless they opt in to changing their
> configuration to use the new feature. So I would request that the KIP and
> proposed implementation be changed to not *change* existing configuration
> properties. We can of course introduce new config properties, but they
> should have a default that results in exactly the same old behavior as in
> earlier releases. A third option is to create a new SMT, though we should
> try really hard to just improve the existing SMT since that's generally a
> much better UX. Assuming we can make the existing transformation
> work, the option to create a new SMT should be documented in the Rejected
> Alternatives section, too.
>
> For example, there are (at least?) two options with regard to specifying
> the field names. 1) just keep `field` as the name even though a
> comma-separated list is allowed, or 2) add `fields` but keep (and
> deprecate?) the existing `field` configuration, using `fields` if defined
> or if `field` is not defined, and maybe even supporting comma separated
> lists of names in both fields. The same goes for `format` and
> `format.output`. In general, I think 2 *might* provide a better experience
> at the cost of higher implementation complexity. Regardless of which you
> choose, the KIP should explicitly propose one approach, and document the
> other(s) in the rejected alternatives section. After all, the KIP should be
> very clear about what will be proposed.
>
> In terms of deprecating the old configs in option 2, it'd be fine to do
> this in the documentation (and code, if applicable) as part of this
> feature. That means they could be removed at the earliest in the next major
> release after the feature is merged, which would be AK 4.0. And it would be
> good to mention this explicitly in the KIP so that we don't need another
> KIP just to remove the deprecation; so something like "This configuration
> property will be deprecated and documented as such, and eventually removed
> in a subsequent major release."
>
> Supporting nested fields would also be a nice touch here, too. However, as
> you mention, we'd likely want to make the same change to other SMTs, and as
> such we might consider adding support for nested names on all SMTs with a
> separate KIP. The two KIPs could be written such that they are independent.
>
> Finally, a KIP need not go into much detail about the implementation, and
> should instead focus on the API and behavioral changes. For example, it's
> probably not necessary for the KIP to mention that the implementation would
> use DateTimeFormatter. However, it would be worth mentioning explicitly
> that the `format.input` configuration property will take a regular
> expression that is compatible with the JDK's DateTimeFormatter.ofPattern
> method
> -- this very clearly states the constraint on the input values.
>
> Again, very nice job on the KIP so far!
>
> Best regards,
>
> Randall
>
> On Sat, May 15, 2021 at 6:50 AM Joshua Grisham <gr...@gmail.com>
> wrote:
>
> > Hello again Kafka Developers community!
> >
> > I thought it has been some time and I would try to dust KIP-682 off and
> see
> > if anyone wanted to take a discussion on them, if it still makes sense or
> > if this should go in a different direction.
> >
> > Anyone have thoughts on this?  That Connect's TimestampConverter could be
> > updated to support converting multiple fields in one pass of the
> > transformation (instead of chaining multiple instances together if you
> have
> > multiple timestamps to convert) and that it could support a regex-like
> > pattern for matching string input instead of only allowing one string
> > format and expect that all producers are sending in 100% the same format?
> >
> > Since some time has gone there are a few conflicts in my proposed PR but
> > they seem quite trivial (some error message and formatting of a test
> result
> > from a quick glance) so it should not be hard to resolve them.
> >
> > Thanks!
> >
> > Best,
> > Joshua
> >
> >
> > On Sat, Nov 7, 2020 at 9:24 PM Joshua Grisham <gr...@gmail.com>
> > wrote:
> >
> > > Hi Brandon,
> > >
> > > I have added what i called "recursive" support to find child fields on
> > any
> > > level to the Cast and ReplaceField transforms as well, the PR is here
> > > https://github.com/apache/kafka/pull/9493
> > >
> > > I plan to try and write up a KIP for that one as well maybe in the next
> > > day or two. It is not exactly the same as specifically targeting one
> > nested
> > > field based on its full path, but instead will look for the field name
> at
> > > any level of the structure and cast/replace all of them if they match.
> > >
> > > The same could be added in theory to almost all of the transforms i
> guess
> > > but maybe that is for another discussion 😊
> > >
> > > But for now with this TimestampConverter the only thing which I have
> > > proposed are the two mentioned before - multiple fields and support for
> > > pattern like matching of string timestamps.
> > >
> > > Joshua
> > >
> > > Den lör 7 nov. 2020 18:42Brandon Brown <br...@bbrownsound.com>
> skrev:
> > >
> > >> I love this idea! I have a current KIP for a hash transform and am
> > >> working adding multi field/nested support to that one. I can think of
> a
> > few
> > >> times we’ve needed functionality like that. I’d add that adding
> support
> > for
> > >> transforming nested fields would be a great feature for this.
> > >>
> > >> -Brandon
> > >>
> > >> Brandon Brown
> > >> > On Nov 7, 2020, at 11:11 AM, Joshua Grisham <gr...@gmail.com>
> > >> wrote:
> > >> >
> > >> > Hello everyone!
> > >> > (Thanks to Tom Bentley for directing me in this direction!)
> > >> >
> > >> > I have made a series of changes to some of the standard Connect
> > >> transforms
> > >> > to meet some of the challenges at my company to consume messages
> using
> > >> > Connect, and have been running them for a few weeks now as custom
> > SMTs.
> > >> >
> > >> > I realized that several of these changes might actually be really
> good
> > >> > features to be included in the standard transforms so I will open
> some
> > >> KIPs
> > >> > to go along with PRs which I already submitted (apologize for going
> a
> > >> bit
> > >> > backwards in the process as this was my first time working with the
> > >> Kafka
> > >> > project).
> > >> >
> > >> > The first one I have created a KIP for is KIP-682 on the
> > >> TimestampConverter
> > >> > transform.
> > >> >
> > >> > Basically the 2 limitations that I am proposing to remove are the
> > >> ability
> > >> > to only convert one field at a time, but instead convert multiple
> > fields
> > >> > using a common config, and that if any producer sends a slightly
> > >> different
> > >> > date/timestamp string format on any message then the transformation
> > >> > fails... so to instead add the ability to support variations in the
> > >> string
> > >> > input format.
> > >> >
> > >> > More info here:
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats
> > >> >
> > >> > I would appreciate any kind of discussion to make improvements and
> it
> > >> would
> > >> > be great to see changes like this being pushed up.
> > >> >
> > >> > Thanks for now!
> > >> >
> > >> > Best regards,
> > >> > Joshua Grisham
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-682: Connect TimestampConverter support for multiple fields and multiple input formats

Posted by Randall Hauch <rh...@gmail.com>.
Thanks for the contribution, Joshua. Overall I think this is a really good
improvement to this connector, for all the reasons cited in the motivation
section. I do have some requests, though.

The biggest issue I have is that the change is not backward compatible.
We've worked really hard to make sure that all Connect APIs (including
configuration properties) are always backward compatible because this makes
it trivial for users to upgrade from one Connect release to a newer one.
After all, any users that are already using this SMT in their connectors
should be able to upgrade Connect and still have the SMT behave exactly as
it did in earlier releases -- unless they opt in to changing their
configuration to use the new feature. So I would request that the KIP and
proposed implementation be changed to not *change* existing configuration
properties. We can of course introduce new config properties, but they
should have a default that results in exactly the same old behavior as in
earlier releases. A third option is to create a new SMT, though we should
try really hard to just improve the existing SMT since that's generally a
much better UX. Assuming we can make the existing transformation
work, the option to create a new SMT should be documented in the Rejected
Alternatives section, too.

For example, there are (at least?) two options with regard to specifying
the field names. 1) just keep `field` as the name even though a
comma-separated list is allowed, or 2) add `fields` but keep (and
deprecate?) the existing `field` configuration, using `fields` if defined
or if `field` is not defined, and maybe even supporting comma separated
lists of names in both fields. The same goes for `format` and
`format.output`. In general, I think 2 *might* provide a better experience
at the cost of higher implementation complexity. Regardless of which you
choose, the KIP should explicitly propose one approach, and document the
other(s) in the rejected alternatives section. After all, the KIP should be
very clear about what will be proposed.

In terms of deprecating the old configs in option 2, it'd be fine to do
this in the documentation (and code, if applicable) as part of this
feature. That means they could be removed at the earliest in the next major
release after the feature is merged, which would be AK 4.0. And it would be
good to mention this explicitly in the KIP so that we don't need another
KIP just to remove the deprecation; so something like "This configuration
property will be deprecated and documented as such, and eventually removed
in a subsequent major release."

Supporting nested fields would also be a nice touch here, too. However, as
you mention, we'd likely want to make the same change to other SMTs, and as
such we might consider adding support for nested names on all SMTs with a
separate KIP. The two KIPs could be written such that they are independent.

Finally, a KIP need not go into much detail about the implementation, and
should instead focus on the API and behavioral changes. For example, it's
probably not necessary for the KIP to mention that the implementation would
use DateTimeFormatter. However, it would be worth mentioning explicitly
that the `format.input` configuration property will take a regular
expression that is compatible with the JDK's DateTimeFormatter.ofPattern method
-- this very clearly states the constraint on the input values.

Again, very nice job on the KIP so far!

Best regards,

Randall

On Sat, May 15, 2021 at 6:50 AM Joshua Grisham <gr...@gmail.com>
wrote:

> Hello again Kafka Developers community!
>
> I thought it has been some time and I would try to dust KIP-682 off and see
> if anyone wanted to take a discussion on them, if it still makes sense or
> if this should go in a different direction.
>
> Anyone have thoughts on this?  That Connect's TimestampConverter could be
> updated to support converting multiple fields in one pass of the
> transformation (instead of chaining multiple instances together if you have
> multiple timestamps to convert) and that it could support a regex-like
> pattern for matching string input instead of only allowing one string
> format and expect that all producers are sending in 100% the same format?
>
> Since some time has gone there are a few conflicts in my proposed PR but
> they seem quite trivial (some error message and formatting of a test result
> from a quick glance) so it should not be hard to resolve them.
>
> Thanks!
>
> Best,
> Joshua
>
>
> On Sat, Nov 7, 2020 at 9:24 PM Joshua Grisham <gr...@gmail.com>
> wrote:
>
> > Hi Brandon,
> >
> > I have added what i called "recursive" support to find child fields on
> any
> > level to the Cast and ReplaceField transforms as well, the PR is here
> > https://github.com/apache/kafka/pull/9493
> >
> > I plan to try and write up a KIP for that one as well maybe in the next
> > day or two. It is not exactly the same as specifically targeting one
> nested
> > field based on its full path, but instead will look for the field name at
> > any level of the structure and cast/replace all of them if they match.
> >
> > The same could be added in theory to almost all of the transforms i guess
> > but maybe that is for another discussion 😊
> >
> > But for now with this TimestampConverter the only thing which I have
> > proposed are the two mentioned before - multiple fields and support for
> > pattern like matching of string timestamps.
> >
> > Joshua
> >
> > Den lör 7 nov. 2020 18:42Brandon Brown <br...@bbrownsound.com> skrev:
> >
> >> I love this idea! I have a current KIP for a hash transform and am
> >> working adding multi field/nested support to that one. I can think of a
> few
> >> times we’ve needed functionality like that. I’d add that adding support
> for
> >> transforming nested fields would be a great feature for this.
> >>
> >> -Brandon
> >>
> >> Brandon Brown
> >> > On Nov 7, 2020, at 11:11 AM, Joshua Grisham <gr...@gmail.com>
> >> wrote:
> >> >
> >> > Hello everyone!
> >> > (Thanks to Tom Bentley for directing me in this direction!)
> >> >
> >> > I have made a series of changes to some of the standard Connect
> >> transforms
> >> > to meet some of the challenges at my company to consume messages using
> >> > Connect, and have been running them for a few weeks now as custom
> SMTs.
> >> >
> >> > I realized that several of these changes might actually be really good
> >> > features to be included in the standard transforms so I will open some
> >> KIPs
> >> > to go along with PRs which I already submitted (apologize for going a
> >> bit
> >> > backwards in the process as this was my first time working with the
> >> Kafka
> >> > project).
> >> >
> >> > The first one I have created a KIP for is KIP-682 on the
> >> TimestampConverter
> >> > transform.
> >> >
> >> > Basically the 2 limitations that I am proposing to remove are the
> >> ability
> >> > to only convert one field at a time, but instead convert multiple
> fields
> >> > using a common config, and that if any producer sends a slightly
> >> different
> >> > date/timestamp string format on any message then the transformation
> >> > fails... so to instead add the ability to support variations in the
> >> string
> >> > input format.
> >> >
> >> > More info here:
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats
> >> >
> >> > I would appreciate any kind of discussion to make improvements and it
> >> would
> >> > be great to see changes like this being pushed up.
> >> >
> >> > Thanks for now!
> >> >
> >> > Best regards,
> >> > Joshua Grisham
> >>
> >
>

Re: [DISCUSS] KIP-682: Connect TimestampConverter support for multiple fields and multiple input formats

Posted by Joshua Grisham <gr...@gmail.com>.
Hello again Kafka Developers community!

I thought it has been some time and I would try to dust KIP-682 off and see
if anyone wanted to take a discussion on them, if it still makes sense or
if this should go in a different direction.

Anyone have thoughts on this?  That Connect's TimestampConverter could be
updated to support converting multiple fields in one pass of the
transformation (instead of chaining multiple instances together if you have
multiple timestamps to convert) and that it could support a regex-like
pattern for matching string input instead of only allowing one string
format and expect that all producers are sending in 100% the same format?

Since some time has gone there are a few conflicts in my proposed PR but
they seem quite trivial (some error message and formatting of a test result
from a quick glance) so it should not be hard to resolve them.

Thanks!

Best,
Joshua


On Sat, Nov 7, 2020 at 9:24 PM Joshua Grisham <gr...@gmail.com> wrote:

> Hi Brandon,
>
> I have added what i called "recursive" support to find child fields on any
> level to the Cast and ReplaceField transforms as well, the PR is here
> https://github.com/apache/kafka/pull/9493
>
> I plan to try and write up a KIP for that one as well maybe in the next
> day or two. It is not exactly the same as specifically targeting one nested
> field based on its full path, but instead will look for the field name at
> any level of the structure and cast/replace all of them if they match.
>
> The same could be added in theory to almost all of the transforms i guess
> but maybe that is for another discussion 😊
>
> But for now with this TimestampConverter the only thing which I have
> proposed are the two mentioned before - multiple fields and support for
> pattern like matching of string timestamps.
>
> Joshua
>
> Den lör 7 nov. 2020 18:42Brandon Brown <br...@bbrownsound.com> skrev:
>
>> I love this idea! I have a current KIP for a hash transform and am
>> working adding multi field/nested support to that one. I can think of a few
>> times we’ve needed functionality like that. I’d add that adding support for
>> transforming nested fields would be a great feature for this.
>>
>> -Brandon
>>
>> Brandon Brown
>> > On Nov 7, 2020, at 11:11 AM, Joshua Grisham <gr...@gmail.com>
>> wrote:
>> >
>> > Hello everyone!
>> > (Thanks to Tom Bentley for directing me in this direction!)
>> >
>> > I have made a series of changes to some of the standard Connect
>> transforms
>> > to meet some of the challenges at my company to consume messages using
>> > Connect, and have been running them for a few weeks now as custom SMTs.
>> >
>> > I realized that several of these changes might actually be really good
>> > features to be included in the standard transforms so I will open some
>> KIPs
>> > to go along with PRs which I already submitted (apologize for going a
>> bit
>> > backwards in the process as this was my first time working with the
>> Kafka
>> > project).
>> >
>> > The first one I have created a KIP for is KIP-682 on the
>> TimestampConverter
>> > transform.
>> >
>> > Basically the 2 limitations that I am proposing to remove are the
>> ability
>> > to only convert one field at a time, but instead convert multiple fields
>> > using a common config, and that if any producer sends a slightly
>> different
>> > date/timestamp string format on any message then the transformation
>> > fails... so to instead add the ability to support variations in the
>> string
>> > input format.
>> >
>> > More info here:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats
>> >
>> > I would appreciate any kind of discussion to make improvements and it
>> would
>> > be great to see changes like this being pushed up.
>> >
>> > Thanks for now!
>> >
>> > Best regards,
>> > Joshua Grisham
>>
>

Re: [DISCUSS] KIP-682: Connect TimestampConverter support for multiple fields and multiple input formats

Posted by Joshua Grisham <gr...@gmail.com>.
Hi Brandon,

I have added what i called "recursive" support to find child fields on any
level to the Cast and ReplaceField transforms as well, the PR is here
https://github.com/apache/kafka/pull/9493

I plan to try and write up a KIP for that one as well maybe in the next day
or two. It is not exactly the same as specifically targeting one nested
field based on its full path, but instead will look for the field name at
any level of the structure and cast/replace all of them if they match.

The same could be added in theory to almost all of the transforms i guess
but maybe that is for another discussion 😊

But for now with this TimestampConverter the only thing which I have
proposed are the two mentioned before - multiple fields and support for
pattern like matching of string timestamps.

Joshua

Den lör 7 nov. 2020 18:42Brandon Brown <br...@bbrownsound.com> skrev:

> I love this idea! I have a current KIP for a hash transform and am working
> adding multi field/nested support to that one. I can think of a few times
> we’ve needed functionality like that. I’d add that adding support for
> transforming nested fields would be a great feature for this.
>
> -Brandon
>
> Brandon Brown
> > On Nov 7, 2020, at 11:11 AM, Joshua Grisham <gr...@gmail.com>
> wrote:
> >
> > Hello everyone!
> > (Thanks to Tom Bentley for directing me in this direction!)
> >
> > I have made a series of changes to some of the standard Connect
> transforms
> > to meet some of the challenges at my company to consume messages using
> > Connect, and have been running them for a few weeks now as custom SMTs.
> >
> > I realized that several of these changes might actually be really good
> > features to be included in the standard transforms so I will open some
> KIPs
> > to go along with PRs which I already submitted (apologize for going a bit
> > backwards in the process as this was my first time working with the Kafka
> > project).
> >
> > The first one I have created a KIP for is KIP-682 on the
> TimestampConverter
> > transform.
> >
> > Basically the 2 limitations that I am proposing to remove are the ability
> > to only convert one field at a time, but instead convert multiple fields
> > using a common config, and that if any producer sends a slightly
> different
> > date/timestamp string format on any message then the transformation
> > fails... so to instead add the ability to support variations in the
> string
> > input format.
> >
> > More info here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats
> >
> > I would appreciate any kind of discussion to make improvements and it
> would
> > be great to see changes like this being pushed up.
> >
> > Thanks for now!
> >
> > Best regards,
> > Joshua Grisham
>

Re: [DISCUSS] KIP-682: Connect TimestampConverter support for multiple fields and multiple input formats

Posted by Brandon Brown <br...@bbrownsound.com>.
I love this idea! I have a current KIP for a hash transform and am working adding multi field/nested support to that one. I can think of a few times we’ve needed functionality like that. I’d add that adding support for transforming nested fields would be a great feature for this. 

-Brandon

Brandon Brown
> On Nov 7, 2020, at 11:11 AM, Joshua Grisham <gr...@gmail.com> wrote:
> 
> Hello everyone!
> (Thanks to Tom Bentley for directing me in this direction!)
> 
> I have made a series of changes to some of the standard Connect transforms
> to meet some of the challenges at my company to consume messages using
> Connect, and have been running them for a few weeks now as custom SMTs.
> 
> I realized that several of these changes might actually be really good
> features to be included in the standard transforms so I will open some KIPs
> to go along with PRs which I already submitted (apologize for going a bit
> backwards in the process as this was my first time working with the Kafka
> project).
> 
> The first one I have created a KIP for is KIP-682 on the TimestampConverter
> transform.
> 
> Basically the 2 limitations that I am proposing to remove are the ability
> to only convert one field at a time, but instead convert multiple fields
> using a common config, and that if any producer sends a slightly different
> date/timestamp string format on any message then the transformation
> fails... so to instead add the ability to support variations in the string
> input format.
> 
> More info here:
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats
> 
> I would appreciate any kind of discussion to make improvements and it would
> be great to see changes like this being pushed up.
> 
> Thanks for now!
> 
> Best regards,
> Joshua Grisham