You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by gyejun choi <wh...@gmail.com> on 2020/10/23 05:56:34 UTC

KIP 678: New Kafka Connect SMT for plainText => Struct(or Map) with Regex

Hi,

I've opened KIP-678 which is intended to provide a new SMT in Kafka Connect.
I'd be grateful for any
feedback:https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex==

thanks,

whsoul

Re: KIP 678: New Kafka Connect SMT for plainText => Struct(or Map) with Regex

Posted by Chris Egerton <ch...@confluent.io.INVALID>.
Hi whsoul,

Thanks for the KIP. The two use cases you identified seem quite appealing,
especially the first one: parsing structured log messages.

Here are my initial thoughts on the proposed design:

1. I wonder if it's necessary to include support for type casting with this
SMT. We already have a Cast SMT (
https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java)
that can parse multiple fields of a structured record value with differing
types. Would it be enough for your new SMT to only produce string values
for its structured data, and then allow users to perform casting logic
using the Cast SMT afterward?

2. It seems like the "struct.field" property is similar; based on the
examples, it looks like when the SMT is configured with a value for that
property, it will first pull out a field from a structured record value
(for example, it would pull out the value "
https://kafka.apache.org/documentation/#connect" from a map of {"url": "
https://kafka.apache.org/documentation/#connect"}), then parse that field's
value, and replace the entire record value (or key) with the result of the
parsing stage. It seems like this could be accomplished using the
ExtractField SMT (
https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/ExtractField.java)
as a preliminary step before passing it to your new SMT. Is this correct?
And if so, could we simplify the interface for your SMT by removing the
"struct.field" property in favor of the existing ExtractField SMT?

3. (Nit) I believe the property names in the table beneath the "Proposed
Interfaces" section should have the "transforms.RegexTransform." prefix
removed, since users will be able to configure the SMT with any name, not
just "RegexTransform". This keeps in line with similar KIPs such as
KIP-437, which added a new property to the MaskField SMT (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT
).

Cheers,

Chris

On Thu, Jul 15, 2021 at 7:56 AM gyejun choi <wh...@gmail.com> wrote:

> is there someone who any feed back about this?
>
> 2020년 10월 23일 (금) 오후 2:56, gyejun choi <wh...@gmail.com>님이 작성:
>
> > Hi,
> >
> > I've opened KIP-678 which is intended to provide a new SMT in Kafka
> Connect.
> > I'd be grateful for any
> > feedback:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex==
> >
> > thanks,
> >
> > whsoul
> >
> >
>

Re: KIP 678: New Kafka Connect SMT for plainText => Struct(or Map) with Regex

Posted by gyejun choi <wh...@gmail.com>.
is there someone who any feed back about this?

2020년 10월 23일 (금) 오후 2:56, gyejun choi <wh...@gmail.com>님이 작성:

> Hi,
>
> I've opened KIP-678 which is intended to provide a new SMT in Kafka Connect.
> I'd be grateful for any
> feedback:https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex==
>
> thanks,
>
> whsoul
>
>