Posted to dev@kafka.apache.org by David Mariassy <da...@gmail.com> on 2023/02/09 19:28:21 UTC

[DISCUSS] KIP-905: Broker interceptors

Hi everyone,

I'd like to get a discussion going for KIP-905
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-905%3A+Broker+interceptors>,
which proposes the addition of broker interceptors to the stack.

The KIP contains the motivation, and lists the new public interfaces that
this change would entail. Since my company had its quarterly hack days this
week, I also took the liberty to throw together a first prototype of the
proposed new feature here: https://github.com/apache/kafka/pull/13224.

Looking forward to the group's feedback!

Thanks,
David

Re: [DISCUSS] KIP-905: Broker interceptors

Posted by David Mariassy <da...@gmail.com>.
Bumping this thread as I'd love to get a bit more feedback on the general
approach before proceeding.

On Fri, Feb 10, 2023 at 11:41 AM David Mariassy <da...@gmail.com>
wrote:

> Hi Ahmed,
>
> Thanks for taking a look at the KIP, and for your insightful feedback!
>
> I don't disagree with the sentiment that in-band interceptors could be a
> potential source of bugs in a cluster.
>
> Having said that, I don't necessarily think that an in-band interceptor is
> significantly riskier than an out-of-band pre-processor. Let's take the
> example of platform-wide privacy scrubbing. In my opinion it doesn't really
> matter if this feature is deployed as an out-of-band stream processor app
> that consumes from all topics OR if the logic is implemented as an in-band
> interceptor. Either way, a faulty release of the scrubber will result in
> the platform-wide disruption of data flows. Thus, I'd argue that from the
> perspective of the platform's overall health, the level of risk is very
> comparable in both cases. However, in-band interceptors have a few
> advantages in my opinion:
> 1. They are significantly cheaper (they don't require duplicating data
> between raw and sanitized topics, and there are potentially large savings
> in network costs)
> 2. They are easier to maintain (no need to set up additional
> infrastructure for out-of-band processing)
> 3. They can provide accurate produce responses to clients (since there is
> no downstream processing that could asynchronously render a client's
> messages invalid)
>
> Also, in-band interceptors could be as safe or risky as their authors
> design them to be. There's nothing stopping someone from catching all
> exceptions in a `processRecord` method, and letting all unprocessed
> messages go through or sending them to a DLQ. Once the interceptor is
> fixed, those unprocessed messages could get re-ingested into Kafka to
> re-attempt pre-processing.
>
> Thanks and happy Friday,
> David

Re: [DISCUSS] KIP-905: Broker interceptors

Posted by David Mariassy <da...@gmail.com>.
Hi Ahmed,

Thanks for taking a look at the KIP, and for your insightful feedback!

I don't disagree with the sentiment that in-band interceptors could be a
potential source of bugs in a cluster.

Having said that, I don't necessarily think that an in-band interceptor is
significantly riskier than an out-of-band pre-processor. Let's take the
example of platform-wide privacy scrubbing. In my opinion it doesn't really
matter if this feature is deployed as an out-of-band stream processor app
that consumes from all topics OR if the logic is implemented as an in-band
interceptor. Either way, a faulty release of the scrubber will result in
the platform-wide disruption of data flows. Thus, I'd argue that from the
perspective of the platform's overall health, the level of risk is very
comparable in both cases. However, in-band interceptors have a few
advantages in my opinion:
1. They are significantly cheaper (they don't require duplicating data
between raw and sanitized topics, and there are potentially large savings
in network costs)
2. They are easier to maintain (no need to set up additional infrastructure
for out-of-band processing)
3. They can provide accurate produce responses to clients (since there is
no downstream processing that could asynchronously render a client's
messages invalid)

Also, in-band interceptors could be as safe or risky as their authors
design them to be. There's nothing stopping someone from catching all
exceptions in a `processRecord` method, and letting all unprocessed
messages go through or sending them to a DLQ. Once the interceptor is
fixed, those unprocessed messages could get re-ingested into Kafka to
re-attempt pre-processing.
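
A minimal sketch of the catch-everything approach described above. The `RecordInterceptor` interface, the `Optional` return type and the in-memory dead-letter list are all assumptions made for illustration; they are not taken from the KIP or the prototype PR, whose `processRecord` signature may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the proposed interceptor interface.
// An empty Optional means "do not append this record to the log".
interface RecordInterceptor {
    Optional<String> processRecord(String record) throws Exception;
}

// Wraps any interceptor so that a faulty release cannot fail the produce path.
final class FailSafeInterceptor implements RecordInterceptor {
    private final RecordInterceptor delegate;
    private final boolean passThroughOnFailure;
    // In-memory stand-in for a DLQ topic (assumption for this sketch).
    private final List<String> deadLetters = new ArrayList<>();

    FailSafeInterceptor(RecordInterceptor delegate, boolean passThroughOnFailure) {
        this.delegate = delegate;
        this.passThroughOnFailure = passThroughOnFailure;
    }

    @Override
    public Optional<String> processRecord(String record) {
        try {
            return delegate.processRecord(record);
        } catch (Exception e) {
            if (passThroughOnFailure) {
                return Optional.of(record); // let the unprocessed record through
            }
            deadLetters.add(record);        // park it for later re-ingestion
            return Optional.empty();
        }
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}

class FailSafeDemo {
    public static void main(String[] args) {
        // A toy scrubber that masks digits and fails on empty input.
        RecordInterceptor scrubber = r -> {
            if (r.isEmpty()) {
                throw new IllegalArgumentException("empty record");
            }
            return Optional.of(r.replaceAll("[0-9]", "#"));
        };
        FailSafeInterceptor safe = new FailSafeInterceptor(scrubber, false);
        System.out.println(safe.processRecord("card 1234"));
        System.out.println(safe.processRecord(""));
        System.out.println(safe.deadLetters().size());
    }
}
```

Whether a failed record should pass through or be parked is a policy choice; the flag only makes that choice explicit per interceptor.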

Thanks and happy Friday,
David





On Fri, Feb 10, 2023 at 8:23 AM Ahmed Abdalla <en...@gmail.com>
wrote:

> Hi David,
>
> That's a very interesting KIP and I wanted to share my two cents. I believe
> there's a lot of value and use cases for the ability to intercept, mutate
> and filter Kafka's messages; however, I'm not sure that in-band
> interceptors are the best approach for this.
>
>    - My mental model around one of Kafka's core values is the brokers'
>    focus on a single functionality (more or less): a highly available and
>    fault-tolerant commit log. I see this in many design decisions such as
>    off-loading responsibilities to the clients (partitioner, assignor,
>    consumer group coordination, etc.).
>    - The impact of this KIP on the Kafka server would be adding another
>    moving part to the "state of the world" that the brokers try to maintain.
>    What if an interceptor goes bad? What if there's a version mismatch?
>    These are responsibilities that can be managed very efficiently
>    out-of-band, IMHO.
>    - The comparison to NGINX and Kubernetes is, IMHO, comparing apples to
>    oranges:
>       - NGINX
>          - Doesn't maintain persisted data.
>          - It's designed as middleware; it's an interceptor by nature.
>       - Kubernetes
>          - CRDs extend the API surface; they don't "augment" existing APIs.
>          I think admission webhooks
>          <https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/>
>          are Kubernetes' solution for providing interceptors.
>          - The admission webhooks are out-of-band, and in fact they're a
>          great example of "opening up your cluster for extensibility"
>          going wrong. Installing a misbehaving webhook can brick the whole
>          cluster.
>
> As I mentioned, I see value in users being able to intercept and
> transform Kafka's messages. But I'm worried that having this as a core
> Kafka feature might not be the best approach for achieving that.
>
> Thanks,
> --
> Ahmed Abdalla
> T: @devguyio <https://twitter.com/devguyio>

Re: [DISCUSS] KIP-905: Broker interceptors

Posted by Ahmed Abdalla <en...@gmail.com>.
Hi David,

That's a very interesting KIP and I wanted to share my two cents. I believe
there's a lot of value and use cases for the ability to intercept, mutate
and filter Kafka's messages; however, I'm not sure that in-band
interceptors are the best approach for this.

   - My mental model around one of Kafka's core values is the brokers'
   focus on a single functionality (more or less): a highly available and
   fault-tolerant commit log. I see this in many design decisions such as
   off-loading responsibilities to the clients (partitioner, assignor,
   consumer group coordination, etc.).
   - The impact of this KIP on the Kafka server would be adding another
   moving part to the "state of the world" that the brokers try to maintain.
   What if an interceptor goes bad? What if there's a version mismatch?
   These are responsibilities that can be managed very efficiently
   out-of-band, IMHO.
   - The comparison to NGINX and Kubernetes is, IMHO, comparing apples to
   oranges:
      - NGINX
         - Doesn't maintain persisted data.
         - It's designed as middleware; it's an interceptor by nature.
      - Kubernetes
         - CRDs extend the API surface; they don't "augment" existing APIs.
         I think admission webhooks
         <https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/>
         are Kubernetes' solution for providing interceptors.
         - The admission webhooks are out-of-band, and in fact they're a
         great example of "opening up your cluster for extensibility"
         going wrong. Installing a misbehaving webhook can brick the whole
         cluster.

As I mentioned, I see value in users being able to intercept and
transform Kafka's messages. But I'm worried that having this as a core
Kafka feature might not be the best approach for achieving that.

Thanks,
-- 
Ahmed Abdalla
T: @devguyio <https://twitter.com/devguyio>



Re: [DISCUSS] KIP-905: Broker interceptors

Posted by Edoardo Comar <ed...@gmail.com>.
Also, the KIP doesn't describe the client-side experience.

Is a producer only expected to get the new API error in the response when
the interceptor fails unexpectedly?
Otherwise, it looks as though records may get skipped or mutated without
the producer metadata response noting this.

This is a significant difference from one of the other KIPs mentioned,
where the intention was for a producer to receive PolicyFailed exceptions.
This should be part of the KIP, IMHO.


On Tue, 21 Feb 2023 at 13:53, Edoardo Comar <ed...@gmail.com> wrote:

> Hi David
> thanks for the KIP.
>
> Two initial observations from me.
>
> I think the Rejected Alternatives section could compare your proposal to
> the prior art that you rightly mention initially.
>
> Also, the Java interface could extend Kafka's own Configurable.
> This allows an implementation to get hold of the static properties with
> which a broker is started.
> In practice, that's a way to get hold of configuration entries for a
> plug-in class, as you can add entries to a broker's server.properties that
> get ignored by the broker but passed on to plugins.
>
> cheers,
> Edoardo
>

Re: [DISCUSS] KIP-905: Broker interceptors

Posted by Edoardo Comar <ed...@gmail.com>.
Hi David
thanks for the KIP.

Two initial observations from me.

I think the Rejected Alternatives section could compare your proposal to
the prior art that you rightly mention initially.

Also, the Java interface could extend Kafka's own Configurable.
This allows an implementation to get hold of the static properties with
which a broker is started.
In practice, that's a way to get hold of configuration entries for a
plug-in class, as you can add entries to a broker's server.properties that
get ignored by the broker but passed on to plugins.
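
To illustrate the suggestion, here is a rough sketch of an interceptor that picks up its own setting through a `configure` callback. So that it compiles without Kafka on the classpath, the `Configurable` interface below is a local stand-in that mirrors the shape of Kafka's `org.apache.kafka.common.Configurable` (a single `configure(Map<String, ?>)` method). The `BrokerInterceptor` interface and the property name are hypothetical and not taken from the KIP.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;

// Local stand-in mirroring org.apache.kafka.common.Configurable.
interface Configurable {
    void configure(Map<String, ?> configs);
}

// Hypothetical interceptor interface; the KIP's actual interface may differ.
interface BrokerInterceptor extends Configurable {
    String processRecord(String record);
}

final class MaskingInterceptor implements BrokerInterceptor {
    // Invented property name, used only for this example.
    static final String MASK_CONFIG = "masking.interceptor.mask";
    private String mask = "*";

    @Override
    public void configure(Map<String, ?> configs) {
        // Read the plug-in's own entry and ignore everything else in the map.
        Object value = configs.get(MASK_CONFIG);
        if (value != null) {
            mask = value.toString();
        }
    }

    @Override
    public String processRecord(String record) {
        // Replace every digit with the configured mask string.
        return record.replaceAll("[0-9]", Matcher.quoteReplacement(mask));
    }
}

class ConfigurableDemo {
    public static void main(String[] args) {
        // Simulates the property map a broker would hand to its plugins.
        Map<String, Object> brokerProps = new HashMap<>();
        brokerProps.put("broker.id", "0");
        brokerProps.put(MaskingInterceptor.MASK_CONFIG, "#");

        MaskingInterceptor interceptor = new MaskingInterceptor();
        interceptor.configure(brokerProps);
        System.out.println(interceptor.processRecord("id 42"));
    }
}
```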

cheers,
Edoardo


Re: [DISCUSS] KIP-905: Broker interceptors

Posted by Christo Lolov <ch...@gmail.com>.
Hello David,

Thank you for the proposal - it is an interesting read!

I have a few questions about it.

1. Can you take a stance on whether you are proposing the feature just for
producers or for consumers as well? If it is just for producers can we
remove references to consumers? If it is for both can you explicitly call
out the properties used to configure the consumers?

2. What about extending this to altering configurations? One shortcoming of
Kafka today is that the range of values of one configuration can affect the
range of values another configuration can have, but the validation framework
within Kafka does not have the capability to make such checks.

3. Does it not make sense for the pattern which determines whether an
interceptor is to be applied or not to be a configuration? Otherwise, if
there is a problem with the pattern, I have to carry out a whole new
deployment since I need to change the code. In the same line of reasoning,
will there be a way I can query the cluster to understand what interceptors
are currently running and what patterns they are using? Otherwise, how would
I know what the cluster's current configuration is?

4. Will there be any new metrics emitted by said interceptors (e.g. number
of records dropped, number of records processed per unit time)? If there
aren't, how will I be able to determine the performance of my interceptors?
If I have multiple interceptors, how will I be able to determine which one
is the bottleneck?

5. Will records pass through interceptors in the same order as the
interceptors are specified in the list or will there be another way to
specify the ordering?

6. What happens if you have a pipeline of interceptors and some of them are
the same? How will you handle loading the interceptors then? For example, I
can imagine someone saying filter everything above a value X, do some more
complex operation, filter everything above a value X, do another more
complex operation, filter everything above a value X etc.
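
To make questions 3 to 6 concrete, here is one possible shape, sketched under assumptions: each stage carries a topic pattern supplied as a string (so it could come from configuration), stages run in list order, every stage keeps its own processed/dropped counters, and the same filter logic can appear more than once as separately named stages. The `Stage` and `InterceptorChain` types are hypothetical and do not come from the KIP or the prototype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Pattern;

// One configured stage of the pipeline (all names here are hypothetical).
final class Stage {
    final String name;                                // label for reporting metrics
    final Pattern topicPattern;                       // supplied as a string, not hard-coded
    final Function<Integer, Optional<Integer>> step;  // empty result = record dropped
    long processed;
    long dropped;

    Stage(String name, String topicRegex, Function<Integer, Optional<Integer>> step) {
        this.name = name;
        this.topicPattern = Pattern.compile(topicRegex);
        this.step = step;
    }
}

final class InterceptorChain {
    private final List<Stage> stages = new ArrayList<>();

    // Stages run in the order they are added, i.e. list order.
    InterceptorChain add(Stage stage) {
        stages.add(stage);
        return this;
    }

    Optional<Integer> apply(String topic, int value) {
        Optional<Integer> current = Optional.of(value);
        for (Stage stage : stages) {
            if (!stage.topicPattern.matcher(topic).matches()) {
                continue; // this stage is not configured for the topic
            }
            stage.processed++;
            current = stage.step.apply(current.get());
            if (!current.isPresent()) {
                stage.dropped++;
                return current; // later stages never see a dropped record
            }
        }
        return current;
    }

    List<Stage> stages() {
        return stages;
    }
}

class ChainDemo {
    public static void main(String[] args) {
        // The same filter logic is reused by two separately named stages.
        Function<Integer, Optional<Integer>> dropAbove100 =
            v -> v > 100 ? Optional.<Integer>empty() : Optional.of(v);
        InterceptorChain chain = new InterceptorChain()
            .add(new Stage("filter-1", "metrics\\..*", dropAbove100))
            .add(new Stage("double", "metrics\\..*", v -> Optional.of(v * 2)))
            .add(new Stage("filter-2", "metrics\\..*", dropAbove100));

        System.out.println(chain.apply("metrics.cpu", 40));
        System.out.println(chain.apply("metrics.cpu", 60));
        System.out.println(chain.apply("orders", 500));
        for (Stage s : chain.stages()) {
            System.out.println(s.name + " processed=" + s.processed + " dropped=" + s.dropped);
        }
    }
}
```

Per-stage counters like these are what would let an operator see which interceptor drops records or sits on the hot path; a real implementation would presumably expose them through the broker's metrics system instead of plain fields.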

Let me know your thoughts!

Best,
Christo


