Posted to dev@kafka.apache.org by Colin McCabe <cm...@apache.org> on 2021/12/14 00:46:51 UTC

[VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Hi all,

I'd like to start the vote on KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw

The original DISCUSS thread is here:

https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq

Please take a look and vote if you can.

best,
Colin

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Colin McCabe <cm...@apache.org>.
On Mon, Jan 31, 2022, at 16:20, Jason Gustafson wrote:
> Hi Colin,
>
> The updates look good to me. I wanted to clarify a few details about the
> AUTHORIZER_NOT_READY error code:
>
> 1. Only the Authorizer implementation will know when it is finished
> initialization. I had imagined that any request received in this state
> which cannot be authorized based on static configuration (i.e., super.users)
> would raise an AuthorizerNotReadyException (or something like that). Is
> that what you had in mind?

Right. Currently only super.users will be allowed.
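To make the startup gating concrete, here is a minimal sketch in Python (the real Kafka Authorizer interface is Java, and the class and method names below are illustrative, not Kafka's): before the metadata log has been loaded, principals in super.users are authorized from static configuration alone, and every other request raises a not-ready error.

```python
class AuthorizerNotReadyError(Exception):
    """Raised when authorization is requested before the ACL metadata has loaded."""


class StartupGatingAuthorizer:
    def __init__(self, super_users):
        self.super_users = set(super_users)  # static config, known at startup
        self.acls_loaded = False             # flipped once __cluster_metadata is caught up
        self.acls = {}                       # principal -> set of allowed operations

    def complete_initial_load(self, acls):
        # Called once the metadata log has been replayed to a safe point.
        self.acls = acls
        self.acls_loaded = True

    def authorize(self, principal, operation):
        if principal in self.super_users:
            return True                      # super.users pass even before the load
        if not self.acls_loaded:
            raise AuthorizerNotReadyError(principal)
        return operation in self.acls.get(principal, set())
```

The key property is that the answer for a super user never depends on the metadata log, which is what lets early-start listeners accept traffic before the log is available.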

> 2. Do we need any version bumps of protocols to indicate compatibility with
> this error code? I think the main situation where we might see it is
> `Envelope` requests.
> 3. In the context of forwarding, I believe the plan was to not return
> AUTHORIZER_NOT_READY to clients. Instead, if the broker sees this error, it
> will back off and retry the request. Is that right?
>

I restructured this part of the KIP a bit so that it discusses EnvelopeRequest specifically. I think this is the only request we want to add the new error code to. This also means that we don't need a new IBP, since the KRaft controllers have always relied on ApiVersions to determine what RPC formats to use.

I revised the KIP to make it more explicit that the broker should retry on getting AUTHORIZER_NOT_READY, that we don't need a new IBP, and that it applies only to EnvelopeRequest.
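The broker-side retry described above can be sketched as follows (Python, with stand-in names; the real handling lives in the broker's Java forwarding path): on AUTHORIZER_NOT_READY in an Envelope response, back off and resend until the request timeout, so the client only ever sees a timeout, never the not-ready error.

```python
import time

AUTHORIZER_NOT_READY = "AUTHORIZER_NOT_READY"  # stand-in for the real error code


def forward_with_retry(send_envelope, request, timeout_s=30.0, backoff_s=0.1):
    """Retry an Envelope request while the controller's authorizer is loading.

    The not-ready error is never surfaced to the client; if the controller
    stays unready past the request timeout, a timeout is raised instead.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = send_envelope(request)
        if response != AUTHORIZER_NOT_READY:
            return response
        if time.monotonic() + backoff_s > deadline:
            raise TimeoutError("controller authorizer not ready before request timeout")
        time.sleep(backoff_s)
```

Since only EnvelopeRequest carries the new error code, this loop sits entirely on the inter-broker path and needs no client or IBP changes.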

regards,
Colin

> Thanks,
> Jason
>
>
>
>
> On Mon, Jan 31, 2022 at 3:02 PM Colin McCabe <cm...@apache.org> wrote:
>
>> Hi all,
>>
>> Thanks, everyone. I've updated the KIP.
>>
>> I reorganized the design section to explain bootstrapping a bit better,
>> and put it first. (It was always weird that the design section came after
>> the interface changes section :) ). I also added early.start.listeners and
>> a section on the other configs (which are the same as in AclAuthorizer).
>>
>> best,
>> Colin
>>
>> On Thu, Jan 27, 2022, at 16:38, Jason Gustafson wrote:
>> >> My intention here was to avoid exposing Kafka clients to
>> > AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to
>> internal
>> > components (such as the broker) then things are a lot simpler. Keep in
>> mind
>> > there are external management systems that assume that if the broker
>> ports
>> > accept traffic, the broker is "up". For example, Kubernetes operators,
>> > external dashboards and health checks, etc.
>> >
>> > What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be
>> > returned in Envelope responses (and only if the client principal is not a
>> > super user). The broker would retry the operation until the request
>> timeout
>> > after seeing this error, which is similar to how we handle controller
>> > unavailability in general. So the error is not seen by client requests.
>> It
>> > sounds like what you are suggesting instead is having a dedicated
>> listener
>> > for inter-cluster traffic? Can you clarify the rules that we use to
>> decide
>> > which listener to send forwarded requests to?
>> >
>> >> We've discussed this issue in the past in the context of the broker.
>> > Should we fence if the broker can't apply metadata for a certain amount
>> of
>> > time? It's controversial because fencing may add more load to an already
>> > loaded cluster. However, without new metadata, we won't get new ACL
>> > changes, learn about new topics, etc. etc.
>> >
>> > Yep, I appreciate the concern. It is a bit different in the context of
>> the
>> > controller though. Without having access to the quorum, there is little a
>> > controller can do anyway. That is why we filed this issue:
>> > https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it
>> > would be desirable if the same approach could be used by both the broker
>> > and controller.
>> >
>> > -Jason
>> >
>> >
>> > On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <cm...@apache.org> wrote:
>> >
>> >> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
>> >> > Hey Colin,
>> >> >
>> >> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`,
>> then
>> >> > would we need `early.start.listeners`? This would give the
>> `Authorizer` a
>> >> > way to support partial startup, but it would be up to the
>> implementation
>> >> to
>> >> > decide whether to use it. For an `Authorizer` which depends only on an
>> >> > external service, there is no dependence on the metadata log, so
>> there is
>> >> > no reason to have partial startup. For the `StandardAuthorizer`, we
>> could
>> >> > let it return a completed future from `Authorizer.start` for all of
>> the
>> >> > controller listeners. Then we can let it authorize only super.user
>> >> > operations until the metadata log has been loaded (and raise
>> >> > AuthorizerNotReadyException for everything else until that time).
>> >> >
>> >>
>> >> Hi Jason,
>> >>
>> >> My intention here was to avoid exposing Kafka clients to
>> >> AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to
>> internal
>> >> components (such as the broker) then things are a lot simpler. Keep in
>> mind
>> >> there are external management systems that assume that if the broker
>> ports
>> >> accept traffic, the broker is "up". For example, Kubernetes operators,
>> >> external dashboards and health checks, etc.
>> >>
>> >> We can continue to uphold the guarantee that the broker is up if the
>> port
>> >> is responsive if we annotate only internal ports (like the controller
>> port,
>> >> or possibly inter-broker port) as "early start." This also helps with
>> the
>> >> case where people have some internal topic they want to be able to read
>> or
>> >> write from before turning on client traffic. For example, some people
>> send
>> >> Kafka metric data to an internal Kafka topic.
>> >>
>> >> >> Leaving aside the preceding discussion, do you agree with starting up
>> >> all
>> >> > endpoints (including non-early start ones) once we load a metadata
>> >> > snapshot? How feasible would it be for us to get a callback from the
>> Raft
>> >> > layer the first time we caught up to the last stable offset? (we only
>> >> want
>> >> > the callback the first time, not any other time). (I think the
>> metadata
>> >> > shell also would want something like this, at least as an option).
>> >> >
>> >> > It is certainly possible to get a callback when we've reached the high
>> >> > watermark, which seems like what we'd want. This does make me wonder
>> how
>> >> we
>> >> > would handle the case when a controller gets partitioned from the
>> rest of
>> >> > the quorum for some time. Should there be some point at which the
>> >> > controller no longer accepts authorization requests because of the
>> >> > potential staleness of the ACLs? Perhaps we could even reuse the
>> >> > `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other
>> >> words,
>> >> > maybe the readiness condition is more of a dynamic assertion that we
>> have
>> >> > caught up to near the current end of the log.
>> >> >
>> >>
>> >> We've discussed this issue in the past in the context of the broker.
>> >> Should we fence if the broker can't apply metadata for a certain amount
>> of
>> >> time? It's controversial because fencing may add more load to an already
>> >> loaded cluster. However, without new metadata, we won't get new ACL
>> >> changes, learn about new topics, etc. etc.
>> >>
>> >> This is probably something we should discuss in a separate KIP, since
>> it's
>> >> a bigger issue about how we handle nodes with stale metadata. And of
>> course
>> >> this issue does exist in the current ZK implementation -- you can talk
>> to a
>> >> broker that has been fenced from ZK for days, and there you are relying
>> on
>> >> whatever ACLs were in place when it could last talk to ZK. If we
>> implement
>> >> bounded staleness for KRaft, we may want to consider doing the same for
>> ZK,
>> >> for consistency.
>> >>
>> >> Another approach, which I suspect is more realistic in practice, is to
>> >> have a metric tracking staleness and fire off an alert when it grows too
>> >> high. Making the stale node inaccessible is sort of the nuclear option,
>> and
>> >> may tend to make a bad situation worse in practice.
>> >>
>> >> best,
>> >> Colin
>> >>
>> >>
>> >> >
>> >> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cm...@apache.org>
>> >> wrote:
>> >> >
>> >> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
>> >> >> > Hi Colin,
>> >> >> >
>> >> >> > Thanks for the KIP. A couple of questions:
>> >> >> >
>> >> >> > 1) The KIP says:
>> >> >> > *However, when the controller is active, its Authorizer state may
>> be
>> >> > slightly ahead of the broker's Authorizer state. This will
>> happen
>> >> in
>> >> >> > the time between when a new ACL is created (or deleted) and the
>> time
>> >> that
>> >> >> > this change is persisted durably by a majority of controller quorum
>> >> >> peers.*
>> >> >> >
>> >> >> > Once the ACL takes effect on the controller, do we guarantee that
>> it
>> >> will
>> >> >> > be applied everywhere or can there be scenarios where the
>> controller
>> >> has
>> >> >> > applied the change, but others don't due to a subsequent failure in
>> >> >> > persisting the ACL changes?
>> >> >>
>> >> >> Hi Rajini,
>> >> >>
>> >> >> Thanks for taking a look.
>> >> >>
>> >> >> Good question. In general, for any new metadata record (not just ACL
>> >> >> records), there are two fates:
>> >> >> 1. the record will be persisted to the raft quorum.
>> >> >> 2. the raft leader will fail and a new leader will be elected that
>> >> doesn't
>> >> >> have the new record.
>> >> >>
>> >> >> In the case of #2, the standby controllers and the brokers will never
>> >> >> apply the record.
>> >> >>
>> >> >> In general the active controller always rolls back its state to the
>> last
>> >> >> committed offset if there is a failure and it loses the leadership.
>> So
>> >> this
>> >> >> isn't really unique to the KRaft Authorizer, it just seemed worth
>> >> pointing
>> >> >> out since there will be a brief period when the active controller is
>> >> >> "ahead."
>> >> >>
>> >> >> We do try pretty hard to apply KIP-801 ACL records in order so that
>> the
>> >> >> states you will see on brokers and standby controllers will always be
>> >> valid
>> >> >> states from some point in the past timeline. The consistency here
>> >> should be
>> >> >> at least as good as the current system.
>> >> >>
>> >> >> >
>> >> >> > 2) Have we considered using a different config with limited
>> privileges
>> >> >> for
>> >> >> > bootstrapping instead of the all-powerful *super.users*?
>> >> >> >
>> >> >>
>> >> >> That's an interesting idea, but I'm not sure how much we could limit
>> it.
>> >> >> The brokers and controllers at least need CLUSTER_ACTION on CLUSTER
>> in
>> >> >> order to do most of what they do. This might be something we could
>> >> explore
>> >> >> in a future KIP since it's pretty cross-cutting (if we had such a
>> >> limited
>> >> >> bootstrapping user, all the authorizers could implement it, not just
>> the
>> >> >> KIP-801 one...)
>> >> >>
>> >> >> best,
>> >> >> Colin
>> >> >>
>> >> >> >
>> >> >> > Regards,
>> >> >> >
>> >> >> > Rajini
>> >> >> >
>> >> >> >
>> >> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cm...@apache.org>
>> >> wrote:
>> >> >> >
>> >> >> >> How about this:
>> >> >> >>
>> >> >> >> We create a configuration key called early.start.listeners which
>> >> >> contains
>> >> >> >> a list of listener names. If this is not specified, its value
>> >> defaults
>> >> >> to
>> >> >> >> just the controller listeners. Optionally, other listeners can be
>> >> added
>> >> >> too.
>> >> >> >>
>> >> >> >> If super.users contains any user names, early start listeners will
>> >> start
>> >> >> >> immediately. In the beginning they only authorize users that are
>> in
>> >> >> >> super.users. All other listeners receive a new error code,
>> >> >> >> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any
>> user
>> >> >> names,
>> >> >> >> then early start listeners will not be treated differently than
>> other
>> >> >> >> listeners.
>> >> >> >>
>> >> >> >> This will allow the controller listeners to get started
>> immediately
>> >> if
>> >> >> the
>> >> >> >> broker user is in super.users, which will speed up startup. It
>> also
>> >> >> will be
>> >> >> >> useful for breaking chicken/egg cycles like needing to pull the
>> SCRAM
>> >> >> >> metadata to authorize pulling the SCRAM metadata.
>> >> >> >>
>> >> >> >> There are still a few use cases where super.users won't be
>> required,
>> >> but
>> >> >> >> it may be useful in many cases to have this early start
>> >> functionality.
>> >> >> >>
>> >> >> >> Leaving aside the preceding discussion, do you agree with
>> starting up
>> >> >> all
>> >> >> >> endpoints (including non-early start ones) once we load a metadata
>> >> >> >> snapshot? How feasible would it be for us to get a callback from
>> the
>> >> >> Raft
>> >> >> >> layer the first time we caught up to the last stable offset? (we
>> only
>> >> >> want
>> >> >> >> the callback the first time, not any other time). (I think the
>> >> metadata
>> >> >> >> shell also would want something like this, at least as an option).
>> >> >> >>
>> >> >> >> best,
>> >> >> >> Colin
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> >> >> >> > Hi Colin,
>> >> >> >> >
>> >> >> >> > Thanks for the writeup. I had one question about bootstrapping.
>> For
>> >> >> the
>> >> >> >> > brokers, I understand that listener startup is delayed until the
>> >> >> >> Authorizer
>> >> >> >> > is ready. However, I was not very clear how this would work for
>> the
>> >> >> > controller listeners. We may need them to start up before the
>> >> metadata
>> >> >> log
>> >> >> >> > is ready so that a quorum can be established (as noted in the
>> KIP).
>> >> >> This
>> >> >> >> > works fine if we assume that the controller principals are among
>> >> >> >> > `super.users`. For requests forwarded from brokers, on the other
>> >> >> hand, we
>> >> >> >> > need to ensure the ACLs have been loaded properly before we
>> begin
>> >> >> >> > authorizing. The problem is that we currently use the same
>> listener
>> >> >> for
>> >> >> >> > quorum requests and for forwarded requests. So my question is
>> how
>> >> does
>> >> >> >> the
>> >> >> >> > Authorizer communicate to the controller when it is safe to
>> begin
>> >> >> >> > authorizing different request types?
>> >> >> >> >
>> >> >> >> > There are a couple ways I can see this working. First, we could
>> >> allow
>> >> >> the
>> >> >> >> > user to configure the listener used for forwarded requests
>> >> separately.
>> >> >> >> That
>> >> >> >> > would work with the existing `Authorizer.start` API.
>> Alternatively,
>> >> >> >> perhaps
>> >> >> >> > we could modify `Authorizer.start` to work with something more
>> >> >> granular
>> >> >> >> > than `EndPoint`. This would allow the controller to begin
>> accepting
>> >> >> >> > requests from the other quorum members before it is ready to
>> >> authorize
>> >> >> >> > forwarded requests from clients.  Then we would need some way to
>> >> let
>> >> >> >> > brokers know when the controller is ready to accept these
>> forwarded
>> >> >> >> > requests (e.g. through an error code in the `Envelope`
>> response).
>> >> >> >> >
>> >> >> >> > What do you think?
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Jason
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
>> >> >> >> > <da...@confluent.io.invalid> wrote:
>> >> >> >> >
>> >> >> >> >> +1 binding, thanks Colin!
>> >> >> >> >>
>> >> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <
>> cmccabe@apache.org>
>> >> >> >> wrote:
>> >> >> >> >>
>> >> >> >> >> > Hi all,
>> >> >> >> >> >
>> >> >> >> >> > I'd like to start the vote on KIP-801: Implement an
>> Authorizer
>> >> that
>> >> >> >> >> stores
>> >> >> >> >> > metadata in __cluster_metadata
>> >> >> >> >> >
>> >> >> >> >> > The KIP is here:
>> https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >> >> >> >
>> >> >> >> >> > The original DISCUSS thread is here:
>> >> >> >> >> >
>> >> >> >> >> >
>> >> https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >> >> >> >
>> >> >> >> >> > Please take a look and vote if you can.
>> >> >> >> >> >
>> >> >> >> >> > best,
>> >> >> >> >> > Colin
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> -David
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >>
>>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Jason Gustafson <ja...@confluent.io.INVALID>.
Hi Colin,

The updates look good to me. I wanted to clarify a few details about the
AUTHORIZER_NOT_READY error code:

1. Only the Authorizer implementation will know when it is finished
initialization. I had imagined that any request received in this state
which cannot be authorized based on static configuration (i.e., super.users)
would raise an AuthorizerNotReadyException (or something like that). Is
that what you had in mind?
2. Do we need any version bumps of protocols to indicate compatibility with
this error code? I think the main situation where we might see it is
`Envelope` requests.
3. In the context of forwarding, I believe the plan was to not return
AUTHORIZER_NOT_READY to clients. Instead, if the broker sees this error, it
will back off and retry the request. Is that right?

Thanks,
Jason




On Mon, Jan 31, 2022 at 3:02 PM Colin McCabe <cm...@apache.org> wrote:

> Hi all,
>
> Thanks, everyone. I've updated the KIP.
>
> I reorganized the design section to explain bootstrapping a bit better,
> and put it first. (It was always weird that the design section came after
> the interface changes section :) ). I also added early.start.listeners and
> a section on the other configs (which are the same as in AclAuthorizer).
>
> best,
> Colin
>
> On Thu, Jan 27, 2022, at 16:38, Jason Gustafson wrote:
> >> My intention here was to avoid exposing Kafka clients to
> > AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to
> internal
> > components (such as the broker) then things are a lot simpler. Keep in
> mind
> > there are external management systems that assume that if the broker
> ports
> > accept traffic, the broker is "up". For example, Kubernetes operators,
> > external dashboards and health checks, etc.
> >
> > What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be
> > returned in Envelope responses (and only if the client principal is not a
> > super user). The broker would retry the operation until the request
> timeout
> > after seeing this error, which is similar to how we handle controller
> > unavailability in general. So the error is not seen by client requests.
> It
> > sounds like what you are suggesting instead is having a dedicated
> listener
> > for inter-cluster traffic? Can you clarify the rules that we use to
> decide
> > which listener to send forwarded requests to?
> >
> >> We've discussed this issue in the past in the context of the broker.
> > Should we fence if the broker can't apply metadata for a certain amount
> of
> > time? It's controversial because fencing may add more load to an already
> > loaded cluster. However, without new metadata, we won't get new ACL
> > changes, learn about new topics, etc. etc.
> >
> > Yep, I appreciate the concern. It is a bit different in the context of
> the
> > controller though. Without having access to the quorum, there is little a
> > controller can do anyway. That is why we filed this issue:
> > https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it
> > would be desirable if the same approach could be used by both the broker
> > and controller.
> >
> > -Jason
> >
> >
> > On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <cm...@apache.org> wrote:
> >
> >> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
> >> > Hey Colin,
> >> >
> >> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`,
> then
> >> > would we need `early.start.listeners`? This would give the
> `Authorizer` a
> >> > way to support partial startup, but it would be up to the
> implementation
> >> to
> >> > decide whether to use it. For an `Authorizer` which depends only on an
> >> > external service, there is no dependence on the metadata log, so
> there is
> >> > no reason to have partial startup. For the `StandardAuthorizer`, we
> could
> >> > let it return a completed future from `Authorizer.start` for all of
> the
> >> > controller listeners. Then we can let it authorize only super.user
> >> > operations until the metadata log has been loaded (and raise
> >> > AuthorizerNotReadyException for everything else until that time).
> >> >
> >>
> >> Hi Jason,
> >>
> >> My intention here was to avoid exposing Kafka clients to
> >> AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to
> internal
> >> components (such as the broker) then things are a lot simpler. Keep in
> mind
> >> there are external management systems that assume that if the broker
> ports
> >> accept traffic, the broker is "up". For example, Kubernetes operators,
> >> external dashboards and health checks, etc.
> >>
> >> We can continue to uphold the guarantee that the broker is up if the
> port
> >> is responsive if we annotate only internal ports (like the controller
> port,
> >> or possibly inter-broker port) as "early start." This also helps with
> the
> >> case where people have some internal topic they want to be able to read
> or
> >> write from before turning on client traffic. For example, some people
> send
> >> Kafka metric data to an internal Kafka topic.
> >>
> >> >> Leaving aside the preceding discussion, do you agree with starting up
> >> all
> >> > endpoints (including non-early start ones) once we load a metadata
> >> > snapshot? How feasible would it be for us to get a callback from the
> Raft
> >> > layer the first time we caught up to the last stable offset? (we only
> >> want
> >> > the callback the first time, not any other time). (I think the
> metadata
> >> > shell also would want something like this, at least as an option).
> >> >
> >> > It is certainly possible to get a callback when we've reached the high
> >> > watermark, which seems like what we'd want. This does make me wonder
> how
> >> we
> >> > would handle the case when a controller gets partitioned from the
> rest of
> >> > the quorum for some time. Should there be some point at which the
> >> > controller no longer accepts authorization requests because of the
> >> > potential staleness of the ACLs? Perhaps we could even reuse the
> >> > `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other
> >> words,
> >> > maybe the readiness condition is more of a dynamic assertion that we
> have
> >> > caught up to near the current end of the log.
> >> >
> >>
> >> We've discussed this issue in the past in the context of the broker.
> >> Should we fence if the broker can't apply metadata for a certain amount
> of
> >> time? It's controversial because fencing may add more load to an already
> >> loaded cluster. However, without new metadata, we won't get new ACL
> >> changes, learn about new topics, etc. etc.
> >>
> >> This is probably something we should discuss in a separate KIP, since
> it's
> >> a bigger issue about how we handle nodes with stale metadata. And of
> course
> >> this issue does exist in the current ZK implementation -- you can talk
> to a
> >> broker that has been fenced from ZK for days, and there you are relying
> on
> >> whatever ACLs were in place when it could last talk to ZK. If we
> implement
> >> bounded staleness for KRaft, we may want to consider doing the same for
> ZK,
> >> for consistency.
> >>
> >> Another approach, which I suspect is more realistic in practice, is to
> >> have a metric tracking staleness and fire off an alert when it grows too
> >> high. Making the stale node inaccessible is sort of the nuclear option,
> and
> >> may tend to make a bad situation worse in practice.
> >>
> >> best,
> >> Colin
> >>
> >>
> >> >
> >> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cm...@apache.org>
> >> wrote:
> >> >
> >> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
> >> >> > Hi Colin,
> >> >> >
> >> >> > Thanks for the KIP. A couple of questions:
> >> >> >
> >> >> > 1) The KIP says:
> >> >> > *However, when the controller is active, its Authorizer state may
> be
> >> >> > slightly ahead of the broker's Authorizer state. This will
> happen
> >> in
> >> >> > the time between when a new ACL is created (or deleted) and the
> time
> >> that
> >> >> > this change is persisted durably by a majority of controller quorum
> >> >> peers.*
> >> >> >
> >> >> > Once the ACL takes effect on the controller, do we guarantee that
> it
> >> will
> >> >> > be applied everywhere or can there be scenarios where the
> controller
> >> has
> >> >> > applied the change, but others don't due to a subsequent failure in
> >> >> > persisting the ACL changes?
> >> >>
> >> >> Hi Rajini,
> >> >>
> >> >> Thanks for taking a look.
> >> >>
> >> >> Good question. In general, for any new metadata record (not just ACL
> >> >> records), there are two fates:
> >> >> 1. the record will be persisted to the raft quorum.
> >> >> 2. the raft leader will fail and a new leader will be elected that
> >> doesn't
> >> >> have the new record.
> >> >>
> >> >> In the case of #2, the standby controllers and the brokers will never
> >> >> apply the record.
> >> >>
> >> >> In general the active controller always rolls back its state to the
> last
> >> >> committed offset if there is a failure and it loses the leadership.
> So
> >> this
> >> >> isn't really unique to the KRaft Authorizer, it just seemed worth
> >> pointing
> >> >> out since there will be a brief period when the active controller is
> >> >> "ahead."
> >> >>
> >> >> We do try pretty hard to apply KIP-801 ACL records in order so that
> the
> >> >> states you will see on brokers and standby controllers will always be
> >> valid
> >> >> states from some point in the past timeline. The consistency here
> >> should be
> >> >> at least as good as the current system.
> >> >>
> >> >> >
> >> >> > 2) Have we considered using a different config with limited
> privileges
> >> >> for
> >> >> > bootstrapping instead of the all-powerful *super.users*?
> >> >> >
> >> >>
> >> >> That's an interesting idea, but I'm not sure how much we could limit
> it.
> >> >> The brokers and controllers at least need CLUSTER_ACTION on CLUSTER
> in
> >> >> order to do most of what they do. This might be something we could
> >> explore
> >> >> in a future KIP since it's pretty cross-cutting (if we had such a
> >> limited
> >> >> bootstrapping user, all the authorizers could implement it, not just
> the
> >> >> KIP-801 one...)
> >> >>
> >> >> best,
> >> >> Colin
> >> >>
> >> >> >
> >> >> > Regards,
> >> >> >
> >> >> > Rajini
> >> >> >
> >> >> >
> >> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cm...@apache.org>
> >> wrote:
> >> >> >
> >> >> >> How about this:
> >> >> >>
> >> >> >> We create a configuration key called early.start.listeners which
> >> >> contains
> >> >> >> a list of listener names. If this is not specified, its value
> >> defaults
> >> >> to
> >> >> >> just the controller listeners. Optionally, other listeners can be
> >> added
> >> >> too.
> >> >> >>
> >> >> >> If super.users contains any user names, early start listeners will
> >> start
> >> >> >> immediately. In the beginning they only authorize users that are
> in
> >> >> >> super.users. All other listeners receive a new error code,
> >> >> >> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any
> user
> >> >> names,
> >> >> >> then early start listeners will not be treated differently than
> other
> >> >> >> listeners.
> >> >> >>
> >> >> >> This will allow the controller listeners to get started
> immediately
> >> if
> >> >> the
> >> >> >> broker user is in super.users, which will speed up startup. It
> also
> >> >> will be
> >> >> >> useful for breaking chicken/egg cycles like needing to pull the
> SCRAM
> >> >> >> metadata to authorize pulling the SCRAM metadata.
> >> >> >>
> >> >> >> There are still a few use cases where super.users won't be
> required,
> >> but
> >> >> >> it may be useful in many cases to have this early start
> >> functionality.
> >> >> >>
> >> >> >> Leaving aside the preceding discussion, do you agree with
> starting up
> >> >> all
> >> >> >> endpoints (including non-early start ones) once we load a metadata
> >> >> >> snapshot? How feasible would it be for us to get a callback from
> the
> >> >> Raft
> >> >> >> layer the first time we caught up to the last stable offset? (we
> only
> >> >> want
> >> >> >> the callback the first time, not any other time). (I think the
> >> metadata
> >> >> >> shell also would want something like this, at least as an option).
> >> >> >>
> >> >> >> best,
> >> >> >> Colin
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
> >> >> >> > Hi Colin,
> >> >> >> >
> >> >> >> > Thanks for the writeup. I had one question about bootstrapping.
> For
> >> >> the
> >> >> >> > brokers, I understand that listener startup is delayed until the
> >> >> >> Authorizer
> >> >> >> > is ready. However, I was not very clear how this would work for
> the
> >> >> >> > controller listeners. We may need them to start up before the
> >> metadata
> >> >> log
> >> >> >> > is ready so that a quorum can be established (as noted in the
> KIP).
> >> >> This
> >> >> >> > works fine if we assume that the controller principals are among
> >> >> >> > `super.users`. For requests forwarded from brokers, on the other
> >> >> hand, we
> >> >> >> > need to ensure the ACLs have been loaded properly before we
> begin
> >> >> >> > authorizing. The problem is that we currently use the same
> listener
> >> >> for
> >> >> >> > quorum requests and for forwarded requests. So my question is
> how
> >> does
> >> >> >> the
> >> >> >> > Authorizer communicate to the controller when it is safe to
> begin
> >> >> >> > authorizing different request types?
> >> >> >> >
> >> >> >> > There are a couple ways I can see this working. First, we could
> >> allow
> >> >> the
> >> >> >> > user to configure the listener used for forwarded requests
> >> separately.
> >> >> >> That
> >> >> >> > would work with the existing `Authorizer.start` API.
> Alternatively,
> >> >> >> perhaps
> >> >> >> > we could modify `Authorizer.start` to work with something more
> >> >> granular
> >> >> >> > than `EndPoint`. This would allow the controller to begin
> accepting
> >> >> >> > requests from the other quorum members before it is ready to
> >> authorize
> >> >> >> > forwarded requests from clients.  Then we would need some way to
> >> let
> >> >> >> > brokers know when the controller is ready to accept these
> forwarded
> >> >> >> > requests (e.g. through an error code in the `Envelope`
> response).
> >> >> >> >
> >> >> >> > What do you think?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Jason
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
> >> >> >> > <da...@confluent.io.invalid> wrote:
> >> >> >> >
> >> >> >> >> +1 binding, thanks Colin!
> >> >> >> >>
> >> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <
> cmccabe@apache.org>
> >> >> >> wrote:
> >> >> >> >>
> >> >> >> >> > Hi all,
> >> >> >> >> >
> >> >> >> >> > I'd like to start the vote on KIP-801: Implement an
> Authorizer
> >> that
> >> >> >> >> stores
> >> >> >> >> > metadata in __cluster_metadata
> >> >> >> >> >
> >> >> >> >> > The KIP is here:
> https://cwiki.apache.org/confluence/x/h5KqCw
> >> >> >> >> >
> >> >> >> >> > The original DISCUSS thread is here:
> >> >> >> >> >
> >> >> >> >> >
> >> https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
> >> >> >> >> >
> >> >> >> >> > Please take a look and vote if you can.
> >> >> >> >> >
> >> >> >> >> > best,
> >> >> >> >> > Colin
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> -David
> >> >> >> >>
> >> >> >>
> >> >>
> >>
>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Colin McCabe <cm...@apache.org>.
Hi all,

Thanks, everyone. I've updated the KIP.

I reorganized the design section to explain bootstrapping a bit better, and put it first. (It was always weird that the design section came after the interface changes section :) ). I also added early.start.listeners and a section on the other configs (which are the same as in AclAuthorizer).
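For illustration, an early-start configuration might look something like the sketch below. The listener and principal names are just examples, and early.start.listeners defaults to the controller listeners when unset; only the early.start.listeners key itself is new in this KIP.

```properties
# Hypothetical example configuration (listener/principal names are made up).
# Only the CONTROLLER listener starts before the metadata log is loaded;
# until then it authorizes only principals listed in super.users.
listeners=CONTROLLER://:9093,BROKER://:9092
controller.listener.names=CONTROLLER
early.start.listeners=CONTROLLER
super.users=User:kafka-controller;User:kafka-broker
```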

best,
Colin

On Thu, Jan 27, 2022, at 16:38, Jason Gustafson wrote:
>> My intention here was to avoid exposing Kafka clients to
> AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal
> components (such as the broker) then things are a lot simpler. Keep in mind
> there are external management systems that assume that if the broker ports
> accept traffic, the broker is "up". For example, Kubernetes operators,
> external dashboards and health checks, etc.
>
> What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be
> returned in Envelope responses (and only if the client principal is not a
> super user). The broker would retry the operation until the request timeout
> after seeing this error, which is similar to how we handle controller
> unavailability in general. So the error is not seen by client requests. It
> sounds like what you are suggesting instead is having a dedicated listener
> for inter-cluster traffic? Can you clarify the rules that we use to decide
> which listener to send forwarded requests to?
>
>> We've discussed this issue in the past in the context of the broker.
> Should we fence if the broker can't apply metadata for a certain amount of
> time? It's controversial because fencing may add more load to an already
> loaded cluster. However, without new metadata, we won't get new ACL
> changes, learn about new topics, etc. etc.
>
> Yep, I appreciate the concern. It is a bit different in the context of the
> controller though. Without having access to the quorum, there is little a
> controller can do anyway. That is why we filed this issue:
> https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it
> would be desirable if the same approach could be used by both the broker
> and controller.
>
> -Jason
>
>
> On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <cm...@apache.org> wrote:
>
>> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
>> > Hey Colin,
>> >
>> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then
>> > would we need `early.start.listeners`? This would give the `Authorizer` a
>> > way to support partial startup, but it would be up to the implementation
>> to
>> > decide whether to use it. For an `Authorizer` which depends only on an
>> > external service, there is no dependence on the metadata log, so there is
>> > no reason to have partial startup. For the `StandardAuthorizer`, we could
>> > let it return a completed future from `Authorizer.start` for all of the
>> > controller listeners. Then we can let it authorize only super.user
>> > operations until the metadata log has been loaded (and raise
>> > AuthorizerNotReadyException for everything else until that time).
>> >
>>
>> Hi Jason,
>>
>> My intention here was to avoid exposing Kafka clients to
>> AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal
>> components (such as the broker) then things are a lot simpler. Keep in mind
>> there are external management systems that assume that if the broker ports
>> accept traffic, the broker is "up". For example, Kubernetes operators,
>> external dashboards and health checks, etc.
>>
>> We can continue to uphold the guarantee that the broker is up if the port
>> is responsive if we annotate only internal ports (like the controller port,
>> or possibly inter-broker port) as "early start." This also helps with the
>> case where people have some internal topic they want to be able to read or
>> write from before turning on client traffic. For example, some people send
>> Kafka metric data to an internal Kafka topic.
>>
>> >> Leaving aside the preceding discussion, do you agree with starting up
>> all
>> > endpoints (including non-early start ones) once we load a metadata
>> > snapshot? How feasible would it be for us to get a callback from the Raft
>> > layer the first time we caught up to the last stable offset? (we only
>> want
>> > the callback the first time, not any other time). (I think the metadata
>> > shell also would want something like this, at least as an option).
>> >
>> > It is certainly possible to get a callback when we've reached the high
>> > watermark, which seems like what we'd want. This does make me wonder how
>> we
>> > would handle the case when a controller gets partitioned from the rest of
>> > the quorum for some time. Should there be some point at which the
>> > controller no longer accepts authorization requests because of the
>> > potential staleness of the ACLs? Perhaps we could even reuse the
>> > `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other
>> words,
>> > maybe the readiness condition is more of a dynamic assertion that we have
>> > caught up to near the current end of the log.
>> >
>>
>> We've discussed this issue in the past in the context of the broker.
>> Should we fence if the broker can't apply metadata for a certain amount of
>> time? It's controversial because fencing may add more load to an already
>> loaded cluster. However, without new metadata, we won't get new ACL
>> changes, learn about new topics, etc. etc.
>>
>> This is probably something we should discuss in a separate KIP, since it's
>> a bigger issue about how we handle nodes with stale metadata. And of course
>> this issue does exist in the current ZK implementation -- you can talk to a
>> broker that has been fenced from ZK for days, and there you are relying on
>> whatever ACLs were in place when it could last talk to ZK. If we implement
>> bounded staleness for KRaft, we may want to consider doing the same for ZK,
>> for consistency.
>>
>> Another approach, which I suspect is more realistic in practice, is to
>> have a metric tracking staleness and fire off an alert when it grows too
>> high. Making the stale node inaccessible is sort of the nuclear option, and
>> may tend to make a bad situation worse in practice.
>>
>> best,
>> Colin
>>
>>
>> >
>> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cm...@apache.org>
>> wrote:
>> >
>> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
>> >> > Hi Colin,
>> >> >
>> >> > Thanks for the KIP. A couple of questions:
>> >> >
>> >> > 1) The KIP says:
>> >> > *However, when the controller is active, its Authorizer state may be
> >> > slightly ahead of the broker's Authorizer state. This will happen
>> in
>> >> > the time between when a new ACL is created (or deleted) and the time
>> that
>> >> > this change is persisted durably by a majority of controller quorum
>> >> peers.*
>> >> >
>> >> > Once the ACL takes effect on the controller, do we guarantee that it
>> will
>> >> > be applied everywhere or can there be scenarios where the controller
>> has
>> >> > applied the change, but others don't due to a subsequent failure in
>> >> > persisting the ACL changes?
>> >>
>> >> Hi Rajini,
>> >>
>> >> Thanks for taking a look.
>> >>
>> >> Good question. In general, for any new metadata record (not just ACL
>> >> records), there are two fates:
>> >> 1. the record will be persisted to the raft quorum.
>> >> 2. the raft leader will fail and a new leader will be elected that
>> doesn't
>> >> have the new record.
>> >>
>> >> In the case of #2, the standby controllers and the brokers will never
>> >> apply the record.
>> >>
>> >> In general the active controller always rolls back its state to the last
>> >> committed offset if there is a failure and it loses the leadership. So
>> this
>> >> isn't really unique to the KRaft Authorizer, it just seemed worth
>> pointing
>> >> out since there will be a brief period when the active controller is
>> >> "ahead."
>> >>
>> >> We do try pretty hard to apply KIP-801 ACL records in order so that the
>> >> states you will see on brokers and standby controllers will always be
>> valid
>> >> states from some point in the past timeline. The consistency here
>> should be
>> >> at least as good as the current system.
>> >>
>> >> >
>> >> > 2) Have we considered using a different config with limited privileges
>> >> for
>> >> > bootstrapping instead of the all-powerful *super.users*?
>> >> >
>> >>
>> >> That's an interesting idea, but I'm not sure how much we could limit it.
>> >> The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in
>> >> order to do most of what they do. This might be something we could
>> explore
>> >> in a future KIP since it's pretty cross-cutting (if we had such a
>> limited
>> >> bootstrapping user, all the authorizers could implement it, not just the
>> >> KIP-801 one...)
>> >>
>> >> best,
>> >> Colin
>> >>
>> >> >
>> >> > Regards,
>> >> >
>> >> > Rajini
>> >> >
>> >> >
>> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cm...@apache.org>
>> wrote:
>> >> >
>> >> >> How about this:
>> >> >>
>> >> >> We create a configuration key called early.start.listeners which
>> >> contains
>> >> >> a list of listener names. If this is not specified, its value
>> defaults
>> >> to
>> >> >> just the controller listeners. Optionally, other listeners can be
>> added
>> >> too.
>> >> >>
>> >> >> If super.users contains any user names, early start listeners will
>> start
>> >> >> immediately. In the beginning they only authorize users that are in
>> >> >> super.users. All other listeners receive a new error code,
>> >> >> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user
>> >> names,
>> >> >> then early start listeners will not be treated differently than other
>> >> >> listeners.
>> >> >>
>> >> >> This will allow the controller listeners to get started immediately
>> if
>> >> the
>> >> >> broker user is in super.users, which will speed up startup. It also
>> >> will be
>> >> >> useful for breaking chicken/egg cycles like needing to pull the SCRAM
>> >> >> metadata to authorize pulling the SCRAM metadata.
>> >> >>
>> >> >> There are still a few use cases where super.users won't be required,
>> but
>> >> >> it may be useful in many cases to have this early start
>> functionality.
>> >> >>
>> >> >> Leaving aside the preceding discussion, do you agree with starting up
>> >> all
>> >> >> endpoints (including non-early start ones) once we load a metadata
>> >> >> snapshot? How feasible would it be for us to get a callback from the
>> >> Raft
>> >> >> layer the first time we caught up to the last stable offset? (we only
>> >> want
>> >> >> the callback the first time, not any other time). (I think the
>> metadata
>> >> >> shell also would want something like this, at least as an option).
>> >> >>
>> >> >> best,
>> >> >> Colin
>> >> >>
>> >> >>
>> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> >> >> > Hi Colin,
>> >> >> >
>> >> >> > Thanks for the writeup. I had one question about bootstrapping. For
>> >> the
>> >> >> > brokers, I understand that listener startup is delayed until the
>> >> >> Authorizer
>> >> >> > is ready. However, I was not very clear how this would work for the
>> >> >> > controller listeners. We may need them to startup before the
>> metadata
>> >> log
>> >> >> > is ready so that a quorum can be established (as noted in the KIP).
>> >> This
>> >> >> > works fine if we assume that the controller principals are among
>> >> >> > `super.users`. For requests forwarded from brokers, on the other
>> >> hand, we
>> >> >> > need to ensure the ACLs have been loaded properly before we begin
>> >> >> > authorizing. The problem is that we currently use the same listener
>> >> for
>> >> >> > quorum requests and for forwarded requests. So my question is how
>> does
>> >> >> the
>> >> >> > Authorizer communicate to the controller when it is safe to begin
>> >> >> > authorizing different request types?
>> >> >> >
>> >> >> > There are a couple ways I can see this working. First, we could
>> allow
>> >> the
>> >> >> > user to configure the listener used for forwarded requests
>> separately.
>> >> >> That
>> >> >> > would work with the existing `Authorizer.start` API. Alternatively,
>> >> >> perhaps
>> >> >> > we could modify `Authorizer.start` to work with something more
>> >> granular
>> >> >> > than `EndPoint`. This would allow the controller to begin accepting
>> >> >> > requests from the other quorum members before it is ready to
>> authorize
>> >> >> > forwarded requests from clients.  Then we would need some way to
>> let
>> >> >> > brokers know when the controller is ready to accept these forwarded
>> >> >> > requests (e.g. through an error code in the `Envelope` response).
>> >> >> >
>> >> >> > What do you think?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Jason
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
>> >> >> > <da...@confluent.io.invalid> wrote:
>> >> >> >
>> >> >> >> +1 binding, thanks Colin!
>> >> >> >>
>> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org>
>> >> >> wrote:
>> >> >> >>
>> >> >> >> > Hi all,
>> >> >> >> >
>> >> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer
>> that
>> >> >> >> stores
>> >> >> >> > metadata in __cluster_metadata
>> >> >> >> >
>> >> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >> >> >
>> >> >> >> > The original DISCUSS thread is here:
>> >> >> >> >
>> >> >> >> >
>> https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >> >> >
>> >> >> >> > Please take a look and vote if you can.
>> >> >> >> >
>> >> >> >> > best,
>> >> >> >> > Colin
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> -David
>> >> >> >>
>> >> >>
>> >>
>>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Jason Gustafson <ja...@confluent.io.INVALID>.
> My intention here was to avoid exposing Kafka clients to
AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal
components (such as the broker) then things are a lot simpler. Keep in mind
there are external management systems that assume that if the broker ports
accept traffic, the broker is "up". For example, Kubernetes operators,
external dashboards and health checks, etc.

What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be
returned in Envelope responses (and only if the client principal is not a
super user). The broker would retry the operation until the request timeout
after seeing this error, which is similar to how we handle controller
unavailability in general. So the error is not seen by client requests. It
sounds like what you are suggesting instead is having a dedicated listener
for inter-cluster traffic? Can you clarify the rules that we use to decide
which listener to send forwarded requests to?
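As a rough sketch of what I mean (class and method names here are made up, not Kafka's actual forwarding code), the broker-side handling of AUTHORIZER_NOT_READY could look something like this: retry with capped backoff until the request timeout, so the client never sees the error.

```java
// Hypothetical sketch of broker-side Envelope forwarding that retries on
// AUTHORIZER_NOT_READY instead of surfacing the error to the client.
public class EnvelopeRetrySketch {

    // Simulated controller: reports "not ready" for the first
    // failuresBeforeReady attempts, then accepts the request.
    static boolean forwardOnce(int attempt, int failuresBeforeReady) {
        // true = authorized and processed, false = AUTHORIZER_NOT_READY
        return attempt >= failuresBeforeReady;
    }

    // Retry with capped exponential backoff until maxAttempts is exhausted,
    // standing in for the request-timeout bound in the real broker.
    public static boolean forwardWithRetry(int failuresBeforeReady, int maxAttempts) {
        long backoffMs = 50;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (forwardOnce(attempt, failuresBeforeReady)) {
                return true;  // controller accepted the forwarded request
            }
            // AUTHORIZER_NOT_READY: back off and retry rather than failing the client.
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            backoffMs = Math.min(backoffMs * 2, 1000);
        }
        return false;  // timed out; only now would an error reach the client
    }

    public static void main(String[] args) {
        // Controller becomes ready on the 3rd attempt; 5 attempts allowed.
        System.out.println(forwardWithRetry(2, 5));   // true
        // Controller never becomes ready within the allowed attempts.
        System.out.println(forwardWithRetry(10, 3));  // false
    }
}
```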

> We've discussed this issue in the past in the context of the broker.
Should we fence if the broker can't apply metadata for a certain amount of
time? It's controversial because fencing may add more load to an already
loaded cluster. However, without new metadata, we won't get new ACL
changes, learn about new topics, etc. etc.

Yep, I appreciate the concern. It is a bit different in the context of the
controller though. Without having access to the quorum, there is little a
controller can do anyway. That is why we filed this issue:
https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it
would be desirable if the same approach could be used by both the broker
and controller.

-Jason


On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <cm...@apache.org> wrote:

> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
> > Hey Colin,
> >
> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then
> > would we need `early.start.listeners`? This would give the `Authorizer` a
> > way to support partial startup, but it would be up to the implementation
> to
> > decide whether to use it. For an `Authorizer` which depends only on an
> > external service, there is no dependence on the metadata log, so there is
> > no reason to have partial startup. For the `StandardAuthorizer`, we could
> > let it return a completed future from `Authorizer.start` for all of the
> > controller listeners. Then we can let it authorize only super.user
> > operations until the metadata log has been loaded (and raise
> > AuthorizerNotReadyException for everything else until that time).
> >
>
> Hi Jason,
>
> My intention here was to avoid exposing Kafka clients to
> AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal
> components (such as the broker) then things are a lot simpler. Keep in mind
> there are external management systems that assume that if the broker ports
> accept traffic, the broker is "up". For example, Kubernetes operators,
> external dashboards and health checks, etc.
>
> We can continue to uphold the guarantee that the broker is up if the port
> is responsive if we annotate only internal ports (like the controller port,
> or possibly inter-broker port) as "early start." This also helps with the
> case where people have some internal topic they want to be able to read or
> write from before turning on client traffic. For example, some people send
> Kafka metric data to an internal Kafka topic.
>
> >> Leaving aside the preceding discussion, do you agree with starting up
> all
> > endpoints (including non-early start ones) once we load a metadata
> > snapshot? How feasible would it be for us to get a callback from the Raft
> > layer the first time we caught up to the last stable offset? (we only
> want
> > the callback the first time, not any other time). (I think the metadata
> > shell also would want something like this, at least as an option).
> >
> > It is certainly possible to get a callback when we've reached the high
> > watermark, which seems like what we'd want. This does make me wonder how
> we
> > would handle the case when a controller gets partitioned from the rest of
> > the quorum for some time. Should there be some point at which the
> > controller no longer accepts authorization requests because of the
> > potential staleness of the ACLs? Perhaps we could even reuse the
> > `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other
> words,
> > maybe the readiness condition is more of a dynamic assertion that we have
> > caught up to near the current end of the log.
> >
>
> We've discussed this issue in the past in the context of the broker.
> Should we fence if the broker can't apply metadata for a certain amount of
> time? It's controversial because fencing may add more load to an already
> loaded cluster. However, without new metadata, we won't get new ACL
> changes, learn about new topics, etc. etc.
>
> This is probably something we should discuss in a separate KIP, since it's
> a bigger issue about how we handle nodes with stale metadata. And of course
> this issue does exist in the current ZK implementation -- you can talk to a
> broker that has been fenced from ZK for days, and there you are relying on
> whatever ACLs were in place when it could last talk to ZK. If we implement
> bounded staleness for KRaft, we may want to consider doing the same for ZK,
> for consistency.
>
> Another approach, which I suspect is more realistic in practice, is to
> have a metric tracking staleness and fire off an alert when it grows too
> high. Making the stale node inaccessible is sort of the nuclear option, and
> may tend to make a bad situation worse in practice.
>
> best,
> Colin
>
>
> >
> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cm...@apache.org>
> wrote:
> >
> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
> >> > Hi Colin,
> >> >
> >> > Thanks for the KIP. A couple of questions:
> >> >
> >> > 1) The KIP says:
> >> > *However, when the controller is active, its Authorizer state may be
> >> > slightly ahead of the broker's Authorizer state. This will happen
> in
> >> > the time between when a new ACL is created (or deleted) and the time
> that
> >> > this change is persisted durably by a majority of controller quorum
> >> peers.*
> >> >
> >> > Once the ACL takes effect on the controller, do we guarantee that it
> will
> >> > be applied everywhere or can there be scenarios where the controller
> has
> >> > applied the change, but others don't due to a subsequent failure in
> >> > persisting the ACL changes?
> >>
> >> Hi Rajini,
> >>
> >> Thanks for taking a look.
> >>
> >> Good question. In general, for any new metadata record (not just ACL
> >> records), there are two fates:
> >> 1. the record will be persisted to the raft quorum.
> >> 2. the raft leader will fail and a new leader will be elected that
> doesn't
> >> have the new record.
> >>
> >> In the case of #2, the standby controllers and the brokers will never
> >> apply the record.
> >>
> >> In general the active controller always rolls back its state to the last
> >> committed offset if there is a failure and it loses the leadership. So
> this
> >> isn't really unique to the KRaft Authorizer, it just seemed worth
> pointing
> >> out since there will be a brief period when the active controller is
> >> "ahead."
> >>
> >> We do try pretty hard to apply KIP-801 ACL records in order so that the
> >> states you will see on brokers and standby controllers will always be
> valid
> >> states from some point in the past timeline. The consistency here
> should be
> >> at least as good as the current system.
> >>
> >> >
> >> > 2) Have we considered using a different config with limited privileges
> >> for
> >> > bootstrapping instead of the all-powerful *super.users*?
> >> >
> >>
> >> That's an interesting idea, but I'm not sure how much we could limit it.
> >> The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in
> >> order to do most of what they do. This might be something we could
> explore
> >> in a future KIP since it's pretty cross-cutting (if we had such a
> limited
> >> bootstrapping user, all the authorizers could implement it, not just the
> >> KIP-801 one...)
> >>
> >> best,
> >> Colin
> >>
> >> >
> >> > Regards,
> >> >
> >> > Rajini
> >> >
> >> >
> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cm...@apache.org>
> wrote:
> >> >
> >> >> How about this:
> >> >>
> >> >> We create a configuration key called early.start.listeners which
> >> contains
> >> >> a list of listener names. If this is not specified, its value
> defaults
> >> to
> >> >> just the controller listeners. Optionally, other listeners can be
> added
> >> too.
> >> >>
> >> >> If super.users contains any user names, early start listeners will
> start
> >> >> immediately. In the beginning they only authorize users that are in
> >> >> super.users. All other listeners receive a new error code,
> >> >> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user
> >> names,
> >> >> then early start listeners will not be treated differently than other
> >> >> listeners.
> >> >>
> >> >> This will allow the controller listeners to get started immediately
> if
> >> the
> >> >> broker user is in super.users, which will speed up startup. It also
> >> will be
> >> >> useful for breaking chicken/egg cycles like needing to pull the SCRAM
> >> >> metadata to authorize pulling the SCRAM metadata.
> >> >>
> >> >> There are still a few use cases where super.users won't be required,
> but
> >> >> it may be useful in many cases to have this early start
> functionality.
> >> >>
> >> >> Leaving aside the preceding discussion, do you agree with starting up
> >> all
> >> >> endpoints (including non-early start ones) once we load a metadata
> >> >> snapshot? How feasible would it be for us to get a callback from the
> >> Raft
> >> >> layer the first time we caught up to the last stable offset? (we only
> >> want
> >> >> the callback the first time, not any other time). (I think the
> metadata
> >> >> shell also would want something like this, at least as an option).
> >> >>
> >> >> best,
> >> >> Colin
> >> >>
> >> >>
> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
> >> >> > Hi Colin,
> >> >> >
> >> >> > Thanks for the writeup. I had one question about bootstrapping. For
> >> the
> >> >> > brokers, I understand that listener startup is delayed until the
> >> >> Authorizer
> >> >> > is ready. However, I was not very clear how this would work for the
> >> >> > controller listeners. We may need them to startup before the
> metadata
> >> log
> >> >> > is ready so that a quorum can be established (as noted in the KIP).
> >> This
> >> >> > works fine if we assume that the controller principals are among
> >> >> > `super.users`. For requests forwarded from brokers, on the other
> >> hand, we
> >> >> > need to ensure the ACLs have been loaded properly before we begin
> >> >> > authorizing. The problem is that we currently use the same listener
> >> for
> >> >> > quorum requests and for forwarded requests. So my question is how
> does
> >> >> the
> >> >> > Authorizer communicate to the controller when it is safe to begin
> >> >> > authorizing different request types?
> >> >> >
> >> >> > There are a couple ways I can see this working. First, we could
> allow
> >> the
> >> >> > user to configure the listener used for forwarded requests
> separately.
> >> >> That
> >> >> > would work with the existing `Authorizer.start` API. Alternatively,
> >> >> perhaps
> >> >> > we could modify `Authorizer.start` to work with something more
> >> granular
> >> >> > than `EndPoint`. This would allow the controller to begin accepting
> >> >> > requests from the other quorum members before it is ready to
> authorize
> >> >> > forwarded requests from clients.  Then we would need some way to
> let
> >> >> > brokers know when the controller is ready to accept these forwarded
> >> >> > requests (e.g. through an error code in the `Envelope` response).
> >> >> >
> >> >> > What do you think?
> >> >> >
> >> >> > Thanks,
> >> >> > Jason
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
> >> >> > <da...@confluent.io.invalid> wrote:
> >> >> >
> >> >> >> +1 binding, thanks Colin!
> >> >> >>
> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org>
> >> >> wrote:
> >> >> >>
> >> >> >> > Hi all,
> >> >> >> >
> >> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer
> that
> >> >> >> stores
> >> >> >> > metadata in __cluster_metadata
> >> >> >> >
> >> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
> >> >> >> >
> >> >> >> > The original DISCUSS thread is here:
> >> >> >> >
> >> >> >> >
> https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
> >> >> >> >
> >> >> >> > Please take a look and vote if you can.
> >> >> >> >
> >> >> >> > best,
> >> >> >> > Colin
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> -David
> >> >> >>
> >> >>
> >>
>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Colin McCabe <cm...@apache.org>.
On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
> Hey Colin,
>
> If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then
> would we need `early.start.listeners`? This would give the `Authorizer` a
> way to support partial startup, but it would be up to the implementation to
> decide whether to use it. For an `Authorizer` which depends only on an
> external service, there is no dependence on the metadata log, so there is
> no reason to have partial startup. For the `StandardAuthorizer`, we could
> let it return a completed future from `Authorizer.start` for all of the
> controller listeners. Then we can let it authorize only super.user
> operations until the metadata log has been loaded (and raise
> AuthorizerNotReadyException for everything else until that time).
>

Hi Jason,

My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.

We can continue to uphold the guarantee that the broker is up if the port is responsive if we annotate only internal ports (like the controller port, or possibly inter-broker port) as "early start." This also helps with the case where people have some internal topic they want to be able to read or write from before turning on client traffic. For example, some people send Kafka metric data to an internal Kafka topic.

>> Leaving aside the preceding discussion, do you agree with starting up all
> endpoints (including non-early start ones) once we load a metadata
> snapshot? How feasible would it be for us to get a callback from the Raft
> layer the first time we caught up to the last stable offset? (we only want
> the callback the first time, not any other time). (I think the metadata
> shell also would want something like this, at least as an option).
>
> It is certainly possible to get a callback when we've reached the high
> watermark, which seems like what we'd want. This does make me wonder how we
> would handle the case when a controller gets partitioned from the rest of
> the quorum for some time. Should there be some point at which the
> controller no longer accepts authorization requests because of the
> potential staleness of the ACLs? Perhaps we could even reuse the
> `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other words,
> maybe the readiness condition is more of a dynamic assertion that we have
> caught up to near the current end of the log.
>

We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.

This is probably something we should discuss in a separate KIP, since it's a bigger issue about how we handle nodes with stale metadata. And of course this issue does exist in the current ZK implementation -- you can talk to a broker that has been fenced from ZK for days, and there you are relying on whatever ACLs were in place when it could last talk to ZK. If we implement bounded staleness for KRaft, we may want to consider doing the same for ZK, for consistency.

Another approach, which I suspect is more realistic in practice, is to have a metric tracking staleness and fire off an alert when it grows too high. Making the stale node inaccessible is sort of the nuclear option, and may tend to make a bad situation worse in practice.
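To make the metric idea concrete, here is a rough sketch (all names are illustrative, not existing Kafka code): track the time since the node last applied a committed metadata record, expose it as a gauge, and alert past a threshold rather than fencing the node.

```java
// Hypothetical staleness monitor: the node stays accessible (no fencing);
// operators get an alert when metadata staleness exceeds a threshold.
public class MetadataStalenessMonitor {
    private final long maxStalenessMs;
    private volatile long lastAppliedTimeMs;

    public MetadataStalenessMonitor(long maxStalenessMs, long nowMs) {
        this.maxStalenessMs = maxStalenessMs;
        this.lastAppliedTimeMs = nowMs;
    }

    // Called whenever a committed metadata record is applied locally.
    public void recordApply(long nowMs) {
        lastAppliedTimeMs = nowMs;
    }

    // Gauge value: milliseconds since the last applied record.
    public long stalenessMs(long nowMs) {
        return nowMs - lastAppliedTimeMs;
    }

    // Alert condition: staleness exceeded the threshold.
    public boolean shouldAlert(long nowMs) {
        return stalenessMs(nowMs) > maxStalenessMs;
    }

    public static void main(String[] args) {
        MetadataStalenessMonitor m = new MetadataStalenessMonitor(60_000, 0);
        m.recordApply(10_000);
        System.out.println(m.shouldAlert(30_000));   // false: 20s staleness
        System.out.println(m.shouldAlert(100_000));  // true: 90s staleness
    }
}
```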

best,
Colin


>
> On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cm...@apache.org> wrote:
>
>> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
>> > Hi Colin,
>> >
>> > Thanks for the KIP. A couple of questions:
>> >
>> > 1) The KIP says:
>> > *However, when the controller is active, its Authorizer state may be
>> > slightly ahead of the broker's Authorizer state. This will happen in
>> > the time between when a new ACL is created (or deleted) and the time that
>> > this change is persisted durably by a majority of controller quorum
>> peers.*
>> >
>> > Once the ACL takes effect on the controller, do we guarantee that it will
>> > be applied everywhere or can there be scenarios where the controller has
>> > applied the change, but others don't due to a subsequent failure in
>> > persisting the ACL changes?
>>
>> Hi Rajini,
>>
>> Thanks for taking a look.
>>
>> Good question. In general, for any new metadata record (not just ACL
>> records), there are two fates:
>> 1. the record will be persisted to the raft quorum.
>> 2. the raft leader will fail and a new leader will be elected that doesn't
>> have the new record.
>>
>> In the case of #2, the standby controllers and the brokers will never
>> apply the record.
>>
>> In general the active controller always rolls back its state to the last
>> committed offset if there is a failure and it loses the leadership. So this
>> isn't really unique to the KRaft Authorizer, it just seemed worth pointing
>> out since there will be a brief period when the active controller is
>> "ahead."
>>
>> We do try pretty hard to apply KIP-801 ACL records in order so that the
>> states you will see on brokers and standby controllers will always be valid
>> states from some point in the past timeline. The consistency here should be
>> at least as good as the current system.
>>
>> >
>> > 2) Have we considered using a different config with limited privileges
>> for
>> > bootstrapping instead of the all-powerful *super.users*?
>> >
>>
>> That's an interesting idea, but I'm not sure how much we could limit it.
>> The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in
>> order to do most of what they do. This might be something we could explore
>> in a future KIP since it's pretty cross-cutting (if we had such a limited
>> bootstrapping user, all the authorizers could implement it, not just the
>> KIP-801 one...)
>>
>> best,
>> Colin
>>
>> >
>> > Regards,
>> >
>> > Rajini
>> >
>> >
>> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cm...@apache.org> wrote:
>> >
>> >> How about this:
>> >>
>> >> We create a configuration key called early.start.listeners which
>> contains
>> >> a list of listener names. If this is not specified, its value defaults
>> to
>> >> just the controller listeners. Optionally, other listeners can be added
>> too.
>> >>
>> >> If super.users contains any user names, early start listeners will start
>> >> immediately. In the beginning they only authorize users that are in
>> >> super.users. All other listeners receive a new error code,
>> >> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user
>> names,
>> >> then early start listeners will not be treated differently than other
>> >> listeners.
>> >>
>> >> This will allow the controller listeners to get started immediately if
>> the
>> >> broker user is in super.users, which will speed up startup. It also
>> will be
>> >> useful for breaking chicken/egg cycles like needing to pull the SCRAM
>> >> metadata to authorize pulling the SCRAM metadata.
>> >>
>> >> There are still a few use cases where super.users won't be required, but
>> >> it may be useful in many cases to have this early start functionality.
>> >>
>> >> Leaving aside the preceding discussion, do you agree with starting up
>> all
>> >> endpoints (including non-early start ones) once we load a metadata
>> >> snapshot? How feasible would it be for us to get a callback from the
>> Raft
>> >> layer the first time we caught up to the last stable offset? (we only
>> want
>> >> the callback the first time, not any other time). (I think the metadata
>> >> shell also would want something like this, at least as an option).
>> >>
>> >> best,
>> >> Colin
>> >>
>> >>
>> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> >> > Hi Colin,
>> >> >
>> >> > Thanks for the writeup. I had one question about bootstrapping. For
>> the
>> >> > brokers, I understand that listener startup is delayed until the
>> >> Authorizer
>> >> > is ready. However, I was not very clear how this would work for the
>> >> > controller listeners. We may need them to startup before the metadata
>> log
>> >> > is ready so that a quorum can be established (as noted in the KIP).
>> This
>> >> > works fine if we assume that the controller principals are among
>> >> > `super.users`. For requests forwarded from brokers, on the other
>> hand, we
>> >> > need to ensure the ACLs have been loaded properly before we begin
>> >> > authorizing. The problem is that we currently use the same listener
>> for
>> >> > quorum requests and for forwarded requests. So my question is how does
>> >> the
>> >> > Authorizer communicate to the controller when it is safe to begin
>> >> > authorizing different request types?
>> >> >
>> >> > There are a couple ways I can see this working. First, we could allow
>> the
>> >> > user to configure the listener used for forwarded requests separately.
>> >> That
>> >> > would work with the existing `Authorizer.start` API. Alternatively,
>> >> perhaps
>> >> > we could modify `Authorizer.start` to work with something more
>> granular
>> >> > than `EndPoint`. This would allow the controller to begin accepting
>> >> > requests from the other quorum members before it is ready to authorize
>> >> > forwarded requests from clients.  Then we would need some way to let
>> >> > brokers know when the controller is ready to accept these forwarded
>> >> > requests (e.g. through an error code in the `Envelope` response).
>> >> >
>> >> > What do you think?
>> >> >
>> >> > Thanks,
>> >> > Jason
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
>> >> > <da...@confluent.io.invalid> wrote:
>> >> >
>> >> >> +1 binding, thanks Colin!
>> >> >>
>> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org>
>> >> wrote:
>> >> >>
>> >> >> > Hi all,
>> >> >> >
>> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that
>> >> >> stores
>> >> >> > metadata in __cluster_metadata
>> >> >> >
>> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >> >
>> >> >> > The original DISCUSS thread is here:
>> >> >> >
>> >> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >> >
>> >> >> > Please take a look and vote if you can.
>> >> >> >
>> >> >> > best,
>> >> >> > Colin
>> >> >> >
>> >> >>
>> >> >>
>> >> >> --
>> >> >> -David
>> >> >>
>> >>
>>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Jason Gustafson <ja...@confluent.io.INVALID>.
Hey Colin,

If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then
would we need `early.start.listeners`? This would give the `Authorizer` a
way to support partial startup, but it would be up to the implementation to
decide whether to use it. For an `Authorizer` which depends only on an
external service, there is no dependence on the metadata log, so there is
no reason to have partial startup. For the `StandardAuthorizer`, we could
let it return a completed future from `Authorizer.start` for all of the
controller listeners. Then we can let it authorize only super.user
operations until the metadata log has been loaded (and raise
AuthorizerNotReadyException for everything else until that time).
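
The gating described above might look roughly like this. This is only a sketch of the idea, not the actual StandardAuthorizer code; the exception and method shapes are assumptions:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Illustrative sketch: until the metadata log has been loaded, only
 * principals in super.users are authorized; all other requests fail with
 * a "not ready" error (stand-in here for AuthorizerNotReadyException).
 */
public class GatedAuthorizer {
    private final List<String> superUsers;
    private final AtomicBoolean ready = new AtomicBoolean(false);

    public GatedAuthorizer(List<String> superUsers) {
        this.superUsers = superUsers;
    }

    /** Invoked once the metadata log (or a snapshot) has been applied. */
    public void completeInitialLoad() {
        ready.set(true);
    }

    public boolean authorize(String principal) {
        if (superUsers.contains(principal)) {
            return true; // super.users work even before the ACLs are loaded
        }
        if (!ready.get()) {
            // stand-in for raising AuthorizerNotReadyException
            throw new IllegalStateException("AUTHORIZER_NOT_READY");
        }
        return checkAcls(principal);
    }

    private boolean checkAcls(String principal) {
        return false; // placeholder for the real ACL lookup
    }
}
```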

> Leaving aside the preceding discussion, do you agree with starting up all
endpoints (including non-early start ones) once we load a metadata
snapshot? How feasible would it be for us to get a callback from the Raft
layer the first time we caught up to the last stable offset? (we only want
the callback the first time, not any other time). (I think the metadata
shell also would want something like this, at least as an option).

It is certainly possible to get a callback when we've reached the high
watermark, which seems like what we'd want. This does make me wonder how we
would handle the case when a controller gets partitioned from the rest of
the quorum for some time. Should there be some point at which the
controller no longer accepts authorization requests because of the
potential staleness of the ACLs? Perhaps we could even reuse the
`AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other words,
maybe the readiness condition is more of a dynamic assertion that we have
caught up to near the current end of the log.

Best,
Jason

On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cm...@apache.org> wrote:

> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
> > Hi Colin,
> >
> > Thanks for the KIP. A couple of questions:
> >
> > 1) The KIP says:
> > *However, when the controller is active, its Authorizer state may be
> > slightly ahead of the broker's Authorizer state. This will happen in
> > the time between when a new ACL is created (or deleted) and the time that
> > this change is persisted durably by a majority of controller quorum
> peers.*
> >
> > Once the ACL takes effect on the controller, do we guarantee that it will
> > be applied everywhere or can there be scenarios where the controller has
> > applied the change, but others don't due to a subsequent failure in
> > persisting the ACL changes?
>
> Hi Rajini,
>
> Thanks for taking a look.
>
> Good question. In general, for any new metadata record (not just ACL
> records), there are two fates:
> 1. the record will be persisted to the raft quorum.
> 2. the raft leader will fail and a new leader will be elected that doesn't
> have the new record.
>
> In the case of #2, the standby controllers and the brokers will never
> apply the record.
>
> In general the active controller always rolls back its state to the last
> committed offset if there is a failure and it loses the leadership. So this
> isn't really unique to the KRaft Authorizer, it just seemed worth pointing
> out since there will be a brief period when the active controller is
> "ahead."
>
> We do try pretty hard to apply KIP-801 ACL records in order so that the
> states you will see on brokers and standby controllers will always be valid
> states from some point in the past timeline. The consistency here should be
> at least as good as the current system.
>
> >
> > 2) Have we considered using a different config with limited privileges
> for
> > bootstrapping instead of the all-powerful *super.users*?
> >
>
> That's an interesting idea, but I'm not sure how much we could limit it.
> The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in
> order to do most of what they do. This might be something we could explore
> in a future KIP since it's pretty cross-cutting (if we had such a limited
> bootstrapping user, all the authorizers could implement it, not just the
> KIP-801 one...)
>
> best,
> Colin
>
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cm...@apache.org> wrote:
> >
> >> How about this:
> >>
> >> We create a configuration key called early.start.listeners which
> contains
> >> a list of listener names. If this is not specified, its value defaults
> to
> >> just the controller listeners. Optionally, other listeners can be added
> too.
> >>
> >> If super.users contains any user names, early start listeners will start
> >> immediately. In the beginning they only authorize users that are in
> >> super.users. All other listeners receive a new error code,
> >> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user
> names,
> >> then early start listeners will not be treated differently than other
> >> listeners.
> >>
> >> This will allow the controller listeners to get started immediately if
> the
> >> broker user is in super.users, which will speed up startup. It also
> will be
> >> useful for breaking chicken/egg cycles like needing to pull the SCRAM
> >> metadata to authorize pulling the SCRAM metadata.
> >>
> >> There are still a few use cases where super.users won't be required, but
> >> it may be useful in many cases to have this early start functionality.
> >>
> >> Leaving aside the preceding discussion, do you agree with starting up
> all
> >> endpoints (including non-early start ones) once we load a metadata
> >> snapshot? How feasible would it be for us to get a callback from the
> Raft
> >> layer the first time we caught up to the last stable offset? (we only
> want
> >> the callback the first time, not any other time). (I think the metadata
> >> shell also would want something like this, at least as an option).
> >>
> >> best,
> >> Colin
> >>
> >>
> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
> >> > Hi Colin,
> >> >
> >> > Thanks for the writeup. I had one question about bootstrapping. For
> the
> >> > brokers, I understand that listener startup is delayed until the
> >> Authorizer
> >> > is ready. However, I was not very clear how this would work for the
> >> > controller listeners. We may need them to startup before the metadata
> log
> >> > is ready so that a quorum can be established (as noted in the KIP).
> This
> >> > works fine if we assume that the controller principals are among
> >> > `super.users`. For requests forwarded from brokers, on the other
> hand, we
> >> > need to ensure the ACLs have been loaded properly before we begin
> >> > authorizing. The problem is that we currently use the same listener
> for
> >> > quorum requests and for forwarded requests. So my question is how does
> >> the
> >> > Authorizer communicate to the controller when it is safe to begin
> >> > authorizing different request types?
> >> >
> >> > There are a couple ways I can see this working. First, we could allow
> the
> >> > user to configure the listener used for forwarded requests separately.
> >> That
> >> > would work with the existing `Authorizer.start` API. Alternatively,
> >> perhaps
> >> > we could modify `Authorizer.start` to work with something more
> granular
> >> > than `EndPoint`. This would allow the controller to begin accepting
> >> > requests from the other quorum members before it is ready to authorize
> >> > forwarded requests from clients.  Then we would need some way to let
> >> > brokers know when the controller is ready to accept these forwarded
> >> > requests (e.g. through an error code in the `Envelope` response).
> >> >
> >> > What do you think?
> >> >
> >> > Thanks,
> >> > Jason
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
> >> > <da...@confluent.io.invalid> wrote:
> >> >
> >> >> +1 binding, thanks Colin!
> >> >>
> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org>
> >> wrote:
> >> >>
> >> >> > Hi all,
> >> >> >
> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that
> >> >> stores
> >> >> > metadata in __cluster_metadata
> >> >> >
> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
> >> >> >
> >> >> > The original DISCUSS thread is here:
> >> >> >
> >> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
> >> >> >
> >> >> > Please take a look and vote if you can.
> >> >> >
> >> >> > best,
> >> >> > Colin
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> -David
> >> >>
> >>
>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Colin McCabe <cm...@apache.org>.
On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
> Hi Colin,
>
> Thanks for the KIP. A couple of questions:
>
> 1) The KIP says:
> *However, when the controller is active, its Authorizer state may be
> slightly ahead of the broker's Authorizer state. This will happen in
> the time between when a new ACL is created (or deleted) and the time that
> this change is persisted durably by a majority of controller quorum peers.*
>
> Once the ACL takes effect on the controller, do we guarantee that it will
> be applied everywhere or can there be scenarios where the controller has
> applied the change, but others don't due to a subsequent failure in
> persisting the ACL changes?

Hi Rajini,

Thanks for taking a look.

Good question. In general, for any new metadata record (not just ACL records), there are two fates:
1. the record will be persisted to the raft quorum.
2. the raft leader will fail and a new leader will be elected that doesn't have the new record.

In the case of #2, the standby controllers and the brokers will never apply the record.

In general the active controller always rolls back its state to the last committed offset if there is a failure and it loses the leadership. So this isn't really unique to the KRaft Authorizer, it just seemed worth pointing out since there will be a brief period when the active controller is "ahead."

We do try pretty hard to apply KIP-801 ACL records in order so that the states you will see on brokers and standby controllers will always be valid states from some point in the past timeline. The consistency here should be at least as good as the current system.

>
> 2) Have we considered using a different config with limited privileges for
> bootstrapping instead of the all-powerful *super.users*?
>

That's an interesting idea, but I'm not sure how much we could limit it. The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in order to do most of what they do. This might be something we could explore in a future KIP since it's pretty cross-cutting (if we had such a limited bootstrapping user, all the authorizers could implement it, not just the KIP-801 one...)

best,
Colin

>
> Regards,
>
> Rajini
>
>
> On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cm...@apache.org> wrote:
>
>> How about this:
>>
>> We create a configuration key called early.start.listeners which contains
>> a list of listener names. If this is not specified, its value defaults to
>> just the controller listeners. Optionally, other listeners can be added too.
>>
>> If super.users contains any user names, early start listeners will start
>> immediately. In the beginning they only authorize users that are in
>> super.users. All other listeners receive a new error code,
>> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user names,
>> then early start listeners will not be treated differently than other
>> listeners.
>>
>> This will allow the controller listeners to get started immediately if the
>> broker user is in super.users, which will speed up startup. It also will be
>> useful for breaking chicken/egg cycles like needing to pull the SCRAM
>> metadata to authorize pulling the SCRAM metadata.
>>
>> There are still a few use cases where super.users won't be required, but
>> it may be useful in many cases to have this early start functionality.
>>
>> Leaving aside the preceding discussion, do you agree with starting up all
>> endpoints (including non-early start ones) once we load a metadata
>> snapshot? How feasible would it be for us to get a callback from the Raft
>> layer the first time we caught up to the last stable offset? (we only want
>> the callback the first time, not any other time). (I think the metadata
>> shell also would want something like this, at least as an option).
>>
>> best,
>> Colin
>>
>>
>> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> > Hi Colin,
>> >
>> > Thanks for the writeup. I had one question about bootstrapping. For the
>> > brokers, I understand that listener startup is delayed until the
>> Authorizer
>> > is ready. However, I was not very clear how this would work for the
>> > controller listeners. We may need them to startup before the metadata log
>> > is ready so that a quorum can be established (as noted in the KIP). This
>> > works fine if we assume that the controller principals are among
>> > `super.users`. For requests forwarded from brokers, on the other hand, we
>> > need to ensure the ACLs have been loaded properly before we begin
>> > authorizing. The problem is that we currently use the same listener for
>> > quorum requests and for forwarded requests. So my question is how does
>> the
>> > Authorizer communicate to the controller when it is safe to begin
>> > authorizing different request types?
>> >
>> > There are a couple ways I can see this working. First, we could allow the
>> > user to configure the listener used for forwarded requests separately.
>> That
>> > would work with the existing `Authorizer.start` API. Alternatively,
>> perhaps
>> > we could modify `Authorizer.start` to work with something more granular
>> > than `EndPoint`. This would allow the controller to begin accepting
>> > requests from the other quorum members before it is ready to authorize
>> > forwarded requests from clients.  Then we would need some way to let
>> > brokers know when the controller is ready to accept these forwarded
>> > requests (e.g. through an error code in the `Envelope` response).
>> >
>> > What do you think?
>> >
>> > Thanks,
>> > Jason
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
>> > <da...@confluent.io.invalid> wrote:
>> >
>> >> +1 binding, thanks Colin!
>> >>
>> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org>
>> wrote:
>> >>
>> >> > Hi all,
>> >> >
>> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that
>> >> stores
>> >> > metadata in __cluster_metadata
>> >> >
>> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >
>> >> > The original DISCUSS thread is here:
>> >> >
>> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >
>> >> > Please take a look and vote if you can.
>> >> >
>> >> > best,
>> >> > Colin
>> >> >
>> >>
>> >>
>> >> --
>> >> -David
>> >>
>>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Rajini Sivaram <ra...@gmail.com>.
Hi Colin,

Thanks for the KIP. A couple of questions:

1) The KIP says:
*However, when the controller is active, its Authorizer state may be
slightly ahead of the broker's Authorizer state. This will happen in
the time between when a new ACL is created (or deleted) and the time that
this change is persisted durably by a majority of controller quorum peers.*

Once the ACL takes effect on the controller, do we guarantee that it will
be applied everywhere or can there be scenarios where the controller has
applied the change, but others don't due to a subsequent failure in
persisting the ACL changes?

2) Have we considered using a different config with limited privileges for
bootstrapping instead of the all-powerful *super.users*?

Regards,

Rajini


On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cm...@apache.org> wrote:

> How about this:
>
> We create a configuration key called early.start.listeners which contains
> a list of listener names. If this is not specified, its value defaults to
> just the controller listeners. Optionally, other listeners can be added too.
>
> If super.users contains any user names, early start listeners will start
> immediately. In the beginning they only authorize users that are in
> super.users. All other listeners receive a new error code,
> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user names,
> then early start listeners will not be treated differently than other
> listeners.
>
> This will allow the controller listeners to get started immediately if the
> broker user is in super.users, which will speed up startup. It also will be
> useful for breaking chicken/egg cycles like needing to pull the SCRAM
> metadata to authorize pulling the SCRAM metadata.
>
> There are still a few use cases where super.users won't be required, but
> it may be useful in many cases to have this early start functionality.
>
> Leaving aside the preceding discussion, do you agree with starting up all
> endpoints (including non-early start ones) once we load a metadata
> snapshot? How feasible would it be for us to get a callback from the Raft
> layer the first time we caught up to the last stable offset? (we only want
> the callback the first time, not any other time). (I think the metadata
> shell also would want something like this, at least as an option).
>
> best,
> Colin
>
>
> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
> > Hi Colin,
> >
> > Thanks for the writeup. I had one question about bootstrapping. For the
> > brokers, I understand that listener startup is delayed until the
> Authorizer
> > is ready. However, I was not very clear how this would work for the
> > controller listeners. We may need them to startup before the metadata log
> > is ready so that a quorum can be established (as noted in the KIP). This
> > works fine if we assume that the controller principals are among
> > `super.users`. For requests forwarded from brokers, on the other hand, we
> > need to ensure the ACLs have been loaded properly before we begin
> > authorizing. The problem is that we currently use the same listener for
> > quorum requests and for forwarded requests. So my question is how does
> the
> > Authorizer communicate to the controller when it is safe to begin
> > authorizing different request types?
> >
> > There are a couple ways I can see this working. First, we could allow the
> > user to configure the listener used for forwarded requests separately.
> That
> > would work with the existing `Authorizer.start` API. Alternatively,
> perhaps
> > we could modify `Authorizer.start` to work with something more granular
> > than `EndPoint`. This would allow the controller to begin accepting
> > requests from the other quorum members before it is ready to authorize
> > forwarded requests from clients.  Then we would need some way to let
> > brokers know when the controller is ready to accept these forwarded
> > requests (e.g. through an error code in the `Envelope` response).
> >
> > What do you think?
> >
> > Thanks,
> > Jason
> >
> >
> >
> >
> >
> >
> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
> > <da...@confluent.io.invalid> wrote:
> >
> >> +1 binding, thanks Colin!
> >>
> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org>
> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that
> >> stores
> >> > metadata in __cluster_metadata
> >> >
> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
> >> >
> >> > The original DISCUSS thread is here:
> >> >
> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
> >> >
> >> > Please take a look and vote if you can.
> >> >
> >> > best,
> >> > Colin
> >> >
> >>
> >>
> >> --
> >> -David
> >>
>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Colin McCabe <cm...@apache.org>.
How about this:

We create a configuration key called early.start.listeners which contains a list of listener names. If this is not specified, its value defaults to just the controller listeners. Optionally, other listeners can be added too.

If super.users contains any user names, early start listeners will start immediately. In the beginning they only authorize users that are in super.users. All other listeners receive a new error code, AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user names, then early start listeners will not be treated differently than other listeners.
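
The rule above could be summarized as a small predicate. This is a sketch of the described behavior only — the names are illustrative:

```java
import java.util.List;
import java.util.Set;

/**
 * Illustrative sketch of the early-start rule: a listener starts
 * immediately only if it is named in early.start.listeners AND
 * super.users is non-empty; otherwise it waits for the authorizer.
 */
public class EarlyStartPolicy {
    public static boolean startsEarly(String listener,
                                      Set<String> earlyStartListeners,
                                      List<String> superUsers) {
        return !superUsers.isEmpty() && earlyStartListeners.contains(listener);
    }
}
```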

This will allow the controller listeners to get started immediately if the broker user is in super.users, which will speed up startup. It also will be useful for breaking chicken/egg cycles like needing to pull the SCRAM metadata to authorize pulling the SCRAM metadata.

There are still a few use cases where super.users won't be required, but it may be useful in many cases to have this early start functionality.

Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset? (we only want the callback the first time, not any other time). (I think the metadata shell also would want something like this, at least as an option).

best,
Colin


On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
> Hi Colin,
>
> Thanks for the writeup. I had one question about bootstrapping. For the
> brokers, I understand that listener startup is delayed until the Authorizer
> is ready. However, I was not very clear how this would work for the
> controller listeners. We may need them to startup before the metadata log
> is ready so that a quorum can be established (as noted in the KIP). This
> works fine if we assume that the controller principals are among
> `super.users`. For requests forwarded from brokers, on the other hand, we
> need to ensure the ACLs have been loaded properly before we begin
> authorizing. The problem is that we currently use the same listener for
> quorum requests and for forwarded requests. So my question is how does the
> Authorizer communicate to the controller when it is safe to begin
> authorizing different request types?
>
> There are a couple ways I can see this working. First, we could allow the
> user to configure the listener used for forwarded requests separately. That
> would work with the existing `Authorizer.start` API. Alternatively, perhaps
> we could modify `Authorizer.start` to work with something more granular
> than `EndPoint`. This would allow the controller to begin accepting
> requests from the other quorum members before it is ready to authorize
> forwarded requests from clients.  Then we would need some way to let
> brokers know when the controller is ready to accept these forwarded
> requests (e.g. through an error code in the `Envelope` response).
>
> What do you think?
>
> Thanks,
> Jason
>
>
>
>
>
>
> On Wed, Jan 12, 2022 at 12:57 PM David Arthur
> <da...@confluent.io.invalid> wrote:
>
>> +1 binding, thanks Colin!
>>
>> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org> wrote:
>>
>> > Hi all,
>> >
>> > I'd like to start the vote on KIP-801: Implement an Authorizer that
>> stores
>> > metadata in __cluster_metadata
>> >
>> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>> >
>> > The original DISCUSS thread is here:
>> >
>> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >
>> > Please take a look and vote if you can.
>> >
>> > best,
>> > Colin
>> >
>>
>>
>> --
>> -David
>>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Jason Gustafson <ja...@confluent.io.INVALID>.
Hi Colin,

Thanks for the writeup. I had one question about bootstrapping. For the
brokers, I understand that listener startup is delayed until the Authorizer
is ready. However, it was not very clear to me how this would work for the
controller listeners. We may need them to start up before the metadata log
is ready so that a quorum can be established (as noted in the KIP). This
works fine if we assume that the controller principals are among
`super.users`. For requests forwarded from brokers, on the other hand, we
need to ensure the ACLs have been loaded properly before we begin
authorizing. The problem is that we currently use the same listener for
quorum requests and for forwarded requests. So my question is how does the
Authorizer communicate to the controller when it is safe to begin
authorizing different request types?

There are a couple ways I can see this working. First, we could allow the
user to configure the listener used for forwarded requests separately. That
would work with the existing `Authorizer.start` API. Alternatively, perhaps
we could modify `Authorizer.start` to work with something more granular
than `EndPoint`. This would allow the controller to begin accepting
requests from the other quorum members before it is ready to authorize
forwarded requests from clients.  Then we would need some way to let
brokers know when the controller is ready to accept these forwarded
requests (e.g. through an error code in the `Envelope` response).
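
The error-code alternative could look roughly like the following on the
broker side: on seeing a not-ready error in an `Envelope` response, back off
and retry rather than surfacing the error to the client. The error code
value, the `send` callback, and all names below are hypothetical sketches,
not Kafka's real API:

```java
import java.util.function.Supplier;

// Hypothetical broker-side handling of a controller that is not yet
// ready to authorize forwarded requests: retry with capped exponential
// backoff instead of returning the error to the client.
class EnvelopeRetry {
    static final short NONE = 0;
    static final short AUTHORIZER_NOT_READY = 93;  // illustrative code

    // Sends the forwarded request via `send` until the controller's
    // authorizer is ready, doubling the backoff up to 1 second.
    static short forwardWithRetry(Supplier<Short> send, int maxAttempts) {
        long backoffMs = 10;
        short error = AUTHORIZER_NOT_READY;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            error = send.get();
            if (error != AUTHORIZER_NOT_READY) {
                return error;  // success, or a different terminal error
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return error;
            }
            backoffMs = Math.min(backoffMs * 2, 1000);
        }
        return error;
    }
}
```

Under this sketch the error code never reaches clients: the broker either
eventually forwards the request successfully or times out in its own retry
loop.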

What do you think?

Thanks,
Jason






On Wed, Jan 12, 2022 at 12:57 PM David Arthur
<da...@confluent.io.invalid> wrote:

> +1 binding, thanks Colin!
>
> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org> wrote:
>
> > Hi all,
> >
> > I'd like to start the vote on KIP-801: Implement an Authorizer that
> stores
> > metadata in __cluster_metadata
> >
> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
> >
> > The original DISCUSS thread is here:
> >
> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
> >
> > Please take a look and vote if you can.
> >
> > best,
> > Colin
> >
>
>
> --
> -David
>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by David Arthur <da...@confluent.io.INVALID>.
+1 binding, thanks Colin!

On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cm...@apache.org> wrote:

> Hi all,
>
> I'd like to start the vote on KIP-801: Implement an Authorizer that stores
> metadata in __cluster_metadata
>
> The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>
> The original DISCUSS thread is here:
>
> https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>
> Please take a look and vote if you can.
>
> best,
> Colin
>


-- 
-David

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Posted by Colin McCabe <cm...@apache.org>.
bumping this thread

On Mon, Dec 13, 2021, at 16:46, Colin McCabe wrote:
> Hi all,
>
> I'd like to start the vote on KIP-801: Implement an Authorizer that 
> stores metadata in __cluster_metadata
>
> The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>
> The original DISCUSS thread is here:
>
> https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>
> Please take a look and vote if you can.
>
> best,
> Colin