Posted to dev@kafka.apache.org by Paolo Moriello <pa...@gmail.com> on 2020/04/06 09:56:50 UTC

[DISCUSS] (KAFKA-9806) authorize cluster operation when creating internal topics

Hello everybody,

I've opened a Jira to fix a bug in the creation of internal topics. It
occurs when the topics are created under insufficient ACLs: e.g.
__consumer_offsets is created but the subsequent UpdateMetadata and
LeaderAndIsr requests fail; the topic is then left in an inconsistent state
and it is impossible to consume.

Jira: https://issues.apache.org/jira/browse/KAFKA-9806

A simple fix to solve this problem is to authorize the cluster operation
before creating these topics. I've submitted a PR with the fix:
https://github.com/apache/kafka/pull/8415

Please take a look and let me know if you have any feedback.
Thanks,
Paolo

Re: [DISCUSS] (KAFKA-9806) authorize cluster operation when creating internal topics

Posted by Paolo Moriello <pa...@gmail.com>.
Right, the problem in this case is that restoring the ACLs to a correct configuration does not fix the problem, because the internal topic remains in a bad state. For instance:
1) user sets insufficient cluster-level ACLs (now brokers are not able to communicate)
2) user consumes for the first time, __consumer_offsets gets created
3) user sets correct ACLs (now brokers are able to communicate)
4) it is still impossible to consume because __consumer_offsets is in a bad state

I agree that when broker ACLs are configured incorrectly, a lot of things fail. However, once the ACLs are set correctly again, we should expect things to work normally. This does not happen for consumers at the moment. This is why I believe that the source of the inconsistency is the creation of __consumer_offsets: it shouldn't be created if we know that the subsequent requests will fail.
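
The four steps above can be sketched as a toy state model. This is only an illustration of the behaviour described in this thread, not Kafka code: the class, the state names and the assumption that propagation is attempted only once, at creation time, are all hypothetical.

```java
// Toy model (hypothetical, not Kafka code) of the scenario described above.
final class OffsetsTopicScenario {
    enum TopicState { ABSENT, CREATED_BUT_NOT_PROPAGATED, ONLINE }

    private boolean brokerAclsCorrect;
    private TopicState offsetsTopic = TopicState.ABSENT;

    void setBrokerAclsCorrect(boolean correct) {
        this.brokerAclsCorrect = correct;
    }

    TopicState state() {
        return offsetsTopic;
    }

    // Returns true if the consumer is able to consume.
    boolean consume() {
        if (offsetsTopic == TopicState.ABSENT) {
            // Assumption: creation always succeeds, and the controller requests
            // are sent only once, under whatever ACLs are in place at that time.
            offsetsTopic = brokerAclsCorrect
                    ? TopicState.ONLINE
                    : TopicState.CREATED_BUT_NOT_PROPAGATED;
        }
        return offsetsTopic == TopicState.ONLINE;
    }

    public static void main(String[] args) {
        OffsetsTopicScenario cluster = new OffsetsTopicScenario();
        cluster.setBrokerAclsCorrect(false);    // 1) insufficient cluster-level ACLs
        System.out.println(cluster.consume());  // 2) first consume: topic created, prints false
        cluster.setBrokerAclsCorrect(true);     // 3) ACLs corrected
        System.out.println(cluster.consume());  // 4) still prints false
    }
}
```

In this model the topic never leaves CREATED_BUT_NOT_PROPAGATED once it has been created under bad ACLs, which is the inconsistency I am referring to.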

Best,
Paolo

> On 13 Apr 2020, at 15:05, Colin McCabe <cm...@apache.org> wrote:
> Hi Paolo,
> 
> If the problem is that broker ACLs are configured incorrectly, so that brokers can't receive requests from the controller, a lot of things will fail.  This isn't really related to FindCoordinator.
> 
> best,
> Colin

Re: [DISCUSS] (KAFKA-9806) authorize cluster operation when creating internal topics

Posted by Colin McCabe <cm...@apache.org>.
On Thu, Apr 9, 2020, at 09:36, Paolo Moriello wrote:
> Hi Colin,
> 
> Thanks again for checking this out.
> 
> Indeed you are right, a configuration problem is what leads to
> authorization failure (and consequently to the internal topics bug): i.e.
> incorrect ACLs configuration. In particular, in case of insufficient
> cluster-level ACLs, so if one does not include the broker CN required to
> allow inter-broker communication when client SSL is required:
> 1) FindCoordinator request completes successfully, and __consumer_offsets
> topic is created in zk
> 2) but subsequent UpdateMetadata and LeaderAndIsr fail. This leaves the
> internal topic in a bad state
> 
> A deeper look confirmed that the change I proposed initially does not work,
> since authorizing the user principal is not enough to prevent the issue.
> However, I believe that we should still avoid creating the internal
> topic(s) at all in case of insufficient broker ACLs (which means, make
> FindCoordinator request fail since we won't have the required metadata). A
> possibility could be to try to check the existence of brokers' ACLs before
> creating the internal topic.
> Let me know if you have any feedback.

Hi Paolo,

If the problem is that broker ACLs are configured incorrectly, so that brokers can't receive requests from the controller, a lot of things will fail.  This isn't really related to FindCoordinator.

best,
Colin


> 
> Thanks,
> Paolo

Re: [DISCUSS] (KAFKA-9806) authorize cluster operation when creating internal topics

Posted by Paolo Moriello <pa...@gmail.com>.
Hi Colin,

Thanks again for checking this out.

You are right: a configuration problem, i.e. an incorrect ACL configuration,
is what leads to the authorization failure (and consequently to the internal
topics bug). In particular, with insufficient cluster-level ACLs, e.g. when
the broker CN required to allow inter-broker communication is not included
and client SSL is required:
1) the FindCoordinator request completes successfully, and the
__consumer_offsets topic is created in zk
2) but the subsequent UpdateMetadata and LeaderAndIsr requests fail. This
leaves the internal topic in a bad state

A deeper look confirmed that the change I proposed initially does not work,
since authorizing the user principal is not enough to prevent the issue.
However, I believe that we should still avoid creating the internal
topic(s) at all when the broker ACLs are insufficient (which means making the
FindCoordinator request fail, since we won't have the required metadata). One
possibility would be to check for the existence of the brokers' ACLs before
creating the internal topic.
Let me know if you have any feedback.
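
To make the idea of checking the brokers' ACLs first more concrete, here is a rough sketch. It is not based on the actual Kafka authorizer classes: the Acl type, the string labels and the Outcome enum are hypothetical, and it ignores super users and wildcard principals. The only thing it assumes about Kafka is that a deny entry takes precedence over an allow entry.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check: refuse to create an internal topic unless every
// broker principal is allowed to perform cluster actions.
final class BrokerAclPreCheck {
    // Simplified stand-in for an ACL entry (not Kafka's AclBinding).
    static final class Acl {
        final String principal;
        final String resourceType;
        final String operation;
        final boolean allow;

        Acl(String principal, String resourceType, String operation, boolean allow) {
            this.principal = principal;
            this.resourceType = resourceType;
            this.operation = operation;
            this.allow = allow;
        }
    }

    enum Outcome { TOPIC_CREATED, REFUSED_INSUFFICIENT_BROKER_ACLS }

    static boolean brokersCanCommunicate(Collection<Acl> acls, Collection<String> brokerPrincipals) {
        Set<String> allowed = new HashSet<>();
        Set<String> denied = new HashSet<>();
        for (Acl acl : acls) {
            if ("CLUSTER".equals(acl.resourceType) && "CLUSTER_ACTION".equals(acl.operation)) {
                (acl.allow ? allowed : denied).add(acl.principal);
            }
        }
        for (String broker : brokerPrincipals) {
            // A deny entry wins over an allow entry.
            if (!allowed.contains(broker) || denied.contains(broker)) {
                return false;
            }
        }
        return true;
    }

    static Outcome createInternalTopicIfSafe(Collection<Acl> acls, Collection<String> brokerPrincipals) {
        return brokersCanCommunicate(acls, brokerPrincipals)
                ? Outcome.TOPIC_CREATED
                : Outcome.REFUSED_INSUFFICIENT_BROKER_ACLS;
    }

    public static void main(String[] args) {
        List<String> brokers = Arrays.asList("User:CN=broker1", "User:CN=broker2");
        List<Acl> acls = Arrays.asList(
                new Acl("User:CN=broker1", "CLUSTER", "CLUSTER_ACTION", true));
        // broker2 has no allow entry, so the topic would not be created.
        System.out.println(createInternalTopicIfSafe(acls, brokers));
    }
}
```

With a check like this the FindCoordinator request would fail up front instead of leaving a half-created topic behind; which error it should return to the client is left open here.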

Thanks,
Paolo


On Tue, 7 Apr 2020 at 17:12, Colin McCabe <cm...@apache.org> wrote:

> On Tue, Apr 7, 2020, at 08:08, Paolo Moriello wrote:
> > Hi Colin,
> >
> > Thanks for your interest in this. I agree with you, this change could break
> > compatibility. However, changing the source principal is non trivial in
> > this case. In fact, here the problem is not in the internal topic creation
> > - which succeeds - but in the two subsequent LeaderAndIsr and
> > UpdateMetadata requests.
> >
> > When a consumer tries to consume for the first time, the creation of
> > internal topic completes, zk-nodes are filled with the necessary metadata,
> > and this triggers a ZkPartitionStateMachine (PartitionStateMachine.scala)
> > update which, in turn, makes the ControllerChannelManager
> > (ControllerChannelManager.scala) send LeaderAndIsr and UpdateMetadata
> > requests to the brokers (I may be wrong, but I believe that these requests
> > are already being executed with broker principal). These requests fail
> > because we authorize the cluster operation there, so the __consumer_offsets
> > topic remains in a bad state.
>
> I might be misunderstanding something here, but it seems to me that if
> LeaderAndIsrRequest or UpdateMetadataRequest are failing with authorization
> errors, then there is a configuration problem on the cluster which doesn't
> have anything to do with the __consumer_offsets topic.
>
> >
> > Is there a reason to not authorize the operation for find coordinator
> > requests as well?
>
> To be clear, we can't change the authorization for FindCoordinatorRequest.
>
> best,
> Colin
>

Re: [DISCUSS] (KAFKA-9806) authorize cluster operation when creating internal topics

Posted by Colin McCabe <cm...@apache.org>.
On Tue, Apr 7, 2020, at 08:08, Paolo Moriello wrote:
> Hi Colin,
> 
> Thanks for your interest in this. I agree with you, this change could break
> compatibility. However, changing the source principal is non trivial in
> this case. In fact, here the problem is not in the internal topic creation
> - which succeeds - but in the two subsequent LeaderAndIsr and
> UpdateMetadata requests.
> 
> When a consumer tries to consume for the first time, the creation of
> internal topic completes, zk-nodes are filled with the necessary metadata,
> and this triggers a ZkPartitionStateMachine (PartitionStateMachine.scala)
> update which, in turn, makes the ControllerChannelManager
> (ControllerChannelManager.scala) send LeaderAndIsr and UpdateMetadata
> requests to the brokers (I may be wrong, but I believe that these requests
> are already being executed with broker principal). These requests fail
> because we authorize the cluster operation there, so the __consumer_offsets
> topic remains in a bad state.

I might be misunderstanding something here, but it seems to me that if LeaderAndIsrRequest or UpdateMetadataRequest are failing with authorization errors, then there is a configuration problem on the cluster which doesn't have anything to do with the __consumer_offsets topic.

> 
> Is there a reason to not authorize the operation for find coordinator
> requests as well?

To be clear, we can't change the authorization for FindCoordinatorRequest.

best,
Colin

> 
> Thanks,
> Paolo

Re: [DISCUSS] (KAFKA-9806) authorize cluster operation when creating internal topics

Posted by Paolo Moriello <pa...@gmail.com>.
Hi Colin,

Thanks for your interest in this. I agree with you, this change could break
compatibility. However, changing the source principal is non-trivial in
this case. In fact, the problem here is not in the internal topic creation
(which succeeds) but in the two subsequent LeaderAndIsr and
UpdateMetadata requests.

When a consumer tries to consume for the first time, the creation of the
internal topic completes, the zk nodes are filled with the necessary metadata,
and this triggers a ZkPartitionStateMachine (PartitionStateMachine.scala)
update which, in turn, makes the ControllerChannelManager
(ControllerChannelManager.scala) send LeaderAndIsr and UpdateMetadata
requests to the brokers (I may be wrong, but I believe that these requests
are already being executed with the broker principal). These requests fail
because we authorize the cluster operation there, so the __consumer_offsets
topic remains in a bad state.
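
As a rough trace of the sequence described above, assuming my reading of the flow is right: the class and method names below are made up for illustration and do not mirror the controller code. CLUSTER_AUTHORIZATION_FAILED is used only as the label of the error I would expect the brokers to return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical trace of the first-consume flow; not the actual controller code.
final class ControllerFlowSketch {
    static List<String> firstConsume(Set<String> principalsWithClusterAction,
                                     String controllerPrincipal) {
        List<String> trace = new ArrayList<>();
        // The first three steps do not depend on the inter-broker ACLs.
        trace.add("FindCoordinator: OK");
        trace.add("Create __consumer_offsets in ZooKeeper: OK");
        trace.add("Partition state machine update: triggered");
        // Each receiving broker authorizes the cluster operation for the sender.
        for (String request : Arrays.asList("LeaderAndIsr", "UpdateMetadata")) {
            boolean allowed = principalsWithClusterAction.contains(controllerPrincipal);
            trace.add(request + ": " + (allowed ? "OK" : "CLUSTER_AUTHORIZATION_FAILED"));
        }
        return trace;
    }

    public static void main(String[] args) {
        // No principal is allowed to perform cluster actions in this run.
        Set<String> nobody = Collections.emptySet();
        for (String step : firstConsume(nobody, "User:CN=broker1")) {
            System.out.println(step);
        }
    }
}
```

The point of the trace is that the topic creation has already been committed by the time the two controller requests are rejected.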

Is there a reason to not authorize the operation for find coordinator
requests as well?

Thanks,
Paolo

On Mon, 6 Apr 2020 at 23:58, Colin McCabe <cm...@apache.org> wrote:

> Hi Paolo,
>
> Thanks for finding this issue.
>
> Unfortunately, you certainly can't add a new permission requirement to an
> existing RPC without breaking compatibility.  So the current solution in
> the PR will not work.  However, you should be able to have the broker
> create the topic using its own principal rather than the caller's.
> Basically the equivalent of a doAs block (I forget how we do this exactly,
> but we do have some way of doing it).
>
> best,
> Colin
>
>
> On Mon, Apr 6, 2020, at 02:56, Paolo Moriello wrote:
> > Hello everybody,
> >
> > I've opened a Jira to fix a bug on creation of internal topics. This
> > happens when the topics are created under insufficient ACLs: e.g.
> > __consumer_offsets is created but subsequent UpdateMetadata and LeaderAndIsr
> > requests fail; the topic is then in an inconsistent state and it is
> > impossible to consume.
> >
> > Jira: https://issues.apache.org/jira/browse/KAFKA-9806
> >
> > A simple fix to solve this problem is to authorize the cluster operation
> > before creating these topics. I've submitted a PR with the fix:
> > https://github.com/apache/kafka/pull/8415
> >
> > Please take a look and let me know if you have any feedback.
> > Thanks,
> > Paolo
> >
>

Re: [DISCUSS] (KAFKA-9806) authorize cluster operation when creating internal topics

Posted by Colin McCabe <cm...@apache.org>.
Hi Paolo,

Thanks for finding this issue.

Unfortunately, you certainly can't add a new permission requirement to an existing RPC without breaking compatibility.  So the current solution in the PR will not work.  However, you should be able to have the broker create the topic using its own principal rather than the caller's.  Basically the equivalent of a doAs block (I forget how we do this exactly, but we do have some way of doing it).
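
A minimal sketch of the difference between the two approaches, with the caveat that it does not show how the broker actually switches principals: the permission labels and both handler methods are hypothetical, and the permission checked for the broker is an assumption made for illustration only.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical comparison of the two approaches; not Kafka's request handling.
final class FindCoordinatorCompatSketch {
    // Illustrative permission labels, not real identifiers.
    static final String EXISTING_CLIENT_PERMISSION = "Describe:Group";
    static final String CLUSTER_PERMISSION = "ClusterAction:Cluster";

    // Approach in the PR: the caller must now also hold the cluster permission.
    static boolean handleWithNewRequirement(Set<String> callerPermissions) {
        return callerPermissions.contains(EXISTING_CLIENT_PERMISSION)
                && callerPermissions.contains(CLUSTER_PERMISSION);
    }

    // Alternative: the caller is checked as before, and the privileged step is
    // checked against the broker's own principal instead of the caller's.
    static boolean handleAsBroker(Set<String> callerPermissions, Set<String> brokerPermissions) {
        if (!callerPermissions.contains(EXISTING_CLIENT_PERMISSION)) {
            return false;
        }
        return brokerPermissions.contains(CLUSTER_PERMISSION);
    }

    public static void main(String[] args) {
        // A client whose ACLs were sufficient before the change.
        Set<String> existingClient = Collections.singleton(EXISTING_CLIENT_PERMISSION);
        Set<String> broker = Collections.singleton(CLUSTER_PERMISSION);
        System.out.println(handleWithNewRequirement(existingClient)); // prints false
        System.out.println(handleAsBroker(existingClient, broker));   // prints true
    }
}
```

The first call is the compatibility break: a client that worked yesterday is rejected today without any change on its side.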

best,
Colin


On Mon, Apr 6, 2020, at 02:56, Paolo Moriello wrote:
> Hello everybody,
> 
> I've opened a Jira to fix a bug on creation of internal topics. This
> happens when the topics are created under insufficient ACLs: e.g.
> __consumer_offsets is created but subsequent UpdateMetadata and LeaderAndIsr
> requests fail; the topic is then in an inconsistent state and it is
> impossible to consume.
> 
> Jira: https://issues.apache.org/jira/browse/KAFKA-9806
> 
> A simple fix to solve this problem is to authorize the cluster operation
> before creating these topics. I've submitted a PR with the fix:
> https://github.com/apache/kafka/pull/8415
> 
> Please take a look and let me know if you have any feedback.
> Thanks,
> Paolo
>