Posted to dev@kafka.apache.org by Jose Armando Garcia Sancio <js...@confluent.io> on 2019/04/24 22:45:10 UTC

[DISCUSS] KIP-460: Admin Leader Election RPC

Hi all,

We would like to extend the "preferred leader election" RPC for the admin
client to also support unclean leader elections.

The KIP can be found here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC

Thanks!
-Jose

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jose Armando Garcia Sancio <js...@confluent.io>.
On Thu, Apr 25, 2019 at 11:47 AM Jason Gustafson <ja...@confluent.io> wrote:
>
> Hi Jose,
>
> This looks useful. One comment I had is whether we can improve the leader
> election tool. Needing to provide a json file is a bit annoying. Could we
> have a way to specify partitions directly through the command line? Often
> when we need to enable unclean leader election, it is just one or a small
> set of partitions.  I'd hope to be able to do something like this.
>
> bin/kafka-elect-leaders.sh --allow-unclean --topic foo --partition 1
> --bootstrap-server localhost:9092

Thanks for the feedback Jason. How about the following help output for
those flags:

--election-type <String: election>
Type of election to attempt. Possible values are 0 (or "preferred") for
preferred election or 1 (or "uncleaned") for uncleaned election. The
default value is 0 (or "preferred"), if --topic and --partition are
specified. Not allowed if --path-to-json-file is specified.

--topic <String: topic>
Name of topic for which to perform an election. REQUIRED if --partition is
specified. Not allowed if --path-to-json-file is specified.

--partition <Integer: partition id>
Partition id for which to perform an election. REQUIRED if --topic is
specified. Not allowed if --path-to-json-file is specified.

--path-to-json-file <String: file path>
...
Defaults to preferred election to all existing partitions if --topic and
--partition flags are not specified.
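As a rough illustration of how the flag rules in this help text interact, here is a minimal sketch using Python's argparse. It is not the tool's implementation: `parse_election_args`, the program name, and the returned dictionary are hypothetical, and rejecting an explicit --election-type when no --topic/--partition is given is an assumption, since the help text does not cover that case.

```python
import argparse

# Accepted spellings for --election-type, as listed in the help text.
ELECTION_TYPES = {"0": "preferred", "preferred": "preferred",
                  "1": "uncleaned", "uncleaned": "uncleaned"}


def parse_election_args(argv):
    """Hypothetical helper: parse and cross-check the proposed flags."""
    parser = argparse.ArgumentParser(prog="leader-election-sketch")
    parser.add_argument("--election-type", choices=sorted(ELECTION_TYPES))
    parser.add_argument("--topic")
    parser.add_argument("--partition", type=int)
    parser.add_argument("--path-to-json-file")
    args = parser.parse_args(argv)

    # --topic and --partition are each REQUIRED when the other is given.
    if (args.topic is None) != (args.partition is None):
        parser.error("--topic and --partition must be specified together")

    if args.path_to_json_file is not None:
        # None of the other three flags may accompany --path-to-json-file.
        if args.topic is not None or args.election_type is not None:
            parser.error("--election-type, --topic and --partition are "
                         "not allowed with --path-to-json-file")
        return {"json_file": args.path_to_json_file,
                "election_type": None, "partitions": None}

    if args.topic is None and args.election_type is not None:
        # Assumption: the help text only describes a preferred election
        # when no partition is named, so an explicit type is rejected.
        parser.error("--election-type requires --topic and --partition")

    # Default is "preferred"; partitions=None stands for "all existing
    # partitions" when --topic and --partition are omitted.
    election_type = ELECTION_TYPES[args.election_type or "0"]
    partitions = None if args.topic is None else [(args.topic, args.partition)]
    return {"json_file": None, "election_type": election_type,
            "partitions": partitions}
```

For example, `parse_election_args(["--topic", "foo", "--partition", "1"])` yields a preferred election for partition 1 of topic foo, while passing --topic without --partition exits with a usage error.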

>
> Also there's a comment if the json file is not provided, the help document
> says "Defaults to all existing partitions." I assume we would not keep this
> behavior?

Unfortunately, this behaviour is at the protocol level. If the Kafka
controller receives a request with a null for "TopicPartitions" then it
assumes that the user is attempting to perform a preferred leader election
on all of the partitions. I am not sure if we can remove this functionality
at this point. We could remove this feature from the CLI/command while
keeping it at the protocol level. What do we think?

This is the code that handles this:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L2365-L2366

>
> The only other question I had is whether we ought to deprecate
> `AdminClient.electPreferredLeaders`?

Yes. We should deprecate this method. I'll update the KIP.

Thanks!

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jason Gustafson <ja...@confluent.io>.
Hi Jose,

This looks useful. One comment I had is whether we can improve the leader
election tool. Needing to provide a json file is a bit annoying. Could we
have a way to specify partitions directly through the command line? Often
when we need to enable unclean leader election, it is just one or a small
set of partitions.  I'd hope to be able to do something like this.

bin/kafka-elect-leaders.sh --allow-unclean --topic foo --partition 1
--bootstrap-server localhost:9092

Also there's a comment if the json file is not provided, the help document
says "Defaults to all existing partitions." I assume we would not keep this
behavior?

The only other question I had is whether we ought to deprecate
`AdminClient.electPreferredLeaders`?

Thanks,
Jason

On Thu, Apr 25, 2019 at 11:23 AM Jose Armando Garcia Sancio <
jsancio@confluent.io> wrote:

> On Thu, Apr 25, 2019 at 8:22 AM Colin McCabe <cm...@apache.org> wrote:
> >
> > On Wed, Apr 24, 2019, at 21:04, Jose Armando Garcia Sancio wrote:
> > > Thanks for the reply. Comments below.
> > >
> > > On Wed, Apr 24, 2019 at 6:07 PM Colin McCabe <cm...@apache.org> wrote:
> > > > What's the rationale for using an int8 rather than just having a boolean
> > > > that is either true or false for "unclean"?  We only have two values now,
> > > > and it seems like we could modify the RPC schema in the future if needed.
> > > > Or is the intention to add more flags later?
> > > >
> > >
> > > There are two reasons:
> > >
> > >    1. The controller supports 4 (5 technically) different election
> > >    algorithms. We are only exposing "preferred" and "unclean" through the
> > >    admin client because we only have use cases for those two election types.
> > >    It is possible that in the future we may want to support more algorithms.
> > >    This would allow us to make that change easier.
> > >    2. I believe that an enum is more descriptive than a boolean flag as it
> > >    is not a matter of "unclean" vs "clean" or "preferred" vs "non-preferred".
> > >       1. Preferred means that the controller will attempt to elect only the
> > >       first replica described in the partition assignment, if it is online
> > >       and in-sync.
> > >       2. Unclean means that the controller will attempt to elect the first
> > >       in-sync and alive replica given the order of the partition assignment. If
> > >       this is not satisfied it will attempt to elect the first replica in the
> > >       assignment that is alive.
> > >
> >
> > OK, that makes sense.
> >
> > On an unrelated note, you can simplify your protocol definition to this, I believe:
> >
> >         { "name": "Partitions", "type": "[]Partitions", "versions": "1+",
> >           "about": "The partitions of this topic whose leader should be elected.",
> >           "fields": [
> >               { "name": "PartitionId", "type": "int32", "versions": "0+",
> >                 "about": "The partition id." },
> >               { "name": "ElectionType", "type": "int8", "versions": "1+",
> >                 "about": "Type of elections to conduct for the partition. A value of '0' elects the preferred leader. A value of '1' elects an unclean leader if there are no in-sync leaders." }
> >           ]
> >         }
>
> Great suggestion. I made one modification to the "versions" field for
> "Partitions". Let me know which one is correct. The KIP should have
> the final result.
>
> >
> > The reason is because the v0 array of ints is equivalent on the wire to an array of structures that only have an int inside.
> >
> > In other words, this:
> >         { "type": "[]int32" }
> >
> > is just another way of saying this:
> >         { "type": "[]MyArrayType", "fields": [
> >               { "name": "MyInt", "type": "int32",  } ] }
>
> Very cool. This is the definition of zero overhead abstraction.
>
> -Jose
>

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jose Armando Garcia Sancio <js...@confluent.io>.
On Thu, Apr 25, 2019 at 8:22 AM Colin McCabe <cm...@apache.org> wrote:
>
> On Wed, Apr 24, 2019, at 21:04, Jose Armando Garcia Sancio wrote:
> > Thanks for the reply. Comments below.
> >
> > On Wed, Apr 24, 2019 at 6:07 PM Colin McCabe <cm...@apache.org> wrote:
> > > What's the rationale for using an int8 rather than just having a boolean
> > > that is either true or false for "unclean"?  We only have two values now,
> > > and it seems like we could modify the RPC schema in the future if needed.
> > > Or is the intention to add more flags later?
> > >
> >
> > There are two reasons:
> >
> >    1. The controller supports 4 (5 technically) different election
> >    algorithms. We are only exposing "preferred" and "unclean" through the
> >    admin client because we only have use cases for those two election types.
> >    It is possible that in the future we may want to support more algorithms.
> >    This would allow us to make that change easier.
> >    2. I believe that an enum is more descriptive than a boolean flag as it
> >    is not a matter of "unclean" vs "clean" or "preferred" vs "non-preferred".
> >       1. Preferred means that the controller will attempt to elect only the
> >       first replica described in the partition assignment, if it is online
> >       and in-sync.
> >       2. Unclean means that the controller will attempt to elect the first
> >       in-sync and alive replica given the order of the partition assignment. If
> >       this is not satisfied it will attempt to elect the first replica in the
> >       assignment that is alive.
> >
>
> OK, that makes sense.
>
> On an unrelated note, you can simplify your protocol definition to this, I believe:
>
>         { "name": "Partitions", "type": "[]Partitions", "versions": "1+",
>           "about": "The partitions of this topic whose leader should be elected.",
>           "fields": [
>               { "name": "PartitionId", "type": "int32", "versions": "0+",
>                 "about": "The partition id." },
>               { "name": "ElectionType", "type": "int8", "versions": "1+",
>                 "about": "Type of elections to conduct for the partition. A value of '0' elects the preferred leader. A value of '1' elects an unclean leader if there are no in-sync leaders." }
>           ]
>         }

Great suggestion. I made one modification to the "versions" field for
"Partitions". Let me know which one is correct. The KIP should have
the final result.

>
> The reason is because the v0 array of ints is equivalent on the wire to an array of structures that only have an int inside.
>
> In other words, this:
>         { "type": "[]int32" }
>
> is just another way of saying this:
>         { "type": "[]MyArrayType", "fields": [
>               { "name": "MyInt", "type": "int32",  } ] }

Very cool. This is the definition of zero overhead abstraction.

-Jose

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Colin McCabe <cm...@apache.org>.
On Wed, Apr 24, 2019, at 21:04, Jose Armando Garcia Sancio wrote:
> Thanks for the reply. Comments below.
> 
> On Wed, Apr 24, 2019 at 6:07 PM Colin McCabe <cm...@apache.org> wrote:
> 
> > Hi Jose,
> >
> > Thanks for the KIP, looks valuable.
> >
> > If I use a PreferredLeaderElection RPC to specifically request an unclean
> > leader election, will this take effect even if unclean leader elections are
> > disabled on the topic involved?  I assume that the answer is yes, but it
> > would be good to clarify this in the KIP.
> >
> 
> Yes. One of the motivations for this change is to allow the user to attempt
> unclean leader election without having to change the topic configuration. I
> will update the motivation and design section.

Hi Jose,

Sounds good.

> 
> > What ACLs will be required to perform this action?  WRITE on the topic
> > resource?  Or ALTER on KafkaCluster?  Or perhaps ALTER on the topic would
> > be most appropriate, since we probably don't want ordinary producers
> > triggering unclean leader elections.
> >
> 
> I am not sure. Let me investigate what the current RPC requires and get
> back to you. This is not a new RPC. We are instead updating an existing RPC
> that already performs authorization. The RPC has API key 32.

It looks like the existing RPC requires ALTER on CLUSTER.  That's definitely safe, since it's essentially root for us.  We can probably just keep it the way it is, then.

> 
> > What's the rationale for using an int8 rather than just having a boolean
> > that is either true or false for "unclean"?  We only have two values now,
> > and it seems like we could modify the RPC schema in the future if needed.
> > Or is the intention to add more flags later?
> >
> 
> There are two reasons:
> 
>    1. The controller supports 4 (5 technically) different election
>    algorithms. We are only exposing "preferred" and "unclean" through the
>    admin client because we only have use cases for those two election types.
>    It is possible that in the future we may want to support more algorithms.
>    This would allow us to make that change easier.
>    2. I believe that an enum is more descriptive than a boolean flag as it
>    is not a matter of "unclean" vs "clean" or "preferred" vs "non-preferred".
>       1. Preferred means that the controller will attempt to elect only the
>       first replica described in the partition assignment, if it is online
>       and in-sync.
>       2. Unclean means that the controller will attempt to elect the first
>       in-sync and alive replica given the order of the partition assignment. If
>       this is not satisfied it will attempt to elect the first replica in the
>       assignment that is alive.
>

OK, that makes sense.

On an unrelated note, you can simplify your protocol definition to this, I believe:

        { "name": "Partitions", "type": "[]Partitions", "versions": "1+",
          "about": "The partitions of this topic whose leader should be elected.",
          "fields": [
              { "name": "PartitionId", "type": "int32", "versions": "0+",
                "about": "The partition id." },
              { "name": "ElectionType", "type": "int8", "versions": "1+",
                "about": "Type of elections to conduct for the partition. A value of '0' elects the preferred leader. A value of '1' elects an unclean leader if there are no in-sync leaders." } 
          ]
        }

The reason is because the v0 array of ints is equivalent on the wire to an array of structures that only have an int inside.

In other words, this:
        { "type": "[]int32" }

is just another way of saying this:
        { "type": "[]MyArrayType", "fields": [
              { "name": "MyInt", "type": "int32",  } ] }
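To make the wire-level equivalence concrete, here is a small sketch that encodes the same three partition ids both ways. It assumes the classic (non-compact) Kafka array encoding, an INT32 element count followed by the elements in big-endian order; the function names are hypothetical and this is not Kafka's generated serialization code.

```python
import struct


def encode_int32_array(values):
    """Encode a v0-style []int32: INT32 element count, then each INT32."""
    return struct.pack(">i", len(values)) + b"".join(
        struct.pack(">i", v) for v in values)


def encode_partitions(partitions, version):
    """Encode an array of structs: PartitionId, plus ElectionType in v1+."""
    out = struct.pack(">i", len(partitions))
    for p in partitions:
        out += struct.pack(">i", p["PartitionId"])
        if version >= 1:
            # ElectionType is an int8 that only exists from version 1 on.
            out += struct.pack(">b", p["ElectionType"])
    return out


v0_ints = encode_int32_array([0, 3, 7])
v0_structs = encode_partitions(
    [{"PartitionId": 0}, {"PartitionId": 3}, {"PartitionId": 7}], version=0)

# At version 0 the struct carries only the int32, so the bytes are identical.
assert v0_ints == v0_structs

# At version 1 each element grows by one byte for the ElectionType field.
v1_structs = encode_partitions(
    [{"PartitionId": 0, "ElectionType": 1}], version=1)
```

Because a v0 struct contributes nothing beyond its single int32, the schema can be rewritten as an array of structs without changing what v0 clients send or receive.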

best,
Colin

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jose Armando Garcia Sancio <js...@confluent.io>.
Thanks for the reply. Comments below.

On Wed, Apr 24, 2019 at 6:07 PM Colin McCabe <cm...@apache.org> wrote:

> Hi Jose,
>
> Thanks for the KIP, looks valuable.
>
> If I use a PreferredLeaderElection RPC to specifically request an unclean
> leader election, will this take effect even if unclean leader elections are
> disabled on the topic involved?  I assume that the answer is yes, but it
> would be good to clarify this in the KIP.
>

Yes. One of the motivations for this change is to allow the user to attempt
unclean leader election without having to change the topic configuration. I
will update the motivation and design section.

> What ACLs will be required to perform this action?  WRITE on the topic
> resource?  Or ALTER on KafkaCluster?  Or perhaps ALTER on the topic would
> be most appropriate, since we probably don't want ordinary producers
> triggering unclean leader elections.
>

I am not sure. Let me investigate what the current RPC requires and get
back to you. This is not a new RPC. We are instead updating an existing RPC
that already performs authorization. The RPC has API key 32.

> What's the rationale for using an int8 rather than just having a boolean
> that is either true or false for "unclean"?  We only have two values now,
> and it seems like we could modify the RPC schema in the future if needed.
> Or is the intention to add more flags later?
>

There are two reasons:

   1. The controller supports 4 (5 technically) different election
   algorithms. We are only exposing "preferred" and "unclean" through the
   admin client because we only have use cases for those two election types.
   It is possible that in the future we may want to support more algorithms.
   This would allow us to make that change easier.
   2. I believe that an enum is more descriptive than a boolean flag as it
   is not a matter of "unclean" vs "clean" or "preferred" vs "non-preferred".
      1. Preferred means that the controller will attempt to elect only the
      first replica described in the partition assignment, if it is online
      and in-sync.
      2. Unclean means that the controller will attempt to elect the first
      in-sync and alive replica given the order of the partition assignment. If
      this is not satisfied it will attempt to elect the first replica in the
      assignment that is alive.
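
The two election types described above can be sketched as follows. This is only an illustration of the selection rules as stated in this message, not the controller's code; the function names and the use of plain lists and sets for the assignment, in-sync replicas, and live brokers are assumptions.

```python
def elect_preferred(assignment, isr, live):
    """Elect only the first assigned replica, if it is online and in-sync."""
    if not assignment:
        return None
    first = assignment[0]
    return first if first in live and first in isr else None


def elect_unclean(assignment, isr, live):
    """Elect the first in-sync live replica, else the first live replica."""
    for replica in assignment:
        if replica in isr and replica in live:
            return replica
    # No in-sync replica is alive: fall back to any live replica, in
    # assignment order. This is the step that can lose data.
    for replica in assignment:
        if replica in live:
            return replica
    return None


assignment = [1, 2, 3]
# Broker 1 is down, so a preferred election elects no one,
# while an unclean election picks the next in-sync live replica.
no_leader = elect_preferred(assignment, isr={2, 3}, live={2, 3})
clean_pick = elect_unclean(assignment, isr={2, 3}, live={2, 3})
# Only an out-of-sync replica is alive, so the unclean fallback elects it.
unclean_pick = elect_unclean(assignment, isr={1}, live={3})
```

The sketch shows why a boolean would be misleading: the two types differ in which replicas are candidates, not merely in whether an out-of-sync replica is tolerated.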

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Colin McCabe <cm...@apache.org>.
Hi Jose,

Thanks for the KIP, looks valuable.

If I use a PreferredLeaderElection RPC to specifically request an unclean leader election, will this take effect even if unclean leader elections are disabled on the topic involved?  I assume that the answer is yes, but it would be good to clarify this in the KIP.

What ACLs will be required to perform this action?  WRITE on the topic resource?  Or ALTER on KafkaCluster?  Or perhaps ALTER on the topic would be most appropriate, since we probably don't want ordinary producers triggering unclean leader elections.

What's the rationale for using an int8 rather than just having a boolean that is either true or false for "unclean"?  We only have two values now, and it seems like we could modify the RPC schema in the future if needed.  Or is the intention to add more flags later?

best,
Colin


On Wed, Apr 24, 2019, at 15:45, Jose Armando Garcia Sancio wrote:
> Hi all,
> 
> We would like to extend the "preferred leader election" RPC for the admin
> client to also support unclean leader elections.
> 
> The KIP can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC
> 
> Thanks!
> -Jose
>

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jose Armando Garcia Sancio <js...@confluent.io>.
Hi all,

During the implementation of KIP-460, we discovered that we had to make
some minor changes to the design. I have updated the KIP wiki[1]. You can
see the difference here[2].

At a high level, we made the following changes:

   1. Added a top level ErrorCode to the response for errors that apply to
   all of the topic partitions. This is currently being used for cluster
   authorization errors.
   2. We renamed the delayedOperation for DelayedOperationPurgatory from
   ElectPreferredLeader to ElectLeader.


[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC
[2]
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=113707931&selectedPageVersions=19&selectedPageVersions=18

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jose Armando Garcia Sancio <js...@confluent.io>.
On Sun, May 5, 2019 at 12:35 PM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hey there Jose, thanks for the KIP!
>
> I have one small nit regarding the `kafka-leader-election.sh` tool. I agree
> with Jason that it is probably better to force users to be explicit in their
> desired election type. I was wondering whether it makes sense to support
> only "preferred" and "unclean" for the "--election-type" flag, not the
> numeric values.
>

Thanks for the feedback Stanislav. I went ahead and updated KIP-460 to only
allow "preferred" and "unclean" for the "--election-type" flag.

Thanks!
-Jose

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey there Jose, thanks for the KIP!

I have one small nit regarding the `kafka-leader-election.sh` tool. I agree
with Jason that it is probably better to force users to be explicit in their
desired election type. I was wondering whether it makes sense to support
only "preferred" and "unclean" for the "--election-type" flag, not the
numeric values.
While 0 and 1 are what the RPC uses, I think it might be simpler if we
abstracted that detail away from users and stuck to terms they are familiar
with.

On Wed, May 1, 2019 at 12:35 AM Jose Armando Garcia Sancio <
jsancio@confluent.io> wrote:

> On Tue, Apr 30, 2019 at 11:39 AM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Thanks for the updates, Jose. The proposal looks good to me. Just one minor
> > question I had is whether we should even have a default --election-type in
> > kafka-leader-election.sh. I am wondering if it is reasonable to make the
> > user be explicit about what they are trying to do?
> >
>
> This change sounds good to me. We can always add a default to
> --election-type in the future without breaking backwards compatibility.
>
> Thanks!
>


-- 
Best,
Stanislav

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jose Armando Garcia Sancio <js...@confluent.io>.
On Tue, Apr 30, 2019 at 11:39 AM Jason Gustafson <ja...@confluent.io> wrote:

> Thanks for the updates, Jose. The proposal looks good to me. Just one minor
> question I had is whether we should even have a default --election-type in
> kafka-leader-election.sh. I am wondering if it is reasonable to make the
> user be explicit about what they are trying to do?
>

This change sounds good to me. We can always add a default to
--election-type in the future without breaking backwards compatibility.

Thanks!

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jason Gustafson <ja...@confluent.io>.
Thanks for the updates, Jose. The proposal looks good to me. Just one minor
question I had is whether we should even have a default --election-type in
kafka-leader-election.sh. I am wondering if it is reasonable to make the
user be explicit about what they are trying to do?

-Jason

On Fri, Apr 26, 2019 at 2:39 PM Jose Armando Garcia Sancio <
jsancio@confluent.io> wrote:

> Hi all,
>
> Jason, Colin and I discussed this KIP offline and decided to make the
> following changes.
>
>    1. Change the ElectLeadersRequest RPC so that only one election type can
>    be specified and it applies to all of the topic partitions enumerated. We
>    think that this makes the API easier to use when performing one type of
>    election across multiple topic partitions. We think that it is rare that
>    the user would like to perform different types of elections in the same
>    command (or request).
>    2. Change the kafka-leader-election script so that it doesn't default to
>    applying the election type to all of the topic partitions. For example
>    previously "bin/kafka-preferred-replica-election.sh --bootstrap-server
>    $host:$port" would attempt to perform preferred leader election on all of
>    the partitions. Instead now the user needs to run the following command
>    "bin/kafka-leader-election.sh --bootstrap-server $host:$port
>    --all-topic-partitions"
>
> The KIP has been updated to include these changes. The diff is here:
>
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=113707931&selectedPageVersions=13&selectedPageVersions=12
>
> Thanks!
>
> On Wed, Apr 24, 2019 at 3:45 PM Jose Armando Garcia Sancio <
> jsancio@confluent.io> wrote:
>
> > Hi all,
> >
> > We would like to extend the "preferred leader election" RPC for the admin
> > client to also support unclean leader elections.
> >
> > The KIP can be found here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC
> >
> > Thanks!
> > -Jose
> >
>
>
> --
> -Jose
>

Re: [DISCUSS] KIP-460: Admin Leader Election RPC

Posted by Jose Armando Garcia Sancio <js...@confluent.io>.
Hi all,

Jason, Colin and I discussed this KIP offline and decided to make the
following changes.

   1. Change the ElectLeadersRequest RPC so that only one election type can
   be specified and it applies to all of the topic partitions enumerated. We
   think that this makes the API easier to use when performing one type of
   election across multiple topic partitions. We think that it is rare that
   the user would like to perform different types of elections in the same
   command (or request).
   2. Change the kafka-leader-election script so that it doesn't default to
   applying the election type to all of the topic partitions. For example
   previously "bin/kafka-preferred-replica-election.sh --bootstrap-server
   $host:$port" would attempt to perform preferred leader election on all of
   the partitions. Instead now the user needs to run the following command
   "bin/kafka-leader-election.sh --bootstrap-server $host:$port
   --all-topic-partitions"
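
A minimal sketch of what these two changes imply for a caller: one election type per request, and an explicit opt-in before the election is applied to every partition. The dictionary layout and `build_elect_leaders_request` are hypothetical and do not reflect the actual request schema; the only details taken from this thread are the 0/1 values for preferred/unclean and the protocol-level convention that a null "TopicPartitions" means all partitions.

```python
# Numeric values taken from the ElectionType field description in this thread.
ELECTION_TYPES = {"preferred": 0, "unclean": 1}


def build_elect_leaders_request(election_type, topic_partitions=None,
                                all_topic_partitions=False):
    """Hypothetical helper: build a request with a single election type."""
    if election_type not in ELECTION_TYPES:
        raise ValueError("unknown election type: %r" % (election_type,))
    # Exactly one of the two must be chosen; there is no implicit
    # "all partitions" default any more.
    if all_topic_partitions == (topic_partitions is not None):
        raise ValueError("specify either explicit topic partitions "
                         "or all_topic_partitions=True, not both or neither")
    return {
        # One election type applies to every partition in the request.
        "ElectionType": ELECTION_TYPES[election_type],
        # None stands in for the null that means "all partitions".
        "TopicPartitions": (None if all_topic_partitions
                            else sorted(topic_partitions)),
    }


explicit = build_elect_leaders_request("unclean", [("foo", 1)])
everything = build_elect_leaders_request("preferred",
                                         all_topic_partitions=True)
```

Calling the helper with neither explicit partitions nor the opt-in raises an error, which mirrors the script no longer defaulting to all partitions.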

The KIP has been updated to include these changes. The diff is here:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=113707931&selectedPageVersions=13&selectedPageVersions=12

Thanks!

On Wed, Apr 24, 2019 at 3:45 PM Jose Armando Garcia Sancio <
jsancio@confluent.io> wrote:

> Hi all,
>
> We would like to extend the "preferred leader election" RPC for the admin
> client to also support unclean leader elections.
>
> The KIP can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC
>
> Thanks!
> -Jose
>


-- 
-Jose