You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Gwen Shapira <gw...@confluent.io> on 2021/02/07 05:13:28 UTC

Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

I realize that I'm very very late to a pretty big party, but I was
just looking at the new code (super readable, btw), and noticed that
the inverse API for registerBroker() is called decomissionBroker().
It would be way clearer that we are undoing a registration is we
rename it to unregisterBroker() (and the AdminClient API that matches,
to avoid more confusion).

Any objections?

On Fri, Dec 18, 2020 at 10:08 AM Jun Rao <ju...@confluent.io> wrote:
>
> Hi, Colin,
>
> Thanks for the reply. The KIP looks good to me now. Thanks for your
> diligence.
>
> Jun
>
> On Thu, Dec 17, 2020 at 5:15 PM Colin McCabe <cm...@apache.org> wrote:
>
> > On Thu, Dec 17, 2020, at 17:02, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the reply. Sounds good. One more comment.
> > >
> > > 231. Currently, we have sasl.mechanism.inter.broker.protocol for inter
> > > broker communication. It seems that we need a similar config for
> > specifying
> > > the sasl mechanism for the communication between the broker and the
> > > controller.
> > >
> >
> > Hi Jun,
> >
> > Yeah... sounds like we could use a new configuration key for this.  I
> > added sasl.mechanism.controller.protocol for this.
> >
> > regards,
> > Colin
> >
> > > Jun
> > >
> > > On Thu, Dec 17, 2020 at 2:29 PM Colin McCabe <cm...@apache.org> wrote:
> > >
> > > > On Thu, Dec 17, 2020, at 10:19, Jun Rao wrote:
> > > > > Hi, Colin,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > 211. Hmm, I still don't see a clear benefit of registering a broker
> > > > before
> > > > > recovery. It's possible for the recovery to take time. However,
> > during
> > > > the
> > > > > recovery mode, it seems the broker will still be in the fenced mode
> > and
> > > > > won't be able to do work for the clients. So, registering and
> > > > heartbeating
> > > > > early seems to only add unnecessary overhead. For your point on topic
> > > > > creation, I thought we now allow replicas to be created on
> > > > > unregistered brokers.
> > > > >
> > > >
> > > > Hi Jun,
> > > >
> > > > Thanks again for the reviews.
> > > >
> > > > Topics cannot be created on unregistered brokers.  They can be created
> > on
> > > > registered but fenced brokers.  So for that reason I think it makes
> > sense
> > > > to register as early as possible.
> > > >
> > > > > 230. Currently, we do have a ControllerId field in MetadataResponse.
> > In
> > > > the
> > > > > early discussion, I thought that we want to expose the controller for
> > > > > debugging purposes, but not used by the client library.
> > > > >
> > > >
> > > > The current plan is that we will expose the controller node ID, but the
> > > > controller will not be included in the list of nodes in the metadata
> > > > response.
> > > >
> > > > It's not really possible to include the controller in that list of
> > nodes
> > > > because the controller may not share the same set of listeners as the
> > > > broker.  So, for example, maybe the controller endpoint is using a
> > > > different type of security than the broker.  So while we could pass
> > back a
> > > > hostname and port, the client would have no way to connect since it
> > doesn't
> > > > know what security settings to use.
> > > >
> > > > regards,
> > > > Colin
> > > >
> > > > > Jun
> > > > >
> > > > > On Wed, Dec 16, 2020 at 9:13 PM Colin McCabe <cm...@apache.org>
> > wrote:
> > > > >
> > > > > > On Wed, Dec 16, 2020, at 18:13, Jun Rao wrote:
> > > > > > > Hi, Colin,
> > > > > > >
> > > > > > > Thanks for the reply. Just a couple of more comments.
> > > > > > >
> > > > > > > 211. Currently, the broker only registers itself in ZK after log
> > > > > > recovery.
> > > > > > > Is there any benefit to change that? As you mentioned, the broker
> > > > can't
> > > > > > do
> > > > > > > much before completing log recovery.
> > > > > > >
> > > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Previously, it wasn't possible to register in ZK without
> > immediately
> > > > > > getting added to the MetadataResponse.  So I think that's the main
> > > > reason
> > > > > > why registration was delayed until after log recovery.  Since that
> > > > > > constraint doesn't exist any more, there seems to be no reason to
> > delay
> > > > > > registration.
> > > > > >
> > > > > > I think delaying registration would have some major downsides.  If
> > log
> > > > > > recovery takes a while, that's a longer window during which someone
> > > > else
> > > > > > could register a broker with the same ID.  Having parts of the
> > cluster
> > > > > > missing for a while gives up some of the benefit of separating
> > > > registration
> > > > > > from fencing.  For example, if a broker somehow gets unregistered
> > and
> > > > we
> > > > > > want to re-register it, but we have to wait for a 10 minute log
> > > > recovery to
> > > > > > do that, that could be a window during which topics can't be
> > created
> > > > that
> > > > > > need to be on that broker, and so forth.
> > > > > >
> > > > > > > 230. Regarding MetadataResponse, there is a slight awkwardness.
> > We
> > > > return
> > > > > > > rack for each node. However, if that node is for the controller,
> > the
> > > > rack
> > > > > > > field is not really relevant. Should we clean it up here or in
> > > > another
> > > > > > KIP
> > > > > > > like KIP-700?
> > > > > >
> > > > > > Oh, controllers don't appear in the MetadataResponses returned to
> > > > clients,
> > > > > > since clients can't access them.  I should have been more clear
> > about
> > > > that
> > > > > > in the KIP-- I added a sentence to "Networking" describing this.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Wed, Dec 16, 2020 at 4:23 PM Colin McCabe <cmccabe@apache.org
> > >
> > > > wrote:
> > > > > > >
> > > > > > > > On Wed, Dec 16, 2020, at 13:40, Jun Rao wrote:
> > > > > > > > > Hi, Colin,
> > > > > > > > >
> > > > > > > > > Thanks for the reply. A few follow up comments.
> > > > > > > > >
> > > > > > > > > 211. When does the broker send the BrokerRegistration
> > request to
> > > > the
> > > > > > > > > controller? Is it after the recovery phase? If so, at that
> > > > point, the
> > > > > > > > > broker has already caught up on the metadata (in order to
> > clean
> > > > up
> > > > > > > > deleted
> > > > > > > > > partitions). Then, it seems we don't need the ShouldFence
> > field
> > > > > > > > > in BrokerHeartbeatRequest?
> > > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > Thanks again for the reviews.
> > > > > > > >
> > > > > > > > The broker sends the registration request as soon as it starts
> > > > up.  It
> > > > > > > > cannot wait until the recovery phase is over since sometimes
> > log
> > > > > > recovery
> > > > > > > > can take quite a long time.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 213. kafka-cluster.sh
> > > > > > > > > 213.1 For the decommision example, should the command take a
> > > > broker
> > > > > > id?
> > > > > > > >
> > > > > > > > Yes, the example should have a broker id.  Fixed.
> > > > > > > >
> > > > > > > > > 213.2 Existing tools use the "--" command line option (e.g.
> > > > > > kafka-topics
> > > > > > > > > --list --topic test). Should we follow the same convention
> > > > > > > > > for kafka-cluster.sh (and kafka-storage.sh)?
> > > > > > > >
> > > > > > > > Hmm.  I don't think argparse4j supports using double dashes to
> > > > identify
> > > > > > > > subcommands.  I think it might be confusing as well, since the
> > > > > > subcommand
> > > > > > > > must come first, unlike a plain old argument which can be
> > anywhere
> > > > on
> > > > > > the
> > > > > > > > command line.
> > > > > > > >
> > > > > > > > > 213.3 Should we add an admin api for broker decommission so
> > that
> > > > > > this can
> > > > > > > > > be done programmatically?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, there is an admin client API for decommissioning as well.
> > > > > > > >
> > > > > > > > > 220. DecommissionBroker: "NOT_CONTROLLER if the node that the
> > > > > > request was
> > > > > > > > > sent to is not the controller". I thought the clients never
> > send
> > > > a
> > > > > > > > request
> > > > > > > > > to the controller directly now and the broker will forward
> > it to
> > > > the
> > > > > > > > > controller?
> > > > > > > > >
> > > > > > > >
> > > > > > > > If the controller moved recently, it's possible that the broker
> > > > could
> > > > > > send
> > > > > > > > to a controller that has just recently become inactive.  In
> > that
> > > > case
> > > > > > > > NOT_CONTROLLER would be returned.  (A standby controller
> > returns
> > > > > > > > NOT_CONTROLLER for most APIs).
> > > > > > > >
> > > > > > > > > 221. Could we add the required ACL for the new requests?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Good point.  I added the required ACL for each new RPC.
> > > > > > > >
> > > > > > > > best,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Wed, Dec 16, 2020 at 11:38 AM Colin McCabe <
> > > > cmccabe@apache.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> > > > > > > > > > > Hi, Colin,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the reply. A few more comments below.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Jun,
> > > > > > > > > >
> > > > > > > > > > Thanks for the comments.
> > > > > > > > > >
> > > > > > > > > > > 206. "RemoveTopic is the last step, that scrubs all
> > metadata
> > > > > > about
> > > > > > > > the
> > > > > > > > > > > topic.  In order to get to that last step, the topic data
> > > > needs
> > > > > > to
> > > > > > > > > > removed
> > > > > > > > > > > from all brokers (after each broker notices that the
> > topic is
> > > > > > being
> > > > > > > > > > > deleted)." Currently, this is done based on the response
> > of
> > > > > > > > > > > StopReplicaRequest. Since the controller no longer sends
> > this
> > > > > > > > request,
> > > > > > > > > > how
> > > > > > > > > > > does the controller know that the data for the deleted
> > topic
> > > > has
> > > > > > > > > > > been removed in the brokers?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > That's a good point... definitely an oversight.
> > > > > > > > > >
> > > > > > > > > > It seems complex to force the controller to track when log
> > > > > > directories
> > > > > > > > > > have been deleted.  Let's just assume that KIP-516 has been
> > > > > > > > implemented,
> > > > > > > > > > and track them by UUID.  Then we can just have a single
> > topic
> > > > > > deletion
> > > > > > > > > > record.
> > > > > > > > > >
> > > > > > > > > > I added a section on "topic identifiers" describing this.
> > > > > > > > > >
> > > > > > > > > > > 210. Thanks for the explanation. Sounds good to me.
> > > > > > > > > > >
> > > > > > > > > > > 211. I still don't see an example when ShouldFence is
> > set to
> > > > > > true in
> > > > > > > > > > > BrokerHeartbeatReques. Could we add one?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > It is sent to true when the broker is first starting up and
> > > > > > doesn't yet
> > > > > > > > > > want to be unfenced.  I added a longer explanation of this
> > in
> > > > the
> > > > > > > > "Broker
> > > > > > > > > > Leases" section.
> > > > > > > > > >
> > > > > > > > > > > 213. The KIP now allows replicas to be assigned on
> > brokers
> > > > that
> > > > > > are
> > > > > > > > > > fenced,
> > > > > > > > > > > which is an improvement. How do we permanently remove a
> > > > broker
> > > > > > (e.g.
> > > > > > > > > > > cluster shrinking) to prevent it from being used for
> > future
> > > > > > replica
> > > > > > > > > > > assignments?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This is a fair point.  I will create a kafka-cluster.sh
> > script
> > > > > > which
> > > > > > > > can
> > > > > > > > > > do this, plus a DecommissionBrokerRequest.
> > > > > > > > > >
> > > > > > > > > > As a bonus the kafka-cluster.sh script can help people
> > find the
> > > > > > > > cluster ID
> > > > > > > > > > of brokers, something that people have wanted a tool for
> > in the
> > > > > > past.
> > > > > > > > > >
> > > > > > > > > > > 214. "Currently, when a node is down, all of its ZK
> > > > registration
> > > > > > > > > > > information is gone.  But  we need this information in
> > order
> > > > to
> > > > > > > > > > understand
> > > > > > > > > > > things like whether the replicas of a particular
> > partition
> > > > are
> > > > > > > > > > > well-balanced across racks." This is not quite true.
> > > > Currently,
> > > > > > even
> > > > > > > > when
> > > > > > > > > > > ZK registration is gone, the existing replica assignment
> > is
> > > > still
> > > > > > > > > > available
> > > > > > > > > > > in the metadata response.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I agree that the existing replica assignment is still
> > > > available.
> > > > > > But
> > > > > > > > that
> > > > > > > > > > just tells you partition X is on nodes A, B, and C.  If you
> > > > don't
> > > > > > have
> > > > > > > > the
> > > > > > > > > > ZK registration for one or more of A, B, or C then you
> > don't
> > > > know
> > > > > > > > whether
> > > > > > > > > > we are following the policy of "two replicas on one rack,
> > one
> > > > > > replica
> > > > > > > > on
> > > > > > > > > > another."  Or any other more complex rack placement policy
> > > > that you
> > > > > > > > might
> > > > > > > > > > have.
> > > > > > > > > >
> > > > > > > > > > best,
> > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > > > > > Jun
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 16, 2020 at 8:48 AM Colin McCabe <
> > > > cmccabe@apache.org
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Tue, Dec 15, 2020, at 13:08, Jun Rao wrote:
> > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the reply. A few more follow up comments.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 210. initial.broker.registration.timeout.ms: The
> > default
> > > > > > value
> > > > > > > > is
> > > > > > > > > > 90sec,
> > > > > > > > > > > > > which seems long. If a broker fails the registration
> > > > because
> > > > > > of
> > > > > > > > > > incorrect
> > > > > > > > > > > > > configs, we want to fail faster. In comparison, the
> > > > defaults
> > > > > > for
> > > > > > > > > > > > > zookeeper.connection.timeout.ms is 18 secs.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > >
> > > > > > > > > > > > I agree that the initial connection timeout here is
> > longer
> > > > than
> > > > > > > > what we
> > > > > > > > > > > > had with ZK.  The main reason I selected a slightly
> > longer
> > > > > > timeout
> > > > > > > > > > here is
> > > > > > > > > > > > to handle the case where the controllers and the
> > brokers
> > > > are
> > > > > > > > > > co-located.
> > > > > > > > > > > > For example, if you have a 3 node cluster and all three
> > > > nodes
> > > > > > are
> > > > > > > > > > > > controllers+brokers, when you first bring up the
> > cluster,
> > > > we
> > > > > > will
> > > > > > > > have
> > > > > > > > > > to
> > > > > > > > > > > > stand up the controller quorum and then handle broker
> > > > > > > > registrations.
> > > > > > > > > > > > Although we believe Raft will start up relatively
> > quickly,
> > > > it's
> > > > > > > > good to
> > > > > > > > > > > > leave some extra margin here.
> > > > > > > > > > > >
> > > > > > > > > > > > I don't think there's a big disadvantage to having a
> > > > slightly
> > > > > > > > longer
> > > > > > > > > > > > timeout here.  After all, starting up the brokers with
> > no
> > > > > > > > controllers
> > > > > > > > > > is an
> > > > > > > > > > > > admin mistake, which we don't expect to see very often.
> > > > Maybe
> > > > > > > > let's
> > > > > > > > > > set it
> > > > > > > > > > > > to 60 seconds for now and maybe see if we can tweak it
> > in
> > > > the
> > > > > > > > future if
> > > > > > > > > > > > that turns out to be too long or short?
> > > > > > > > > > > >
> > > > > > > > > > > > > 211. "Once they are all moved, the controller
> > responds
> > > > to the
> > > > > > > > > > heartbeat
> > > > > > > > > > > > > with a nextState of SHUTDOWN." It seems that
> > nextState
> > > > is no
> > > > > > > > longer
> > > > > > > > > > > > > relevant. Also, could you add a bit more explanation
> > on
> > > > when
> > > > > > > > > > ShouldFence
> > > > > > > > > > > > is
> > > > > > > > > > > > > set to true in BrokerHeartbeatRequest?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Good catch.  I removed the obsolete section referring
> > to
> > > > > > nextState
> > > > > > > > and
> > > > > > > > > > > > added a reference to the new boolean.  I also added
> > some
> > > > more
> > > > > > > > > > information
> > > > > > > > > > > > about ShouldFence and about the rationale for
> > separating
> > > > > > fencing
> > > > > > > > from
> > > > > > > > > > > > registration.
> > > > > > > > > > > >
> > > > > > > > > > > > > 212. Related to the above, what does the broker do if
> > > > > > IsFenced is
> > > > > > > > > > true in
> > > > > > > > > > > > > the BrokerHeartbeatResponse? Will the broker do the
> > same
> > > > > > thing on
> > > > > > > > > > > > receiving
> > > > > > > > > > > > > a FenceBrokerRecord from the metadata log?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The broker only checks this boolean during the startup
> > > > process.
> > > > > > > > After
> > > > > > > > > > the
> > > > > > > > > > > > startup process is finished, it ignores this boolean.
> > > > > > > > > > > >
> > > > > > > > > > > > The broker uses fence / unfence records in the metadata
> > > > log to
> > > > > > > > > > determine
> > > > > > > > > > > > which brokers should appear in its MetadataResponse.
> > > > > > > > > > > >
> > > > > > > > > > > > best,
> > > > > > > > > > > > Colin
> > > > > > > > > > > >
> > > > > > > > > > > > > Jun
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Dec 15, 2020 at 8:51 AM Colin McCabe <
> > > > > > cmccabe@apache.org
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Dec 15, 2020, at 04:13, Tom Bentley wrote:
> > > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The KIP says that "brokers which are fenced will
> > not
> > > > > > appear
> > > > > > > > in
> > > > > > > > > > > > > > > MetadataResponses.  So clients that have
> > up-to-date
> > > > > > metadata
> > > > > > > > will
> > > > > > > > > > > > not try
> > > > > > > > > > > > > > > to contact fenced brokers.", which is fine, but
> > it
> > > > > > doesn't
> > > > > > > > seem
> > > > > > > > > > to
> > > > > > > > > > > > > > > elaborate on what happens for clients which try
> > to
> > > > > > contact a
> > > > > > > > > > broker
> > > > > > > > > > > > > > (using
> > > > > > > > > > > > > > > stale metadata, for example) and find it in the
> > > > > > > > > > registered-but-fenced
> > > > > > > > > > > > > > > state.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have dropped the broker-side fencing from this
> > > > > > proposal.  So
> > > > > > > > now
> > > > > > > > > > the
> > > > > > > > > > > > > > fencing is basically the same as today's fencing:
> > it's
> > > > > > > > > > controller-side
> > > > > > > > > > > > > > only.  That means that clients using stale
> > metadata can
> > > > > > contact
> > > > > > > > > > fenced
> > > > > > > > > > > > > > brokers and communicate with them.  The only case
> > > > where the
> > > > > > > > broker
> > > > > > > > > > > > will be
> > > > > > > > > > > > > > inaccessible is when it hasn't finished starting
> > yet
> > > > > > (i.e., has
> > > > > > > > > > not yet
> > > > > > > > > > > > > > attained RUNNING state.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Just like today, we expect the safeguards built
> > into
> > > > the
> > > > > > > > > > replication
> > > > > > > > > > > > > > protocol to prevent the worst corner cases that
> > could
> > > > > > result
> > > > > > > > from
> > > > > > > > > > > > this.  I
> > > > > > > > > > > > > > do think we will probably take up this issue later
> > and
> > > > > > find a
> > > > > > > > > > better
> > > > > > > > > > > > > > solution for client-side metadata, but this KIP is
> > big
> > > > > > enough
> > > > > > > > > > as-is.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You said previously that the broker will respond
> > > > with a
> > > > > > > > retriable
> > > > > > > > > > > > > > > error, which is again fine, but you didn't
> > answer my
> > > > > > question
> > > > > > > > > > about
> > > > > > > > > > > > which
> > > > > > > > > > > > > > > error code you've chosen for this. I realise that
> > > > this
> > > > > > > > doesn't
> > > > > > > > > > really
> > > > > > > > > > > > > > > affect the correctness of the behaviour, but I'm
> > not
> > > > > > aware
> > > > > > > > of any
> > > > > > > > > > > > > > existing
> > > > > > > > > > > > > > > error code which is a good fit. So it would be
> > good
> > > > to
> > > > > > > > understand
> > > > > > > > > > > > about
> > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > you're making this backward compatible for
> > clients.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Many thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Tom
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Dec 15, 2020 at 1:42 AM Colin McCabe <
> > > > > > > > cmccabe@apache.org
> > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Dec 11, 2020, at 17:07, Jun Rao wrote:
> > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the reply. Just a couple of more
> > > > comments
> > > > > > > > below.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 210. Since we are deprecating
> > > > > > > > > > zookeeper.connection.timeout.ms,
> > > > > > > > > > > > > > should we
> > > > > > > > > > > > > > > > > add a new config to bound the time for a
> > broker
> > > > to
> > > > > > > > connect
> > > > > > > > > > to the
> > > > > > > > > > > > > > > > > controller during starting up?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Good idea.  I added
> > > > > > initial.broker.registration.timeout.ms
> > > > > > > > for
> > > > > > > > > > > > this.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 211. BrokerHeartbeat no longer has the state
> > > > field
> > > > > > in the
> > > > > > > > > > > > > > > > request/response.
> > > > > > > > > > > > > > > > > However, (a) the controller shutdown section
> > > > still
> > > > > > has
> > > > > > > > "In
> > > > > > > > > > its
> > > > > > > > > > > > > > periodic
> > > > > > > > > > > > > > > > > heartbeats, the broker asks the controller
> > if it
> > > > can
> > > > > > > > > > transition
> > > > > > > > > > > > into
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > SHUTDOWN state.  This motivates the
> > controller to
> > > > > > move
> > > > > > > > all
> > > > > > > > > > of the
> > > > > > > > > > > > > > leaders
> > > > > > > > > > > > > > > > > off of that broker.  Once they are all
> > moved, the
> > > > > > > > controller
> > > > > > > > > > > > > > responds to
> > > > > > > > > > > > > > > > > the heartbeat with a nextState of SHUTDOWN.";
> > > > (2) the
> > > > > > > > > > > > description of
> > > > > > > > > > > > > > > > > BrokerHeartbeat still references
> > currentState and
> > > > > > > > > > targetState.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks.  I've made these sections clearer and
> > > > removed
> > > > > > the
> > > > > > > > > > obsolete
> > > > > > > > > > > > > > > > references to sending states.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Dec 11, 2020 at 1:33 PM Colin McCabe
> > <
> > > > > > > > > > cmccabe@apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Dec 9, 2020, at 10:10, Jun Rao
> > wrote:
> > > > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the update. A few more follow
> > up
> > > > > > comments.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks again for the review.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 100. FailedReplicaRecord: Since this is
> > > > reported
> > > > > > by
> > > > > > > > each
> > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > independently, perhaps we could use a
> > more
> > > > > > concise
> > > > > > > > > > > > representation
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > a top level broker field, an array of
> > topics,
> > > > > > which
> > > > > > > > has
> > > > > > > > > > an
> > > > > > > > > > > > array
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > partitions.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The issue is that there is a size limit on
> > the
> > > > each
> > > > > > > > record.
> > > > > > > > > > > > > > Putting
> > > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > > of the partitions of a log directory into a
> > > > single
> > > > > > > > record
> > > > > > > > > > would
> > > > > > > > > > > > > > > > probably
> > > > > > > > > > > > > > > > > > break that in many cases.  Still, we can
> > > > optimize a
> > > > > > > > bit by
> > > > > > > > > > > > having
> > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > array
> > > > > > > > > > > > > > > > > > of partition IDs, since nearly all the
> > time, we
> > > > > > have
> > > > > > > > more
> > > > > > > > > > than
> > > > > > > > > > > > one
> > > > > > > > > > > > > > > > from the
> > > > > > > > > > > > > > > > > > same topic.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 200. Sounds good. If we remove the
> > > > broker-side
> > > > > > > > fencing
> > > > > > > > > > > > logic, do
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > plan
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > still keep FENCED in broker state? Do we
> > > > plan to
> > > > > > > > expose
> > > > > > > > > > the
> > > > > > > > > > > > new
> > > > > > > > > > > > > > > > states
> > > > > > > > > > > > > > > > > > > through the existing BrokerState metric
> > and
> > > > if
> > > > > > so,
> > > > > > > > what
> > > > > > > > > > are
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > values
> > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > the new states?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > No, we don't need FENCED any more.  I have
> > > > removed
> > > > > > it
> > > > > > > > from
> > > > > > > > > > the
> > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The new states are very similar to the
> > current
> > > > > > ones,
> > > > > > > > > > actually.
> > > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > no new states or removed ones.  The main
> > > > change in
> > > > > > the
> > > > > > > > > > broker
> > > > > > > > > > > > state
> > > > > > > > > > > > > > > > machine
> > > > > > > > > > > > > > > > > > is that the
> > RECOVERING_FROM_UNCLEAN_SHUTDOWN
> > > > state
> > > > > > has
> > > > > > > > been
> > > > > > > > > > > > > > renamed to
> > > > > > > > > > > > > > > > > > RECOVERY.  Also, unlike previously, the
> > broker
> > > > will
> > > > > > > > always
> > > > > > > > > > pass
> > > > > > > > > > > > > > through
> > > > > > > > > > > > > > > > > > RECOVERY (although it may only stay in this
> > > > state
> > > > > > for a
> > > > > > > > > > very
> > > > > > > > > > > > short
> > > > > > > > > > > > > > > > amount
> > > > > > > > > > > > > > > > > > of time).
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 201. This may be fine too. Could we
> > document
> > > > what
> > > > > > > > happens
> > > > > > > > > > > > when
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > broker.id/controller.id in
> > > > metadata.properties
> > > > > > don't
> > > > > > > > > > match
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > config when the broker starts up?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I added some documentation about this.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 204. There is still "The highest metadata
> > > > offset
> > > > > > > > which
> > > > > > > > > > the
> > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > has not
> > > > > > > > > > > > > > > > > > > reached" referenced under
> > BrokerRegistration.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It should be CurrentMetadataOffset.  Fixed.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 206. Is that separate step needed given
> > > > KIP-516?
> > > > > > With
> > > > > > > > > > > > KIP-516 (
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-LeaderAndIsr
> > > > > > > > > > > > > > > > > > ),
> > > > > > > > > > > > > > > > > > > we don't need to wait for the topic data
> > to
> > > > be
> > > > > > > > removed
> > > > > > > > > > from
> > > > > > > > > > > > all
> > > > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > > > before removing the topic metadata. The
> > > > > > combination
> > > > > > > > of
> > > > > > > > > > > > unmatching
> > > > > > > > > > > > > > > > > > > topicId
> > > > > > > > > > > > > > > > > > > or the missing topicId from the metadata
> > is
> > > > > > enough
> > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > > > > clean
> > > > > > > > > > > > > > > > > > > up deleted topics asynchronously.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It won't be needed once KIP-516 is
> > adopted, but
> > > > > > this
> > > > > > > > hasn't
> > > > > > > > > > > > been
> > > > > > > > > > > > > > > > > > implemented yet.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Dec 8, 2020 at 5:27 PM Colin
> > McCabe <
> > > > > > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Thu, Dec 3, 2020, at 16:37, Jun Rao
> > > > wrote:
> > > > > > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for the updated KIP. A few
> > more
> > > > > > comments
> > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks again for the reviews.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 80.2 For deprecated configs, we need
> > to
> > > > > > include
> > > > > > > > > > > > zookeeper.*
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > broker.id.generation.enable.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Added.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 83.1 If a broker is down, does the
> > > > controller
> > > > > > > > keep
> > > > > > > > > > the
> > > > > > > > > > > > > > previously
> > > > > > > > > > > > > > > > > > > > > registered broker epoch forever? If
> > not,
> > > > how
> > > > > > long
> > > > > > > > > > does
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > > > > keep
> > > > > > > > > > > > > > > > > > > > > it? What does the controller do when
> > > > > > receiving a
> > > > > > > > > > broker
> > > > > > > > > > > > > > heartbeat
> > > > > > > > > > > > > > > > > > request
> > > > > > > > > > > > > > > > > > > > > with an unfound broker epoch?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Yes, the controller keeps the previous
> > > > > > registration
> > > > > > > > > > > > forever.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Broker heartbeat requests with an
> > incorrect
> > > > > > broker
> > > > > > > > > > epoch
> > > > > > > > > > > > will
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > rejected
> > > > > > > > > > > > > > > > > > > > with STALE_BROKER_EPOCH.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 100. Have you figured out if we need
> > to
> > > > add
> > > > > > a new
> > > > > > > > > > record
> > > > > > > > > > > > > > type for
> > > > > > > > > > > > > > > > > > > > reporting
> > > > > > > > > > > > > > > > > > > > > partitions on failed disks?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I added FailedReplicaRecord to reflect
> > the
> > > > case
> > > > > > > > where a
> > > > > > > > > > > > JBOD
> > > > > > > > > > > > > > > > directory
> > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > failed, leading to failed replicas.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 102. For debugging purposes,
> > sometimes
> > > > it's
> > > > > > > > useful to
> > > > > > > > > > > > read
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > > > > > > topic using tools like
> > console-consumer.
> > > > > > Should
> > > > > > > > we
> > > > > > > > > > > > support
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > > > so,
> > > > > > > > > > > > > > > > > > > > > how?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > For now, we have the ability to read
> > the
> > > > > > metadata
> > > > > > > > logs
> > > > > > > > > > > > with the
> > > > > > > > > > > > > > > > > > dump-logs
> > > > > > > > > > > > > > > > > > > > tool.  I think we will come up with
> > some
> > > > other
> > > > > > > > tools
> > > > > > > > > > in the
> > > > > > > > > > > > > > future
> > > > > > > > > > > > > > > > as
> > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > get experience.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 200. "brokers which are fenced will
> > not
> > > > > > appear in
> > > > > > > > > > > > > > > > MetadataResponses.
> > > > > > > > > > > > > > > > > > The
> > > > > > > > > > > > > > > > > > > > > broker will not respond to these
> > > > requests--
> > > > > > > > instead,
> > > > > > > > > > it
> > > > > > > > > > > > will
> > > > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > > > > > > disconnect." If the controller is
> > > > > > partitioned off
> > > > > > > > > > from
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > brokers,
> > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > design will cause every broker to
> > stop
> > > > > > accepting
> > > > > > > > new
> > > > > > > > > > > > client
> > > > > > > > > > > > > > > > > > requests. In
> > > > > > > > > > > > > > > > > > > > > contrast, if ZK is partitioned off,
> > the
> > > > > > existing
> > > > > > > > > > > > behavior is
> > > > > > > > > > > > > > > > that the
> > > > > > > > > > > > > > > > > > > > > brokers can continue to work based
> > on the
> > > > > > last
> > > > > > > > known
> > > > > > > > > > > > > > metadata.
> > > > > > > > > > > > > > > > So, I
> > > > > > > > > > > > > > > > > > am
> > > > > > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > > > > sure if we should change the existing
> > > > > > behavior
> > > > > > > > > > because
> > > > > > > > > > > > of the
> > > > > > > > > > > > > > > > bigger
> > > > > > > > > > > > > > > > > > > > impact
> > > > > > > > > > > > > > > > > > > > > in the new one. Another option is to
> > > > keep the
> > > > > > > > > > existing
> > > > > > > > > > > > > > behavior
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > > > > > > a metric for fenced brokers so that
> > the
> > > > > > operator
> > > > > > > > > > could be
> > > > > > > > > > > > > > > > alerted.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I'm skeptical about how well running
> > > > without ZK
> > > > > > > > > > currently
> > > > > > > > > > > > > > works.
> > > > > > > > > > > > > > > > > > However,
> > > > > > > > > > > > > > > > > > > > I will move the broker-side fencing
> > into a
> > > > > > > > follow-up
> > > > > > > > > > KIP.
> > > > > > > > > > > > This
> > > > > > > > > > > > > > > > KIP is
> > > > > > > > > > > > > > > > > > > > already pretty large and there is no
> > hard
> > > > > > > > dependency on
> > > > > > > > > > > > this.
> > > > > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > also be other ways of accomplishing the
> > > > > > positive
> > > > > > > > > > effects of
> > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > > broker-side fencing, so more
> > discussion is
> > > > > > needed.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 201. I read Ron's comment, but I am
> > > > still not
> > > > > > > > sure
> > > > > > > > > > the
> > > > > > > > > > > > > > benefit of
> > > > > > > > > > > > > > > > > > keeping
> > > > > > > > > > > > > > > > > > > > > broker.id and controller.id in
> > > > > > meta.properties.
> > > > > > > > It
> > > > > > > > > > seems
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > we are
> > > > > > > > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > > > > > > > duplicating the same info in two
> > places
> > > > and
> > > > > > have
> > > > > > > > the
> > > > > > > > > > > > > > additional
> > > > > > > > > > > > > > > > > > burden of
> > > > > > > > > > > > > > > > > > > > > making sure the values in the two
> > places
> > > > are
> > > > > > > > > > consistent.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I think the reasoning is that having
> > > > broker.id
> > > > > > > > > > protects us
> > > > > > > > > > > > > > against
> > > > > > > > > > > > > > > > > > > > accidentally bringing up a broker with
> > a
> > > > disk
> > > > > > from
> > > > > > > > a
> > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > > broker.  I
> > > > > > > > > > > > > > > > > > > > don't feel strongly about this but it
> > > > seemed
> > > > > > > > simpler to
> > > > > > > > > > > > keep
> > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 202.
> > > > controller.connect.security.protocol: Is
> > > > > > > > this
> > > > > > > > > > needed
> > > > > > > > > > > > > > since
> > > > > > > > > > > > > > > > > > > > > controller.listener.names and
> > > > > > > > > > > > listener.security.protocol.map
> > > > > > > > > > > > > > > > imply
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > security protocol already?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > You're right, this isn't needed.  I'll
> > > > remove
> > > > > > it.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 203.
> > registration.heartbeat.interval.ms:
> > > > It
> > > > > > > > > > defaults to
> > > > > > > > > > > > 2k.
> > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > uses
> > > > > > > > > > > > > > > > > > 1/3
> > > > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > the session timeout for heartbeat.
> > So,
> > > > given
> > > > > > the
> > > > > > > > > > default
> > > > > > > > > > > > 18k
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > registration.lease.timeout.ms,
> > should we
> > > > > > default
> > > > > > > > > > > > > > > > > > > > > registration.heartbeat.interval.ms
> > to
> > > > 6k?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 6 seconds seems like a pretty long time
> > > > between
> > > > > > > > > > > > heartbeats.  It
> > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > useful to know when a broker is missing
> > > > > > heartbeats,
> > > > > > > > > > with
> > > > > > > > > > > > less
> > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > > > > > that.  I provisionally set it to 3
> > seconds
> > > > (we
> > > > > > can
> > > > > > > > > > always
> > > > > > > > > > > > > > change
> > > > > > > > > > > > > > > > > > later...)
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I also changed the name of these
> > > > > > configurations to
> > > > > > > > "
> > > > > > > > > > > > > > > > > > > > broker.heartbeat.interval.ms" and "
> > > > > > > > > > > > > > broker.registration.timeout.ms"
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > try
> > > > > > > > > > > > > > > > > > > > to clarify them a bit.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 204. "The highest metadata offset
> > which
> > > > the
> > > > > > > > broker
> > > > > > > > > > has
> > > > > > > > > > > > not
> > > > > > > > > > > > > > > > reached."
> > > > > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > > > > seems this should be "has reached".
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I changed this to "one more than the
> > > > highest
> > > > > > > > metadata
> > > > > > > > > > > > offset
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > broker has reached."
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 205. UnfenceBrokerRecord and
> > > > > > > > UnregisterBrokerRecord:
> > > > > > > > > > To
> > > > > > > > > > > > me,
> > > > > > > > > > > > > > they
> > > > > > > > > > > > > > > > > > seem to
> > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > the same. Do we need both?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Unregistration means that the broker
> > has
> > > > been
> > > > > > > > removed
> > > > > > > > > > from
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > cluster.
> > > > > > > > > > > > > > > > > > > > That is different than unfencing, which
> > > > marks
> > > > > > the
> > > > > > > > > > broker as
> > > > > > > > > > > > > > active.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 206. TopicRecord: The Deleting field
> > is
> > > > used
> > > > > > to
> > > > > > > > > > indicate
> > > > > > > > > > > > > > that the
> > > > > > > > > > > > > > > > > > topic
> > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > being deleted. I am wondering if
> > this is
> > > > > > really
> > > > > > > > > > needed
> > > > > > > > > > > > since
> > > > > > > > > > > > > > > > > > RemoveTopic
> > > > > > > > > > > > > > > > > > > > > already indicates the same thing.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > RemoveTopic is the last step, that
> > scrubs
> > > > all
> > > > > > > > metadata
> > > > > > > > > > > > about
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > topic.
> > > > > > > > > > > > > > > > > > > > In order to get to that last step, the
> > > > topic
> > > > > > data
> > > > > > > > > > needs to
> > > > > > > > > > > > > > removed
> > > > > > > > > > > > > > > > > > from all
> > > > > > > > > > > > > > > > > > > > brokers (after each broker notices
> > that the
> > > > > > topic
> > > > > > > > is
> > > > > > > > > > being
> > > > > > > > > > > > > > > > deleted).
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Dec 2, 2020 at 2:50 PM Colin
> > > > McCabe <
> > > > > > > > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Dec 2, 2020, at 14:07, Ron
> > > > > > Dagostino
> > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > Hi Colin.  Thanks for the
> > updates.
> > > > It's
> > > > > > now
> > > > > > > > > > clear
> > > > > > > > > > > > to me
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > > > > > > > keep their broker epoch for the
> > life
> > > > of
> > > > > > their
> > > > > > > > > > JVM --
> > > > > > > > > > > > they
> > > > > > > > > > > > > > > > > > register
> > > > > > > > > > > > > > > > > > > > > > > once, get their broker epoch in
> > the
> > > > > > > > response, and
> > > > > > > > > > > > then
> > > > > > > > > > > > > > never
> > > > > > > > > > > > > > > > > > > > > > > re-register again.  Brokers may
> > get
> > > > > > fenced,
> > > > > > > > but
> > > > > > > > > > they
> > > > > > > > > > > > > > keep the
> > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > > broker epoch for the life of
> > their
> > > > JVM.
> > > > > > The
> > > > > > > > > > > > incarnation
> > > > > > > > > > > > > > ID
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > > > > > > kept for the life of the JVM but
> > is
> > > > > > > > generated by
> > > > > > > > > > the
> > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > itself
> > > > > > > > > > > > > > > > > > > > > > > upon startup, and the
> > combination of
> > > > the
> > > > > > two
> > > > > > > > > > allows
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > Controller to
> > > > > > > > > > > > > > > > > > > > > > > act idempotently if any
> > > > previously-sent
> > > > > > > > > > registration
> > > > > > > > > > > > > > response
> > > > > > > > > > > > > > > > > > gets
> > > > > > > > > > > > > > > > > > > > > > > lost.  Makes sense.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks, Ron.  That's a good
> > summary.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > One thing I wonder about is if it
> > > > might
> > > > > > be
> > > > > > > > > > helpful
> > > > > > > > > > > > for
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > > > > > > > > send the Cluster ID as determined
> > > > from
> > > > > > its
> > > > > > > > > > > > > > meta.properties
> > > > > > > > > > > > > > > > file
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > > > > > > > > registration request.  Does it
> > even
> > > > make
> > > > > > > > sense
> > > > > > > > > > for
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > > > > > > > > successfully register and enter
> > the
> > > > > > Fenced
> > > > > > > > state
> > > > > > > > > > if
> > > > > > > > > > > > it
> > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > wrong
> > > > > > > > > > > > > > > > > > > > > > > Cluster ID?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Yeah, that's a good idea.  Let's
> > have
> > > > the
> > > > > > > > broker
> > > > > > > > > > pass
> > > > > > > > > > > > its
> > > > > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > > > > ID in
> > > > > > > > > > > > > > > > > > > > > > the registration RPC, and then
> > > > > > registration can
> > > > > > > > > > fail
> > > > > > > > > > > > if the
> > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > configured for the wrong cluster.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >  The nextMetadatOffset value
> > that the
> > > > > > broker
> > > > > > > > > > > > communicates
> > > > > > > > > > > > > > > > > > > > > > > in its registration request only
> > has
> > > > > > meaning
> > > > > > > > > > within
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > correct
> > > > > > > > > > > > > > > > > > > > > > > cluster, so it feels to me that
> > the
> > > > > > > > Controller
> > > > > > > > > > should
> > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > perform this sanity check.
> > There is
> > > > > > > > currently
> > > > > > > > > > > > (pre-KIP
> > > > > > > > > > > > > > 500)
> > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > check
> > > > > > > > > > > > > > > > > > > > > > > in the broker to make sure its
> > > > configured
> > > > > > > > > > cluster ID
> > > > > > > > > > > > > > matches
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > > > > > > stored in ZooKeeper, and we will
> > > > have to
> > > > > > > > perform
> > > > > > > > > > this
> > > > > > > > > > > > > > > > validation
> > > > > > > > > > > > > > > > > > > > > > > somewhere in the KIP-500 world.
> > If
> > > > the
> > > > > > > > > > Controller
> > > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > > do it
> > > > > > > > > > > > > > > > > > > > > > > within the registration request
> > then
> > > > the
> > > > > > > > broker
> > > > > > > > > > will
> > > > > > > > > > > > > > have to
> > > > > > > > > > > > > > > > > > make a
> > > > > > > > > > > > > > > > > > > > > > > metadata request to the
> > Controller,
> > > > > > retrieve
> > > > > > > > the
> > > > > > > > > > > > Cluster
> > > > > > > > > > > > > > ID,
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > perform the check itself.  It
> > feels
> > > > to me
> > > > > > > > that it
> > > > > > > > > > > > might
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > better for
> > > > > > > > > > > > > > > > > > > > > > > the Controller to just do it, and
> > > > then
> > > > > > the
> > > > > > > > broker
> > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > > have to
> > > > > > > > > > > > > > > > > > > > > > > worry about it anymore once it
> > > > > > successfully
> > > > > > > > > > > > registers.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I also have a question about the
> > > > > > broker.id
> > > > > > > > > > value and
> > > > > > > > > > > > > > > > > > > > meta.properties.
> > > > > > > > > > > > > > > > > > > > > > > The KIP now says "In version 0 of
> > > > > > > > > > meta.properties,
> > > > > > > > > > > > there
> > > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > > > > > > broker.id field.  Version 1
> > does not
> > > > > > have
> > > > > > > > this
> > > > > > > > > > > > field.
> > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > is no
> > > > > > > > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > > > > > > > > > > needed because we no longer
> > support
> > > > > > dynamic
> > > > > > > > > > broker id
> > > > > > > > > > > > > > > > > > assignment."
> > > > > > > > > > > > > > > > > > > > > > > But then there is an example
> > version
> > > > 1
> > > > > > > > > > > > meta.properties
> > > > > > > > > > > > > > file
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > shows
> > > > > > > > > > > > > > > > > > > > > > > the broker.id value.  I actually
> > > > wonder
> > > > > > if
> > > > > > > > > > maybe the
> > > > > > > > > > > > > > > > broker.id
> > > > > > > > > > > > > > > > > > value
> > > > > > > > > > > > > > > > > > > > > > > would be good to keep in the
> > version
> > > > 1
> > > > > > > > > > > > meta.properties
> > > > > > > > > > > > > > file
> > > > > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > > > > > currently (pre-KIP 500, version
> > 0)
> > > > acts
> > > > > > as a
> > > > > > > > > > sanity
> > > > > > > > > > > > > > check to
> > > > > > > > > > > > > > > > make
> > > > > > > > > > > > > > > > > > > > sure
> > > > > > > > > > > > > > > > > > > > > > > the broker is using the correct
> > log
> > > > > > > > directory.
> > > > > > > > > > > > Similarly
> > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > controller.id value on
> > controllers
> > > > -- it
> > > > > > > > would
> > > > > > > > > > > > allow the
> > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > > > sanity check for quorum
> > controllers.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > That's a good point.  I will add
> > > > broker.id
> > > > > > > > back,
> > > > > > > > > > and
> > > > > > > > > > > > also
> > > > > > > > > > > > > > add
> > > > > > > > > > > > > > > > > > > > > > controller.id as a possibility.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Mon, Nov 30, 2020 at 7:41 PM
> > Colin
> > > > > > McCabe
> > > > > > > > <
> > > > > > > > > > > > > > > > cmccabe@apache.org
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 23, 2020, at 16:10,
> > > > Jun Rao
> > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the reply. A few
> > more
> > > > > > > > comments.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks again for the reply.
> > Sorry
> > > > for
> > > > > > the
> > > > > > > > long
> > > > > > > > > > > > > > hiatus.  I
> > > > > > > > > > > > > > > > was
> > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > > > vacation for a while.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 55. There is still text that
> > > > favors
> > > > > > new
> > > > > > > > > > broker
> > > > > > > > > > > > > > > > registration.
> > > > > > > > > > > > > > > > > > > > "When a
> > > > > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > > > first starts up, when it is
> > in
> > > > the
> > > > > > > > INITIAL
> > > > > > > > > > > > state, it
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > always
> > > > > > > > > > > > > > > > > > > > > > "win"
> > > > > > > > > > > > > > > > > > > > > > > > > broker ID conflicts.
> > However,
> > > > once
> > > > > > it is
> > > > > > > > > > > > granted a
> > > > > > > > > > > > > > > > lease, it
> > > > > > > > > > > > > > > > > > > > > > transitions
> > > > > > > > > > > > > > > > > > > > > > > > > out of the INITIAL state.
> > > > > > Thereafter,
> > > > > > > > it may
> > > > > > > > > > > > lose
> > > > > > > > > > > > > > > > subsequent
> > > > > > > > > > > > > > > > > > > > > > conflicts if
> > > > > > > > > > > > > > > > > > > > > > > > > its broker epoch is stale.
> > (See
> > > > > > KIP-380
> > > > > > > > for
> > > > > > > > > > some
> > > > > > > > > > > > > > > > background
> > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > > > epoch.)  The reason for
> > favoring
> > > > new
> > > > > > > > > > processes
> > > > > > > > > > > > is to
> > > > > > > > > > > > > > > > > > accommodate
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > common
> > > > > > > > > > > > > > > > > > > > > > > > > case where a process is
> > killed
> > > > with
> > > > > > kill
> > > > > > > > -9
> > > > > > > > > > and
> > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > > restarted.
> > > > > > > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > > > > > > > want it
> > > > > > > > > > > > > > > > > > > > > > > > > to be able to reclaim its
> > old ID
> > > > > > quickly
> > > > > > > > in
> > > > > > > > > > this
> > > > > > > > > > > > > > case."
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks for the reminder.  I
> > have
> > > > > > clarified
> > > > > > > > the
> > > > > > > > > > > > language
> > > > > > > > > > > > > > > > here.
> > > > > > > > > > > > > > > > > > > > > > Hopefully now it is clear that we
> > don't
> > > > > > allow
> > > > > > > > quick
> > > > > > > > > > > > re-use
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > IDs.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 80.1 Sounds good. Could you
> > > > document
> > > > > > that
> > > > > > > > > > > > listeners
> > > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > required
> > > > > > > > > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > > > > > > now? It would also be useful
> > to
> > > > > > annotate
> > > > > > > > > > other
> > > > > > > > > > > > > > required
> > > > > > > > > > > > > > > > > > configs.
> > > > > > > > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > > > > > > > > > > example, controller.connect
> > > > should be
> > > > > > > > > > required.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I added a note specifying that
> > > > these
> > > > > > are
> > > > > > > > > > required.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 80.2 Could you list all
> > > > deprecated
> > > > > > > > existing
> > > > > > > > > > > > configs?
> > > > > > > > > > > > > > > > Another
> > > > > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > > > control.plane.listener.name
> > > > since
> > > > > > the
> > > > > > > > > > > > controller no
> > > > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > > > > > sends
> > > > > > > > > > > > > > > > > > > > > > > > > LeaderAndIsr, UpdateMetadata
> > and
> > > > > > > > StopReplica
> > > > > > > > > > > > > > requests.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I added a section specifying
> > some
> > > > > > > > deprecated
> > > > > > > > > > > > configs.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 83.1 It seems that the
> > broker can
> > > > > > > > transition
> > > > > > > > > > from
> > > > > > > > > > > > > > FENCED
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > RUNNING
> > > > > > > > > > > > > > > > > > > > > > without
> > > > > > > > > > > > > > > > > > > > > > > > > registering for a new broker
> > > > epoch.
> > > > > > I am
> > > > > > > > not
> > > > > > > > > > > > sure how
> > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > works.
> > > > > > > > > > > > > > > > > > > > > > Once the
> > > > > > > > > > > > > > > > > > > > > > > > > controller fences a broker,
> > > > there is
> > > > > > no
> > > > > > > > need
> > > > > > > > > > for
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > keep the
> > > > > > > > > > > > > > > > > > > > > > > > > boker epoch around. So, if
> > the
> > > > fenced
> > > > > > > > > > broker's
> > > > > > > > > > > > > > heartbeat
> > > > > > > > > > > > > > > > > > request
> > > > > > > > > > > > > > > > > > > > > > with the
> > > > > > > > > > > > > > > > > > > > > > > > > existing broker epoch will be
> > > > > > rejected,
> > > > > > > > > > leading
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > back
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > FENCED state again.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > The broker epoch refers to the
> > > > broker
> > > > > > > > > > registration.
> > > > > > > > > > > > > > So we
> > > > > > > > > > > > > > > > DO
> > > > > > > > > > > > > > > > > > keep
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > broker epoch around even while the
> > > > broker
> > > > > > is
> > > > > > > > > > fenced.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > The broker epoch changes only
> > when
> > > > > > there
> > > > > > > > is a
> > > > > > > > > > new
> > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > registration.  Fencing or
> > unfencing the
> > > > > > broker
> > > > > > > > > > doesn't
> > > > > > > > > > > > > > change
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > epoch.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 83.5 Good point on KIP-590.
> > Then
> > > > > > should
> > > > > > > > we
> > > > > > > > > > > > expose the
> > > > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > > > > debugging purposes? If not,
> > we
> > > > should
> > > > > > > > > > deprecate
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > controllerID
> > > > > > > > > > > > > > > > > > > > > > field in
> > > > > > > > > > > > > > > > > > > > > > > > > MetadataResponse?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I think it's OK to expose it
> > for
> > > > now,
> > > > > > with
> > > > > > > > the
> > > > > > > > > > > > proviso
> > > > > > > > > > > > > > > > that it
> > > > > > > > > > > > > > > > > > > > won't
> > > > > > > > > > > > > > > > > > > > > > be reachable by clients.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 90. We rejected the shared ID
> > > > with
> > > > > > just
> > > > > > > > one
> > > > > > > > > > > > reason
> > > > > > > > > > > > > > "This
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > not a
> > > > > > > > > > > > > > > > > > > > > > good idea
> > > > > > > > > > > > > > > > > > > > > > > > > because NetworkClient
> > assumes a
> > > > > > single ID
> > > > > > > > > > > > space.  So
> > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > there is
> > > > > > > > > > > > > > > > > > > > > > both a
> > > > > > > > > > > > > > > > > > > > > > > > > controller 1 and a broker 1,
> > we
> > > > don't
> > > > > > > > have a
> > > > > > > > > > way
> > > > > > > > > > > > of
> > > > > > > > > > > > > > > > picking
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > "right"
> > > > > > > > > > > > > > > > > > > > > > > > > one." This doesn't seem to
> > be a
> > > > > > strong
> > > > > > > > > > reason.
> > > > > > > > > > > > For
> > > > > > > > > > > > > > > > example,
> > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > > > > > > address the NetworkClient
> > issue
> > > > with
> > > > > > the
> > > > > > > > node
> > > > > > > > > > > > type
> > > > > > > > > > > > > > as you
> > > > > > > > > > > > > > > > > > pointed
> > > > > > > > > > > > > > > > > > > > > > out or
> > > > > > > > > > > > > > > > > > > > > > > > > using the negative value of a
> > > > broker
> > > > > > ID
> > > > > > > > as
> > > > > > > > > > the
> > > > > > > > > > > > > > > > controller ID.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > It would require a lot of code
> > > > changes
> > > > > > to
> > > > > > > > > > support
> > > > > > > > > > > > > > multiple
> > > > > > > > > > > > > > > > > > types of
> > > > > > > > > > > > > > > > > > > > > > node IDs.  It's not clear to me
> > that
> > > > the
> > > > > > end
> > > > > > > > result
> > > > > > > > > > > > would
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > better --
> > > > > > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > > > > > tend to think it would be worse,
> > since
> > > > it
> > > > > > > > would be
> > > > > > > > > > more
> > > > > > > > > > > > > > > > complex.
> > > > > > > > > > > > > > > > > > In a
> > > > > > > > > > > > > > > > > > > > > > similar vein, using negative
> > numbers
> > > > seems
> > > > > > > > > > dangerous,
> > > > > > > > > > > > > > since we
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > > > > > negatives or -1 as "special
> > values" in
> > > > many
> > > > > > > > places.
> > > > > > > > > > > > For
> > > > > > > > > > > > > > > > example,
> > > > > > > > > > > > > > > > > > -1
> > > > > > > > > > > > > > > > > > > > often
> > > > > > > > > > > > > > > > > > > > > > represents "no such node."
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > One important thing to keep in
> > > > mind is
> > > > > > > > that we
> > > > > > > > > > > > want to
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > able
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > transition from a broker and a
> > > > controller
> > > > > > being
> > > > > > > > > > > > co-located
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > them
> > > > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > > > > > longer being co-located.  This is
> > much
> > > > > > easier
> > > > > > > > to do
> > > > > > > > > > > > when
> > > > > > > > > > > > > > they
> > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > > > > > > IDs.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 100. In KIP-589
> > > > > > > > > > > > > > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > > > > > > > > > > > > > > > > > > >,
> > > > > > > > > > > > > > > > > > > > > > > > > the broker reports all
> > offline
> > > > > > replicas
> > > > > > > > due
> > > > > > > > > > to a
> > > > > > > > > > > > disk
> > > > > > > > > > > > > > > > > > failure to
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > controller. It seems this
> > > > information
> > > > > > > > needs
> > > > > > > > > > to be
> > > > > > > > > > > > > > > > persisted
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > > > > > > > > > > log. Do we have a
> > corresponding
> > > > > > record
> > > > > > > > for
> > > > > > > > > > that?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hmm, I have to look into this a
> > > > little
> > > > > > bit
> > > > > > > > > > more.
> > > > > > > > > > > > We
> > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > a new
> > > > > > > > > > > > > > > > > > > > > > record type.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 101. Currently, StopReplica
> > > > request
> > > > > > has 2
> > > > > > > > > > modes,
> > > > > > > > > > > > > > without
> > > > > > > > > > > > > > > > > > deletion
> > > > > > > > > > > > > > > > > > > > > > and with
> > > > > > > > > > > > > > > > > > > > > > > > > deletion. The former is used
> > for
> > > > > > > > controlled
> > > > > > > > > > > > shutdown
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > handling
> > > > > > > > > > > > > > > > > > > > > > disk
> > > > > > > > > > > > > > > > > > > > > > > > > failure, and causes the
> > follower
> > > > to
> > > > > > > > stop. The
> > > > > > > > > > > > latter
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > topic
> > > > > > > > > > > > > > > > > > > > > > deletion
> > > > > > > > > > > > > > > > > > > > > > > > > and partition reassignment,
> > and
> > > > > > causes
> > > > > > > > the
> > > > > > > > > > > > replica
> > > > > > > > > > > > > > to be
> > > > > > > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > > > > > > > > > Since we
> > > > > > > > > > > > > > > > > > > > > > > > > are deprecating StopReplica,
> > > > could we
> > > > > > > > > > document
> > > > > > > > > > > > what
> > > > > > > > > > > > > > > > triggers
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > stopping
> > > > > > > > > > > > > > > > > > > > > > > > > of a follower and the
> > deleting
> > > > of a
> > > > > > > > replica
> > > > > > > > > > now?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > RemoveTopic triggers
> > deletion.  In
> > > > > > general
> > > > > > > > the
> > > > > > > > > > > > > > > > functionality of
> > > > > > > > > > > > > > > > > > > > > > StopReplica is subsumed by the
> > metadata
> > > > > > > > records.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 102. Should we include the
> > > > metadata
> > > > > > > > topic in
> > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > MetadataResponse?
> > > > > > > > > > > > > > > > > > > > > > If so,
> > > > > > > > > > > > > > > > > > > > > > > > > when it will be included and
> > what
> > > > > > will
> > > > > > > > the
> > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > response
> > > > > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > > > > > > like?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > No, it won't be included in the
> > > > > > metadata
> > > > > > > > > > response
> > > > > > > > > > > > sent
> > > > > > > > > > > > > > back
> > > > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > brokers.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 103. "The active controller
> > > > assigns
> > > > > > the
> > > > > > > > > > broker a
> > > > > > > > > > > > new
> > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > epoch,
> > > > > > > > > > > > > > > > > > > > > > based on
> > > > > > > > > > > > > > > > > > > > > > > > > the latest committed offset
> > in
> > > > the
> > > > > > log."
> > > > > > > > This
> > > > > > > > > > > > seems
> > > > > > > > > > > > > > > > > > inaccurate
> > > > > > > > > > > > > > > > > > > > since
> > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > latest committed offset
> > doesn't
> > > > > > always
> > > > > > > > > > advance on
> > > > > > > > > > > > > > every
> > > > > > > > > > > > > > > > log
> > > > > > > > > > > > > > > > > > > > append.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Given that the new broker epoch
> > > > won't
> > > > > > be
> > > > > > > > > > visible
> > > > > > > > > > > > until
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > > happened, I have changed this to
> > "the
> > > > next
> > > > > > > > > > available
> > > > > > > > > > > > > > offset in
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > log"
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 104. REGISTERING(1) : It says
> > > > > > > > "Otherwise, the
> > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > moves
> > > > > > > > > > > > > > > > > > into
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > FENCED
> > > > > > > > > > > > > > > > > > > > > > > > > state.". It seems this
> > should be
> > > > > > RUNNING?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 105. RUNNING: Should we
> > require
> > > > the
> > > > > > > > broker to
> > > > > > > > > > > > catch
> > > > > > > > > > > > > > up
> > > > > > > > > > > > > > > > to the
> > > > > > > > > > > > > > > > > > > > > > metadata log
> > > > > > > > > > > > > > > > > > > > > > > > > to get into this state?
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > For 104 and 105, these sections
> > > > have
> > > > > > been
> > > > > > > > > > reworked.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 23, 2020 at 1:20
> > PM
> > > > Colin
> > > > > > > > McCabe
> > > > > > > > > > <
> > > > > > > > > > > > > > > > > > cmccabe@apache.org
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 21, 2020, at
> > > > 05:51, Tom
> > > > > > > > Bentley
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 19, 2020, at
> > > > 08:59,
> > > > > > Ron
> > > > > > > > > > Dagostino
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Colin.  Thanks
> > for the
> > > > > > hard
> > > > > > > > work
> > > > > > > > > > on
> > > > > > > > > > > > this
> > > > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have some questions
> > > > about
> > > > > > what
> > > > > > > > > > happens
> > > > > > > > > > > > to a
> > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > when it
> > > > > > > > > > > > > > > > > > > > > > becomes
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > fenced (e.g. because
> > it
> > > > can't
> > > > > > > > send a
> > > > > > > > > > > > > > heartbeat
> > > > > > > > > > > > > > > > > > request to
> > > > > > > > > > > > > > > > > > > > > > keep its
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > lease).  The KIP says
> > > > "When a
> > > > > > > > broker
> > > > > > > > > > is
> > > > > > > > > > > > > > fenced,
> > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > cannot
> > > > > > > > > > > > > > > > > > > > > > process any
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > client requests.
> > This
> > > > > > prevents
> > > > > > > > > > brokers
> > > > > > > > > > > > which
> > > > > > > > > > > > > > > > are not
> > > > > > > > > > > > > > > > > > > > > > receiving
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > metadata updates or
> > that
> > > > are
> > > > > > not
> > > > > > > > > > > > receiving
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > processing
> > > > > > > > > > > > > > > > > > > > > > them fast
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > enough from causing
> > > > issues to
> > > > > > > > > > clients."
> > > > > > > > > > > > And
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > description of the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > FENCED(4) state it
> > > > likewise
> > > > > > says
> > > > > > > > > > "While
> > > > > > > > > > > > in
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > state,
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > > > > does
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > not respond to client
> > > > > > > > requests."  It
> > > > > > > > > > > > makes
> > > > > > > > > > > > > > sense
> > > > > > > > > > > > > > > > > > that a
> > > > > > > > > > > > > > > > > > > > > > fenced broker
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > should not accept
> > > > producer
> > > > > > > > requests
> > > > > > > > > > -- I
> > > > > > > > > > > > > > assume
> > > > > > > > > > > > > > > > any
> > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > requests
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > would result in
> > > > > > > > > > > > NotLeaderOrFollowerException.
> > > > > > > > > > > > > > > > But
> > > > > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > > > > about KIP-392
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > (fetch from follower)
> > > > > > consumer
> > > > > > > > > > > > requests?  It
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > conceivable
> > > > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > > > > these
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > could continue.
> > Related
> > > > to
> > > > > > that,
> > > > > > > > > > would a
> > > > > > > > > > > > > > fenced
> > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > continue to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > fetch data for
> > partitions
> > > > > > where
> > > > > > > > it
> > > > > > > > > > > > thinks it
> > > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > > > follower?
> > > > > > > > > > > > > > > > > > > > > > Even if
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > it rejects consumer
> > > > requests
> > > > > > it
> > > > > > > > might
> > > > > > > > > > > > still
> > > > > > > > > > > > > > > > continue
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > fetch as a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > follower.  Might it
> > be
> > > > > > helpful to
> > > > > > > > > > clarify
> > > > > > > > > > > > > > both
> > > > > > > > > > > > > > > > > > decisions
> > > > > > > > > > > > > > > > > > > > > > here?
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Ron,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Good question.  I
> > think a
> > > > > > fenced
> > > > > > > > broker
> > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > continue to
> > > > > > > > > > > > > > > > > > > > > > fetch on
> > > > > > > > > > > > > > > > > > > > > > > > > > > > partitions it was
> > already
> > > > > > fetching
> > > > > > > > > > before
> > > > > > > > > > > > it
> > > > > > > > > > > > > > was
> > > > > > > > > > > > > > > > > > fenced,
> > > > > > > > > > > > > > > > > > > > > > unless it
> > > > > > > > > > > > > > > > > > > > > > > > > > hits a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > problem.  At that
> > point it
> > > > > > won't be
> > > > > > > > > > able to
> > > > > > > > > > > > > > > > continue,
> > > > > > > > > > > > > > > > > > > > since it
> > > > > > > > > > > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > > > > > > > > > > the new metadata.  For
> > > > > > example, it
> > > > > > > > > > won't
> > > > > > > > > > > > know
> > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > > leadership
> > > > > > > > > > > > > > > > > > > > > > changes
> > > > > > > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > the partitions it's
> > > > fetching.
> > > > > > The
> > > > > > > > > > > > rationale
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > continuing to
> > > > > > > > > > > > > > > > > > > > > > fetch
> > > > > > > > > > > > > > > > > > > > > > > > > > is to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > try to avoid
> > disruptions as
> > > > > > much as
> > > > > > > > > > > > possible.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think fenced
> > > > brokers
> > > > > > should
> > > > > > > > > > accept
> > > > > > > > > > > > > > client
> > > > > > > > > > > > > > > > > > requests.
> > > > > > > > > > > > > > > > > > > > > > The issue
> > > > > > > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > that the fenced broker
> > may
> > > > or
> > > > > > may
> > > > > > > > not
> > > > > > > > > > have
> > > > > > > > > > > > any
> > > > > > > > > > > > > > > > data it
> > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > supposed to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > have.  It may or may
> > not
> > > > have
> > > > > > > > applied
> > > > > > > > > > any
> > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > > > > > > > changes, etc.
> > > > > > > > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > > > > > > it is supposed to have
> > > > applied.
> > > > > > > > So it
> > > > > > > > > > > > could
> > > > > > > > > > > > > > get
> > > > > > > > > > > > > > > > pretty
> > > > > > > > > > > > > > > > > > > > > > confusing, and
> > > > > > > > > > > > > > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > > > > > > > > > > > potentially waste the
> > > > client's
> > > > > > > > time.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > When fenced, how would
> > the
> > > > broker
> > > > > > > > reply
> > > > > > > > > > to a
> > > > > > > > > > > > > > client
> > > > > > > > > > > > > > > > > > which did
> > > > > > > > > > > > > > > > > > > > > > make a
> > > > > > > > > > > > > > > > > > > > > > > > > > > request?
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > The broker will respond
> > with a
> > > > > > > > retryable
> > > > > > > > > > error
> > > > > > > > > > > > in
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > > > > > > > Once
> > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > > client has re-fetched its
> > > > > > metadata, it
> > > > > > > > > > will no
> > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > > > see
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > fenced broker
> > > > > > > > > > > > > > > > > > > > > > > > > > as part of the cluster.  I
> > > > added a
> > > > > > > > note to
> > > > > > > > > > the
> > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Tom
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >



-- 
Gwen Shapira
Engineering Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog

Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

Posted by Colin McCabe <cm...@apache.org>.
Hi Gwen,

Thanks for taking a look.  Yes, I do think unregisterBroker would be a better name than decommissionBroker.  Let's do that, and also rename the associated RPC.

cheers,
Colin


On Sat, Feb 6, 2021, at 21:13, Gwen Shapira wrote:
> I realize that I'm very very late to a pretty big party, but I was
> just looking at the new code (super readable, btw), and noticed that
> the inverse API for registerBroker() is called decomissionBroker().
> It would be way clearer that we are undoing a registration is we
> rename it to unregisterBroker() (and the AdminClient API that matches,
> to avoid more confusion).
> 
> Any objections?
> 
> On Fri, Dec 18, 2020 at 10:08 AM Jun Rao <ju...@confluent.io> wrote:
> >
> > Hi, Colin,
> >
> > Thanks for the reply. The KIP looks good to me now. Thanks for your
> > diligence.
> >
> > Jun
> >
> > On Thu, Dec 17, 2020 at 5:15 PM Colin McCabe <cm...@apache.org> wrote:
> >
> > > On Thu, Dec 17, 2020, at 17:02, Jun Rao wrote:
> > > > Hi, Colin,
> > > >
> > > > Thanks for the reply. Sounds good. One more comment.
> > > >
> > > > 231. Currently, we have sasl.mechanism.inter.broker.protocol for inter
> > > > broker communication. It seems that we need a similar config for
> > > specifying
> > > > the sasl mechanism for the communication between the broker and the
> > > > controller.
> > > >
> > >
> > > Hi Jun,
> > >
> > > Yeah... sounds like we could use a new configuration key for this.  I
> > > added sasl.mechanism.controller.protocol for this.
> > >
> > > regards,
> > > Colin
> > >
> > > > Jun
> > > >
> > > > On Thu, Dec 17, 2020 at 2:29 PM Colin McCabe <cm...@apache.org> wrote:
> > > >
> > > > > On Thu, Dec 17, 2020, at 10:19, Jun Rao wrote:
> > > > > > Hi, Colin,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 211. Hmm, I still don't see a clear benefit of registering a broker
> > > > > before
> > > > > > recovery. It's possible for the recovery to take time. However,
> > > during
> > > > > the
> > > > > > recovery mode, it seems the broker will still be in the fenced mode
> > > and
> > > > > > won't be able to do work for the clients. So, registering and
> > > > > heartbeating
> > > > > > early seems to only add unnecessary overhead. For your point on topic
> > > > > > creation, I thought we now allow replicas to be created on
> > > > > > unregistered brokers.
> > > > > >
> > > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks again for the reviews.
> > > > >
> > > > > Topics cannot be created on unregistered brokers.  They can be created
> > > on
> > > > > registered but fenced brokers.  So for that reason I think it makes
> > > sense
> > > > > to register as early as possible.
> > > > >
> > > > > > 230. Currently, we do have a ControllerId field in MetadataResponse.
> > > In
> > > > > the
> > > > > > early discussion, I thought that we want to expose the controller for
> > > > > > debugging purposes, but not used by the client library.
> > > > > >
> > > > >
> > > > > The current plan is that we will expose the controller node ID, but the
> > > > > controller will not be included in the list of nodes in the metadata
> > > > > response.
> > > > >
> > > > > It's not really possible to include the controller in that list of
> > > nodes
> > > > > because the controller may not share the same set of listeners as the
> > > > > broker.  So, for example, maybe the controller endpoint is using a
> > > > > different type of security than the broker.  So while we could pass
> > > back a
> > > > > hostname and port, the client would have no way to connect since it
> > > doesn't
> > > > > know what security settings to use.
> > > > >
> > > > > regards,
> > > > > Colin
> > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Wed, Dec 16, 2020 at 9:13 PM Colin McCabe <cm...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > On Wed, Dec 16, 2020, at 18:13, Jun Rao wrote:
> > > > > > > > Hi, Colin,
> > > > > > > >
> > > > > > > > Thanks for the reply. Just a couple of more comments.
> > > > > > > >
> > > > > > > > 211. Currently, the broker only registers itself in ZK after log
> > > > > > > recovery.
> > > > > > > > Is there any benefit to change that? As you mentioned, the broker
> > > > > can't
> > > > > > > do
> > > > > > > > much before completing log recovery.
> > > > > > > >
> > > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Previously, it wasn't possible to register in ZK without
> > > immediately
> > > > > > > getting added to the MetadataResponse.  So I think that's the main
> > > > > reason
> > > > > > > why registration was delayed until after log recovery.  Since that
> > > > > > > constraint doesn't exist any more, there seems to be no reason to
> > > delay
> > > > > > > registration.
> > > > > > >
> > > > > > > I think delaying registration would have some major downsides.  If
> > > log
> > > > > > > recovery takes a while, that's a longer window during which someone
> > > > > else
> > > > > > > could register a broker with the same ID.  Having parts of the
> > > cluster
> > > > > > > missing for a while gives up some of the benefit of separating
> > > > > registration
> > > > > > > from fencing.  For example, if a broker somehow gets unregistered
> > > and
> > > > > we
> > > > > > > want to re-register it, but we have to wait for a 10 minute log
> > > > > recovery to
> > > > > > > do that, that could be a window during which topics can't be
> > > created
> > > > > that
> > > > > > > need to be on that broker, and so forth.
> > > > > > >
> > > > > > > > 230. Regarding MetadataResponse, there is a slight awkwardness.
> > > We
> > > > > return
> > > > > > > > rack for each node. However, if that node is for the controller,
> > > the
> > > > > rack
> > > > > > > > field is not really relevant. Should we clean it up here or in
> > > > > another
> > > > > > > KIP
> > > > > > > > like KIP-700?
> > > > > > >
> > > > > > > Oh, controllers don't appear in the MetadataResponses returned to
> > > > > clients,
> > > > > > > since clients can't access them.  I should have been more clear
> > > about
> > > > > that
> > > > > > > in the KIP-- I added a sentence to "Networking" describing this.
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Wed, Dec 16, 2020 at 4:23 PM Colin McCabe <cmccabe@apache.org
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > On Wed, Dec 16, 2020, at 13:40, Jun Rao wrote:
> > > > > > > > > > Hi, Colin,
> > > > > > > > > >
> > > > > > > > > > Thanks for the reply. A few follow up comments.
> > > > > > > > > >
> > > > > > > > > > 211. When does the broker send the BrokerRegistration
> > > request to
> > > > > the
> > > > > > > > > > controller? Is it after the recovery phase? If so, at that
> > > > > point, the
> > > > > > > > > > broker has already caught up on the metadata (in order to
> > > clean
> > > > > up
> > > > > > > > > deleted
> > > > > > > > > > partitions). Then, it seems we don't need the ShouldFence
> > > field
> > > > > > > > > > in BrokerHeartbeatRequest?
> > > > > > > > >
> > > > > > > > > Hi Jun,
> > > > > > > > >
> > > > > > > > > Thanks again for the reviews.
> > > > > > > > >
> > > > > > > > > The broker sends the registration request as soon as it starts
> > > > > up.  It
> > > > > > > > > cannot wait until the recovery phase is over since sometimes
> > > log
> > > > > > > recovery
> > > > > > > > > can take quite a long time.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 213. kafka-cluster.sh
> > > > > > > > > > 213.1 For the decommision example, should the command take a
> > > > > broker
> > > > > > > id?
> > > > > > > > >
> > > > > > > > > Yes, the example should have a broker id.  Fixed.
> > > > > > > > >
> > > > > > > > > > 213.2 Existing tools use the "--" command line option (e.g.
> > > > > > > kafka-topics
> > > > > > > > > > --list --topic test). Should we follow the same convention
> > > > > > > > > > for kafka-cluster.sh (and kafka-storage.sh)?
> > > > > > > > >
> > > > > > > > > Hmm.  I don't think argparse4j supports using double dashes to
> > > > > identify
> > > > > > > > > subcommands.  I think it might be confusing as well, since the
> > > > > > > subcommand
> > > > > > > > > must come first, unlike a plain old argument which can be
> > > anywhere
> > > > > on
> > > > > > > the
> > > > > > > > > command line.
> > > > > > > > >
> > > > > > > > > > 213.3 Should we add an admin api for broker decommission so
> > > that
> > > > > > > this can
> > > > > > > > > > be done programmatically?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, there is an admin client API for decommissioning as well.
> > > > > > > > >
> > > > > > > > > > 220. DecommissionBroker: "NOT_CONTROLLER if the node that the
> > > > > > > request was
> > > > > > > > > > sent to is not the controller". I thought the clients never
> > > send
> > > > > a
> > > > > > > > > request
> > > > > > > > > > to the controller directly now and the broker will forward
> > > it to
> > > > > the
> > > > > > > > > > controller?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > If the controller moved recently, it's possible that the broker
> > > > > could
> > > > > > > send
> > > > > > > > > to a controller that has just recently become inactive.  In
> > > that
> > > > > case
> > > > > > > > > NOT_CONTROLLER would be returned.  (A standby controller
> > > returns
> > > > > > > > > NOT_CONTROLLER for most APIs).
> > > > > > > > >
> > > > > > > > > > 221. Could we add the required ACL for the new requests?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Good point.  I added the required ACL for each new RPC.
> > > > > > > > >
> > > > > > > > > best,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Wed, Dec 16, 2020 at 11:38 AM Colin McCabe <
> > > > > cmccabe@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reply. A few more comments below.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Jun,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the comments.
> > > > > > > > > > >
> > > > > > > > > > > > 206. "RemoveTopic is the last step, that scrubs all
> > > metadata
> > > > > > > about
> > > > > > > > > the
> > > > > > > > > > > > topic.  In order to get to that last step, the topic data
> > > > > needs
> > > > > > > to
> > > > > > > > > > > removed
> > > > > > > > > > > > from all brokers (after each broker notices that the
> > > topic is
> > > > > > > being
> > > > > > > > > > > > deleted)." Currently, this is done based on the response
> > > of
> > > > > > > > > > > > StopReplicaRequest. Since the controller no longer sends
> > > this
> > > > > > > > > request,
> > > > > > > > > > > how
> > > > > > > > > > > > does the controller know that the data for the deleted
> > > topic
> > > > > has
> > > > > > > > > > > > been removed in the brokers?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > That's a good point... definitely an oversight.
> > > > > > > > > > >
> > > > > > > > > > > It seems complex to force the controller to track when log
> > > > > > > directories
> > > > > > > > > > > have been deleted.  Let's just assume that KIP-516 has been
> > > > > > > > > implemented,
> > > > > > > > > > > and track them by UUID.  Then we can just have a single
> > > topic
> > > > > > > deletion
> > > > > > > > > > > record.
> > > > > > > > > > >
> > > > > > > > > > > I added a section on "topic identifiers" describing this.
> > > > > > > > > > >
> > > > > > > > > > > > 210. Thanks for the explanation. Sounds good to me.
> > > > > > > > > > > >
> > > > > > > > > > > > 211. I still don't see an example when ShouldFence is
> > > set to
> > > > > > > true in
> > > > > > > > > > > > BrokerHeartbeatReques. Could we add one?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > It is sent to true when the broker is first starting up and
> > > > > > > doesn't yet
> > > > > > > > > > > want to be unfenced.  I added a longer explanation of this
> > > in
> > > > > the
> > > > > > > > > "Broker
> > > > > > > > > > > Leases" section.
> > > > > > > > > > >
> > > > > > > > > > > > 213. The KIP now allows replicas to be assigned on
> > > brokers
> > > > > that
> > > > > > > are
> > > > > > > > > > > fenced,
> > > > > > > > > > > > which is an improvement. How do we permanently remove a
> > > > > broker
> > > > > > > (e.g.
> > > > > > > > > > > > cluster shrinking) to prevent it from being used for
> > > future
> > > > > > > replica
> > > > > > > > > > > > assignments?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This is a fair point.  I will create a kafka-cluster.sh
> > > script
> > > > > > > which
> > > > > > > > > can
> > > > > > > > > > > do this, plus a DecommissionBrokerRequest.
> > > > > > > > > > >
> > > > > > > > > > > As a bonus the kafka-cluster.sh script can help people
> > > find the
> > > > > > > > > cluster ID
> > > > > > > > > > > of brokers, something that people have wanted a tool for
> > > in the
> > > > > > > past.
> > > > > > > > > > >
> > > > > > > > > > > > 214. "Currently, when a node is down, all of its ZK
> > > > > registration
> > > > > > > > > > > > information is gone.  But  we need this information in
> > > order
> > > > > to
> > > > > > > > > > > understand
> > > > > > > > > > > > things like whether the replicas of a particular
> > > partition
> > > > > are
> > > > > > > > > > > > well-balanced across racks." This is not quite true.
> > > > > Currently,
> > > > > > > even
> > > > > > > > > when
> > > > > > > > > > > > ZK registration is gone, the existing replica assignment
> > > is
> > > > > still
> > > > > > > > > > > available
> > > > > > > > > > > > in the metadata response.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I agree that the existing replica assignment is still
> > > > > available.
> > > > > > > But
> > > > > > > > > that
> > > > > > > > > > > just tells you partition X is on nodes A, B, and C.  If you
> > > > > don't
> > > > > > > have
> > > > > > > > > the
> > > > > > > > > > > ZK registration for one or more of A, B, or C then you
> > > don't
> > > > > know
> > > > > > > > > whether
> > > > > > > > > > > we are following the policy of "two replicas on one rack,
> > > one
> > > > > > > replica
> > > > > > > > > on
> > > > > > > > > > > another."  Or any other more complex rack placement policy
> > > > > that you
> > > > > > > > > might
> > > > > > > > > > > have.
> > > > > > > > > > >
> > > > > > > > > > > best,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Dec 16, 2020 at 8:48 AM Colin McCabe <
> > > > > cmccabe@apache.org
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Dec 15, 2020, at 13:08, Jun Rao wrote:
> > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the reply. A few more follow up comments.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 210. initial.broker.registration.timeout.ms: The
> > > default
> > > > > > > value
> > > > > > > > > is
> > > > > > > > > > > 90sec,
> > > > > > > > > > > > > > which seems long. If a broker fails the registration
> > > > > because
> > > > > > > of
> > > > > > > > > > > incorrect
> > > > > > > > > > > > > > configs, we want to fail faster. In comparison, the
> > > > > defaults
> > > > > > > for
> > > > > > > > > > > > > > zookeeper.connection.timeout.ms is 18 secs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I agree that the initial connection timeout here is
> > > longer
> > > > > than
> > > > > > > > > what we
> > > > > > > > > > > > > had with ZK.  The main reason I selected a slightly
> > > longer
> > > > > > > timeout
> > > > > > > > > > > here is
> > > > > > > > > > > > > to handle the case where the controllers and the
> > > brokers
> > > > > are
> > > > > > > > > > > co-located.
> > > > > > > > > > > > > For example, if you have a 3 node cluster and all three
> > > > > nodes
> > > > > > > are
> > > > > > > > > > > > > controllers+brokers, when you first bring up the
> > > cluster,
> > > > > we
> > > > > > > will
> > > > > > > > > have
> > > > > > > > > > > to
> > > > > > > > > > > > > stand up the controller quorum and then handle broker
> > > > > > > > > registrations.
> > > > > > > > > > > > > Although we believe Raft will start up relatively
> > > quickly,
> > > > > it's
> > > > > > > > > good to
> > > > > > > > > > > > > leave some extra margin here.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't think there's a big disadvantage to having a
> > > > > slightly
> > > > > > > > > longer
> > > > > > > > > > > > > timeout here.  After all, starting up the brokers with
> > > no
> > > > > > > > > controllers
> > > > > > > > > > > is an
> > > > > > > > > > > > > admin mistake, which we don't expect to see very often.
> > > > > Maybe
> > > > > > > > > let's
> > > > > > > > > > > set it
> > > > > > > > > > > > > to 60 seconds for now and maybe see if we can tweak it
> > > in
> > > > > the
> > > > > > > > > future if
> > > > > > > > > > > > > that turns out to be too long or short?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 211. "Once they are all moved, the controller
> > > responds
> > > > > to the
> > > > > > > > > > > heartbeat
> > > > > > > > > > > > > > with a nextState of SHUTDOWN." It seems that
> > > nextState
> > > > > is no
> > > > > > > > > longer
> > > > > > > > > > > > > > relevant. Also, could you add a bit more explanation
> > > on
> > > > > when
> > > > > > > > > > > ShouldFence
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > set to true in BrokerHeartbeatRequest?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Good catch.  I removed the obsolete section referring
> > > to
> > > > > > > nextState
> > > > > > > > > and
> > > > > > > > > > > > > added a reference to the new boolean.  I also added
> > > some
> > > > > more
> > > > > > > > > > > information
> > > > > > > > > > > > > about ShouldFence and about the rationale for
> > > separating
> > > > > > > fencing
> > > > > > > > > from
> > > > > > > > > > > > > registration.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 212. Related to the above, what does the broker do if
> > > > > > > IsFenced is
> > > > > > > > > > > true in
> > > > > > > > > > > > > > the BrokerHeartbeatResponse? Will the broker do the
> > > same
> > > > > > > thing on
> > > > > > > > > > > > > receiving
> > > > > > > > > > > > > > a FenceBrokerRecord from the metadata log?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > The broker only checks this boolean during the startup
> > > > > process.
> > > > > > > > > After
> > > > > > > > > > > the
> > > > > > > > > > > > > startup process is finished, it ignores this boolean.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The broker uses fence / unfence records in the metadata
> > > > > log to
> > > > > > > > > > > determine
> > > > > > > > > > > > > which brokers should appear in its MetadataResponse.
> > > > > > > > > > > > >
> > > > > > > > > > > > > best,
> > > > > > > > > > > > > Colin
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Dec 15, 2020 at 8:51 AM Colin McCabe <
> > > > > > > cmccabe@apache.org
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Dec 15, 2020, at 04:13, Tom Bentley wrote:
> > > > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The KIP says that "brokers which are fenced will
> > > not
> > > > > > > appear
> > > > > > > > > in
> > > > > > > > > > > > > > > > MetadataResponses.  So clients that have
> > > up-to-date
> > > > > > > metadata
> > > > > > > > > will
> > > > > > > > > > > > > not try
> > > > > > > > > > > > > > > > to contact fenced brokers.", which is fine, but
> > > it
> > > > > > > doesn't
> > > > > > > > > seem
> > > > > > > > > > > to
> > > > > > > > > > > > > > > > elaborate on what happens for clients which try
> > > to
> > > > > > > contact a
> > > > > > > > > > > broker
> > > > > > > > > > > > > > > (using
> > > > > > > > > > > > > > > > stale metadata, for example) and find it in the
> > > > > > > > > > > registered-but-fenced
> > > > > > > > > > > > > > > > state.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have dropped the broker-side fencing from this
> > > > > > > proposal.  So
> > > > > > > > > now
> > > > > > > > > > > the
> > > > > > > > > > > > > > > fencing is basically the same as today's fencing:
> > > it's
> > > > > > > > > > > controller-side
> > > > > > > > > > > > > > > only.  That means that clients using stale
> > > metadata can
> > > > > > > contact
> > > > > > > > > > > fenced
> > > > > > > > > > > > > > > brokers and communicate with them.  The only case
> > > > > where the
> > > > > > > > > broker
> > > > > > > > > > > > > will be
> > > > > > > > > > > > > > > inaccessible is when it hasn't finished starting
> > > yet
> > > > > > > (i.e., has
> > > > > > > > > > > not yet
> > > > > > > > > > > > > > > attained RUNNING state.)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Just like today, we expect the safeguards built
> > > into
> > > > > the
> > > > > > > > > > > replication
> > > > > > > > > > > > > > > protocol to prevent the worst corner cases that
> > > could
> > > > > > > result
> > > > > > > > > from
> > > > > > > > > > > > > this.  I
> > > > > > > > > > > > > > > do think we will probably take up this issue later
> > > and
> > > > > > > find a
> > > > > > > > > > > better
> > > > > > > > > > > > > > > solution for client-side metadata, but this KIP is
> > > big
> > > > > > > enough
> > > > > > > > > > > as-is.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You said previously that the broker will respond
> > > > > with a
> > > > > > > > > retriable
> > > > > > > > > > > > > > > > error, which is again fine, but you didn't
> > > answer my
> > > > > > > question
> > > > > > > > > > > about
> > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > error code you've chosen for this. I realise that
> > > > > this
> > > > > > > > > doesn't
> > > > > > > > > > > really
> > > > > > > > > > > > > > > > affect the correctness of the behaviour, but I'm
> > > not
> > > > > > > aware
> > > > > > > > > of any
> > > > > > > > > > > > > > > existing
> > > > > > > > > > > > > > > > error code which is a good fit. So it would be
> > > good
> > > > > to
> > > > > > > > > understand
> > > > > > > > > > > > > about
> > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > you're making this backward compatible for
> > > clients.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Many thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Tom
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Dec 15, 2020 at 1:42 AM Colin McCabe <
> > > > > > > > > cmccabe@apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Dec 11, 2020, at 17:07, Jun Rao wrote:
> > > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the reply. Just a couple of more
> > > > > comments
> > > > > > > > > below.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 210. Since we are deprecating
> > > > > > > > > > > zookeeper.connection.timeout.ms,
> > > > > > > > > > > > > > > should we
> > > > > > > > > > > > > > > > > > add a new config to bound the time for a
> > > broker
> > > > > to
> > > > > > > > > connect
> > > > > > > > > > > to the
> > > > > > > > > > > > > > > > > > controller during starting up?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Good idea.  I added
> > > > > > > initial.broker.registration.timeout.ms
> > > > > > > > > for
> > > > > > > > > > > > > this.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 211. BrokerHeartbeat no longer has the state
> > > > > field
> > > > > > > in the
> > > > > > > > > > > > > > > > > request/response.
> > > > > > > > > > > > > > > > > > However, (a) the controller shutdown section
> > > > > still
> > > > > > > has
> > > > > > > > > "In
> > > > > > > > > > > its
> > > > > > > > > > > > > > > periodic
> > > > > > > > > > > > > > > > > > heartbeats, the broker asks the controller
> > > if it
> > > > > can
> > > > > > > > > > > transition
> > > > > > > > > > > > > into
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > SHUTDOWN state.  This motivates the
> > > controller to
> > > > > > > move
> > > > > > > > > all
> > > > > > > > > > > of the
> > > > > > > > > > > > > > > leaders
> > > > > > > > > > > > > > > > > > off of that broker.  Once they are all
> > > moved, the
> > > > > > > > > controller
> > > > > > > > > > > > > > > responds to
> > > > > > > > > > > > > > > > > > the heartbeat with a nextState of SHUTDOWN.";
> > > > > (2) the
> > > > > > > > > > > > > description of
> > > > > > > > > > > > > > > > > > BrokerHeartbeat still references
> > > currentState and
> > > > > > > > > > > targetState.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks.  I've made these sections clearer and
> > > > > removed
> > > > > > > the
> > > > > > > > > > > obsolete
> > > > > > > > > > > > > > > > > references to sending states.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, Dec 11, 2020 at 1:33 PM Colin McCabe
> > > <
> > > > > > > > > > > cmccabe@apache.org
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Dec 9, 2020, at 10:10, Jun Rao
> > > wrote:
> > > > > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the update. A few more follow
> > > up
> > > > > > > comments.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks again for the review.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 100. FailedReplicaRecord: Since this is
> > > > > reported
> > > > > > > by
> > > > > > > > > each
> > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > independently, perhaps we could use a
> > > more
> > > > > > > concise
> > > > > > > > > > > > > representation
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > a top level broker field, an array of
> > > topics,
> > > > > > > which
> > > > > > > > > has
> > > > > > > > > > > an
> > > > > > > > > > > > > array
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > partitions.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The issue is that there is a size limit on
> > > the
> > > > > each
> > > > > > > > > record.
> > > > > > > > > > > > > > > Putting
> > > > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > > > of the partitions of a log directory into a
> > > > > single
> > > > > > > > > record
> > > > > > > > > > > would
> > > > > > > > > > > > > > > > > probably
> > > > > > > > > > > > > > > > > > > break that in many cases.  Still, we can
> > > > > optimize a
> > > > > > > > > bit by
> > > > > > > > > > > > > having
> > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > array
> > > > > > > > > > > > > > > > > > > of partition IDs, since nearly all the
> > > time, we
> > > > > > > have
> > > > > > > > > more
> > > > > > > > > > > than
> > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > from the
> > > > > > > > > > > > > > > > > > > same topic.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 200. Sounds good. If we remove the
> > > > > broker-side
> > > > > > > > > fencing
> > > > > > > > > > > > > logic, do
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > plan
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > still keep FENCED in broker state? Do we
> > > > > plan to
> > > > > > > > > expose
> > > > > > > > > > > the
> > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > states
> > > > > > > > > > > > > > > > > > > > through the existing BrokerState metric
> > > and
> > > > > if
> > > > > > > so,
> > > > > > > > > what
> > > > > > > > > > > are
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > values
> > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > the new states?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > No, we don't need FENCED any more.  I have
> > > > > removed
> > > > > > > it
> > > > > > > > > from
> > > > > > > > > > > the
> > > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The new states are very similar to the
> > > current
> > > > > > > ones,
> > > > > > > > > > > actually.
> > > > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > no new states or removed ones.  The main
> > > > > change in
> > > > > > > the
> > > > > > > > > > > broker
> > > > > > > > > > > > > state
> > > > > > > > > > > > > > > > > machine
> > > > > > > > > > > > > > > > > > > is that the
> > > RECOVERING_FROM_UNCLEAN_SHUTDOWN
> > > > > state
> > > > > > > has
> > > > > > > > > been
> > > > > > > > > > > > > > > renamed to
> > > > > > > > > > > > > > > > > > > RECOVERY.  Also, unlike previously, the
> > > broker
> > > > > will
> > > > > > > > > always
> > > > > > > > > > > pass
> > > > > > > > > > > > > > > through
> > > > > > > > > > > > > > > > > > > RECOVERY (although it may only stay in this
> > > > > state
> > > > > > > for a
> > > > > > > > > > > very
> > > > > > > > > > > > > short
> > > > > > > > > > > > > > > > > amount
> > > > > > > > > > > > > > > > > > > of time).
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 201. This may be fine too. Could we
> > > document
> > > > > what
> > > > > > > > > happens
> > > > > > > > > > > > > when
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > broker.id/controller.id in
> > > > > metadata.properties
> > > > > > > don't
> > > > > > > > > > > match
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > config when the broker starts up?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I added some documentation about this.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 204. There is still "The highest metadata
> > > > > offset
> > > > > > > > > which
> > > > > > > > > > > the
> > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > has not
> > > > > > > > > > > > > > > > > > > > reached" referenced under
> > > BrokerRegistration.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It should be CurrentMetadataOffset.  Fixed.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 206. Is that separate step needed given
> > > > > KIP-516?
> > > > > > > With
> > > > > > > > > > > > > KIP-516 (
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-LeaderAndIsr
> > > > > > > > > > > > > > > > > > > ),
> > > > > > > > > > > > > > > > > > > > we don't need to wait for the topic data
> > > to
> > > > > be
> > > > > > > > > removed
> > > > > > > > > > > from
> > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > > > > before removing the topic metadata. The
> > > > > > > combination
> > > > > > > > > of
> > > > > > > > > > > > > unmatching
> > > > > > > > > > > > > > > > > > > > topicId
> > > > > > > > > > > > > > > > > > > > or the missing topicId from the metadata
> > > is
> > > > > > > enough
> > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > > > > > clean
> > > > > > > > > > > > > > > > > > > > up deleted topics asynchronously.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It won't be needed once KIP-516 is
> > > adopted, but
> > > > > > > this
> > > > > > > > > hasn't
> > > > > > > > > > > > > been
> > > > > > > > > > > > > > > > > > > implemented yet.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Dec 8, 2020 at 5:27 PM Colin
> > > McCabe <
> > > > > > > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Thu, Dec 3, 2020, at 16:37, Jun Rao
> > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks for the updated KIP. A few
> > > more
> > > > > > > comments
> > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks again for the reviews.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 80.2 For deprecated configs, we need
> > > to
> > > > > > > include
> > > > > > > > > > > > > zookeeper.*
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > broker.id.generation.enable.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Added.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 83.1 If a broker is down, does the
> > > > > controller
> > > > > > > > > keep
> > > > > > > > > > > the
> > > > > > > > > > > > > > > previously
> > > > > > > > > > > > > > > > > > > > > > registered broker epoch forever? If
> > > not,
> > > > > how
> > > > > > > long
> > > > > > > > > > > does
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > > > > > keep
> > > > > > > > > > > > > > > > > > > > > > it? What does the controller do when
> > > > > > > receiving a
> > > > > > > > > > > broker
> > > > > > > > > > > > > > > heartbeat
> > > > > > > > > > > > > > > > > > > request
> > > > > > > > > > > > > > > > > > > > > > with an unfound broker epoch?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Yes, the controller keeps the previous
> > > > > > > registration
> > > > > > > > > > > > > forever.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Broker heartbeat requests with an
> > > incorrect
> > > > > > > broker
> > > > > > > > > > > epoch
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > rejected
> > > > > > > > > > > > > > > > > > > > > with STALE_BROKER_EPOCH.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 100. Have you figured out if we need
> > > to
> > > > > add
> > > > > > > a new
> > > > > > > > > > > record
> > > > > > > > > > > > > > > type for
> > > > > > > > > > > > > > > > > > > > > reporting
> > > > > > > > > > > > > > > > > > > > > > partitions on failed disks?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I added FailedReplicaRecord to reflect
> > > the
> > > > > case
> > > > > > > > > where a
> > > > > > > > > > > > > JBOD
> > > > > > > > > > > > > > > > > directory
> > > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > failed, leading to failed replicas.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 102. For debugging purposes,
> > > sometimes
> > > > > it's
> > > > > > > > > useful to
> > > > > > > > > > > > > read
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > > > > > > > topic using tools like
> > > console-consumer.
> > > > > > > Should
> > > > > > > > > we
> > > > > > > > > > > > > support
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > > > > so,
> > > > > > > > > > > > > > > > > > > > > > how?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > For now, we have the ability to read
> > > the
> > > > > > > metadata
> > > > > > > > > logs
> > > > > > > > > > > > > with the
> > > > > > > > > > > > > > > > > > > dump-logs
> > > > > > > > > > > > > > > > > > > > > tool.  I think we will come up with
> > > some
> > > > > other
> > > > > > > > > tools
> > > > > > > > > > > in the
> > > > > > > > > > > > > > > future
> > > > > > > > > > > > > > > > > as
> > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > get experience.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 200. "brokers which are fenced will
> > > not
> > > > > > > appear in
> > > > > > > > > > > > > > > > > MetadataResponses.
> > > > > > > > > > > > > > > > > > > The
> > > > > > > > > > > > > > > > > > > > > > broker will not respond to these
> > > > > requests--
> > > > > > > > > instead,
> > > > > > > > > > > it
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > > > > > > > disconnect." If the controller is
> > > > > > > partitioned off
> > > > > > > > > > > from
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > brokers,
> > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > design will cause every broker to
> > > stop
> > > > > > > accepting
> > > > > > > > > new
> > > > > > > > > > > > > client
> > > > > > > > > > > > > > > > > > > requests. In
> > > > > > > > > > > > > > > > > > > > > > contrast, if ZK is partitioned off,
> > > the
> > > > > > > existing
> > > > > > > > > > > > > behavior is
> > > > > > > > > > > > > > > > > that the
> > > > > > > > > > > > > > > > > > > > > > brokers can continue to work based
> > > on the
> > > > > > > last
> > > > > > > > > known
> > > > > > > > > > > > > > > metadata.
> > > > > > > > > > > > > > > > > So, I
> > > > > > > > > > > > > > > > > > > am
> > > > > > > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > > > > > sure if we should change the existing
> > > > > > > behavior
> > > > > > > > > > > because
> > > > > > > > > > > > > of the
> > > > > > > > > > > > > > > > > bigger
> > > > > > > > > > > > > > > > > > > > > impact
> > > > > > > > > > > > > > > > > > > > > > in the new one. Another option is to
> > > > > keep the
> > > > > > > > > > > existing
> > > > > > > > > > > > > > > behavior
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > > > > > > > a metric for fenced brokers so that
> > > the
> > > > > > > operator
> > > > > > > > > > > could be
> > > > > > > > > > > > > > > > > alerted.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I'm skeptical about how well running
> > > > > without ZK
> > > > > > > > > > > currently
> > > > > > > > > > > > > > > works.
> > > > > > > > > > > > > > > > > > > However,
> > > > > > > > > > > > > > > > > > > > > I will move the broker-side fencing
> > > into a
> > > > > > > > > follow-up
> > > > > > > > > > > KIP.
> > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > KIP is
> > > > > > > > > > > > > > > > > > > > > already pretty large and there is no
> > > hard
> > > > > > > > > dependency on
> > > > > > > > > > > > > this.
> > > > > > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > also be other ways of accomplishing the
> > > > > > > positive
> > > > > > > > > > > effects of
> > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > > > broker-side fencing, so more
> > > discussion is
> > > > > > > needed.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 201. I read Ron's comment, but I am
> > > > > still not
> > > > > > > > > sure
> > > > > > > > > > > the
> > > > > > > > > > > > > > > benefit of
> > > > > > > > > > > > > > > > > > > keeping
> > > > > > > > > > > > > > > > > > > > > > broker.id and controller.id in
> > > > > > > meta.properties.
> > > > > > > > > It
> > > > > > > > > > > seems
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > we are
> > > > > > > > > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > > > > > > > > duplicating the same info in two
> > > places
> > > > > and
> > > > > > > have
> > > > > > > > > the
> > > > > > > > > > > > > > > additional
> > > > > > > > > > > > > > > > > > > burden of
> > > > > > > > > > > > > > > > > > > > > > making sure the values in the two
> > > places
> > > > > are
> > > > > > > > > > > consistent.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I think the reasoning is that having
> > > > > broker.id
> > > > > > > > > > > protects us
> > > > > > > > > > > > > > > against
> > > > > > > > > > > > > > > > > > > > > accidentally bringing up a broker with
> > > a
> > > > > disk
> > > > > > > from
> > > > > > > > > a
> > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > > > broker.  I
> > > > > > > > > > > > > > > > > > > > > don't feel strongly about this but it
> > > > > seemed
> > > > > > > > > simpler to
> > > > > > > > > > > > > keep
> > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 202.
> > > > > controller.connect.security.protocol: Is
> > > > > > > > > this
> > > > > > > > > > > needed
> > > > > > > > > > > > > > > since
> > > > > > > > > > > > > > > > > > > > > > controller.listener.names and
> > > > > > > > > > > > > listener.security.protocol.map
> > > > > > > > > > > > > > > > > imply
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > security protocol already?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > You're right, this isn't needed.  I'll
> > > > > remove
> > > > > > > it.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 203.
> > > registration.heartbeat.interval.ms:
> > > > > It
> > > > > > > > > > > defaults to
> > > > > > > > > > > > > 2k.
> > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > uses
> > > > > > > > > > > > > > > > > > > 1/3
> > > > > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > > the session timeout for heartbeat.
> > > So,
> > > > > given
> > > > > > > the
> > > > > > > > > > > default
> > > > > > > > > > > > > 18k
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > registration.lease.timeout.ms,
> > > should we
> > > > > > > default
> > > > > > > > > > > > > > > > > > > > > > registration.heartbeat.interval.ms
> > > to
> > > > > 6k?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 6 seconds seems like a pretty long time
> > > > > between
> > > > > > > > > > > > > heartbeats.  It
> > > > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > useful to know when a broker is missing
> > > > > > > heartbeats,
> > > > > > > > > > > with
> > > > > > > > > > > > > less
> > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > > > > > > that.  I provisionally set it to 3
> > > seconds
> > > > > (we
> > > > > > > can
> > > > > > > > > > > always
> > > > > > > > > > > > > > > change
> > > > > > > > > > > > > > > > > > > later...)
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I also changed the name of these
> > > > > > > configurations to
> > > > > > > > > "
> > > > > > > > > > > > > > > > > > > > > broker.heartbeat.interval.ms" and "
> > > > > > > > > > > > > > > broker.registration.timeout.ms"
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > try
> > > > > > > > > > > > > > > > > > > > > to clarify them a bit.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 204. "The highest metadata offset
> > > which
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > > has
> > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > reached."
> > > > > > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > > > > > > seems this should be "has reached".
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I changed this to "one more than the
> > > > > highest
> > > > > > > > > metadata
> > > > > > > > > > > > > offset
> > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > broker has reached."
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 205. UnfenceBrokerRecord and
> > > > > > > > > UnregisterBrokerRecord:
> > > > > > > > > > > To
> > > > > > > > > > > > > me,
> > > > > > > > > > > > > > > they
> > > > > > > > > > > > > > > > > > > seem to
> > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > the same. Do we need both?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Unregistration means that the broker
> > > has
> > > > > been
> > > > > > > > > removed
> > > > > > > > > > > from
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > cluster.
> > > > > > > > > > > > > > > > > > > > > That is different than unfencing, which
> > > > > marks
> > > > > > > the
> > > > > > > > > > > broker as
> > > > > > > > > > > > > > > active.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 206. TopicRecord: The Deleting field
> > > is
> > > > > used
> > > > > > > to
> > > > > > > > > > > indicate
> > > > > > > > > > > > > > > that the
> > > > > > > > > > > > > > > > > > > topic
> > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > being deleted. I am wondering if
> > > this is
> > > > > > > really
> > > > > > > > > > > needed
> > > > > > > > > > > > > since
> > > > > > > > > > > > > > > > > > > RemoveTopic
> > > > > > > > > > > > > > > > > > > > > > already indicates the same thing.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > RemoveTopic is the last step, that
> > > scrubs
> > > > > all
> > > > > > > > > metadata
> > > > > > > > > > > > > about
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > topic.
> > > > > > > > > > > > > > > > > > > > > In order to get to that last step, the
> > > > > topic
> > > > > > > data
> > > > > > > > > > > needs to
> > > > > > > > > > > > > > > removed
> > > > > > > > > > > > > > > > > > > from all
> > > > > > > > > > > > > > > > > > > > > brokers (after each broker notices
> > > that the
> > > > > > > topic
> > > > > > > > > is
> > > > > > > > > > > being
> > > > > > > > > > > > > > > > > deleted).
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Dec 2, 2020 at 2:50 PM Colin
> > > > > McCabe <
> > > > > > > > > > > > > > > cmccabe@apache.org>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Dec 2, 2020, at 14:07, Ron
> > > > > > > Dagostino
> > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > Hi Colin.  Thanks for the
> > > updates.
> > > > > It's
> > > > > > > now
> > > > > > > > > > > clear
> > > > > > > > > > > > > to me
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > > > > > > > > keep their broker epoch for the
> > > life
> > > > > of
> > > > > > > their
> > > > > > > > > > > JVM --
> > > > > > > > > > > > > they
> > > > > > > > > > > > > > > > > > > register
> > > > > > > > > > > > > > > > > > > > > > > > once, get their broker epoch in
> > > the
> > > > > > > > > response, and
> > > > > > > > > > > > > then
> > > > > > > > > > > > > > > never
> > > > > > > > > > > > > > > > > > > > > > > > re-register again.  Brokers may
> > > get
> > > > > > > fenced,
> > > > > > > > > but
> > > > > > > > > > > they
> > > > > > > > > > > > > > > keep the
> > > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > > > broker epoch for the life of
> > > their
> > > > > JVM.
> > > > > > > The
> > > > > > > > > > > > > incarnation
> > > > > > > > > > > > > > > ID
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > > > > > > > kept for the life of the JVM but
> > > is
> > > > > > > > > generated by
> > > > > > > > > > > the
> > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > itself
> > > > > > > > > > > > > > > > > > > > > > > > upon startup, and the
> > > combination of
> > > > > the
> > > > > > > two
> > > > > > > > > > > allows
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > Controller to
> > > > > > > > > > > > > > > > > > > > > > > > act idempotently if any
> > > > > previously-sent
> > > > > > > > > > > registration
> > > > > > > > > > > > > > > response
> > > > > > > > > > > > > > > > > > > gets
> > > > > > > > > > > > > > > > > > > > > > > > lost.  Makes sense.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks, Ron.  That's a good
> > > summary.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > One thing I wonder about is if it
> > > > > might
> > > > > > > be
> > > > > > > > > > > helpful
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > > > > > > > > > send the Cluster ID as determined
> > > > > from
> > > > > > > its
> > > > > > > > > > > > > > > meta.properties
> > > > > > > > > > > > > > > > > file
> > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > > > > > > > > > registration request.  Does it
> > > even
> > > > > make
> > > > > > > > > sense
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > > > > > > > > > successfully register and enter
> > > the
> > > > > > > Fenced
> > > > > > > > > state
> > > > > > > > > > > if
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > wrong
> > > > > > > > > > > > > > > > > > > > > > > > Cluster ID?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Yeah, that's a good idea.  Let's
> > > have
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > > pass
> > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > > > > > ID in
> > > > > > > > > > > > > > > > > > > > > > > the registration RPC, and then
> > > > > > > registration can
> > > > > > > > > > > fail
> > > > > > > > > > > > > if the
> > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > configured for the wrong cluster.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >  The nextMetadatOffset value
> > > that the
> > > > > > > broker
> > > > > > > > > > > > > communicates
> > > > > > > > > > > > > > > > > > > > > > > > in its registration request only
> > > has
> > > > > > > meaning
> > > > > > > > > > > within
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > correct
> > > > > > > > > > > > > > > > > > > > > > > > cluster, so it feels to me that
> > > the
> > > > > > > > > Controller
> > > > > > > > > > > should
> > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > perform this sanity check.
> > > There is
> > > > > > > > > currently
> > > > > > > > > > > > > (pre-KIP
> > > > > > > > > > > > > > > 500)
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > check
> > > > > > > > > > > > > > > > > > > > > > > > in the broker to make sure its
> > > > > configured
> > > > > > > > > > > cluster ID
> > > > > > > > > > > > > > > matches
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > > > > > > > stored in ZooKeeper, and we will
> > > > > have to
> > > > > > > > > perform
> > > > > > > > > > > this
> > > > > > > > > > > > > > > > > validation
> > > > > > > > > > > > > > > > > > > > > > > > somewhere in the KIP-500 world.
> > > If
> > > > > the
> > > > > > > > > > > Controller
> > > > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > > > do it
> > > > > > > > > > > > > > > > > > > > > > > > within the registration request
> > > then
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > > will
> > > > > > > > > > > > > > > have to
> > > > > > > > > > > > > > > > > > > make a
> > > > > > > > > > > > > > > > > > > > > > > > metadata request to the
> > > Controller,
> > > > > > > retrieve
> > > > > > > > > the
> > > > > > > > > > > > > Cluster
> > > > > > > > > > > > > > > ID,
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > perform the check itself.  It
> > > feels
> > > > > to me
> > > > > > > > > that it
> > > > > > > > > > > > > might
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > better for
> > > > > > > > > > > > > > > > > > > > > > > > the Controller to just do it, and
> > > > > then
> > > > > > > the
> > > > > > > > > broker
> > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > > > have to
> > > > > > > > > > > > > > > > > > > > > > > > worry about it anymore once it
> > > > > > > successfully
> > > > > > > > > > > > > registers.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I also have a question about the
> > > > > > > broker.id
> > > > > > > > > > > value and
> > > > > > > > > > > > > > > > > > > > > meta.properties.
> > > > > > > > > > > > > > > > > > > > > > > > The KIP now says "In version 0 of
> > > > > > > > > > > meta.properties,
> > > > > > > > > > > > > there
> > > > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > > > > > > > broker.id field.  Version 1
> > > does not
> > > > > > > have
> > > > > > > > > this
> > > > > > > > > > > > > field.
> > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > is no
> > > > > > > > > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > > > > > > > > > > > needed because we no longer
> > > support
> > > > > > > dynamic
> > > > > > > > > > > broker id
> > > > > > > > > > > > > > > > > > > assignment."
> > > > > > > > > > > > > > > > > > > > > > > > But then there is an example
> > > version
> > > > > 1
> > > > > > > > > > > > > meta.properties
> > > > > > > > > > > > > > > file
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > shows
> > > > > > > > > > > > > > > > > > > > > > > > the broker.id value.  I actually
> > > > > wonder
> > > > > > > if
> > > > > > > > > > > maybe the
> > > > > > > > > > > > > > > > > broker.id
> > > > > > > > > > > > > > > > > > > value
> > > > > > > > > > > > > > > > > > > > > > > > would be good to keep in the
> > > version
> > > > > 1
> > > > > > > > > > > > > meta.properties
> > > > > > > > > > > > > > > file
> > > > > > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > > > > > > currently (pre-KIP 500, version
> > > 0)
> > > > > acts
> > > > > > > as a
> > > > > > > > > > > sanity
> > > > > > > > > > > > > > > check to
> > > > > > > > > > > > > > > > > make
> > > > > > > > > > > > > > > > > > > > > sure
> > > > > > > > > > > > > > > > > > > > > > > > the broker is using the correct
> > > log
> > > > > > > > > directory.
> > > > > > > > > > > > > Similarly
> > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > controller.id value on
> > > controllers
> > > > > -- it
> > > > > > > > > would
> > > > > > > > > > > > > allow the
> > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > > > > sanity check for quorum
> > > controllers.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > That's a good point.  I will add
> > > > > broker.id
> > > > > > > > > back,
> > > > > > > > > > > and
> > > > > > > > > > > > > also
> > > > > > > > > > > > > > > add
> > > > > > > > > > > > > > > > > > > > > > > controller.id as a possibility.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Mon, Nov 30, 2020 at 7:41 PM
> > > Colin
> > > > > > > McCabe
> > > > > > > > > <
> > > > > > > > > > > > > > > > > cmccabe@apache.org
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 23, 2020, at 16:10,
> > > > > Jun Rao
> > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the reply. A few
> > > more
> > > > > > > > > comments.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks again for the reply.
> > > Sorry
> > > > > for
> > > > > > > the
> > > > > > > > > long
> > > > > > > > > > > > > > > hiatus.  I
> > > > > > > > > > > > > > > > > was
> > > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > > > > vacation for a while.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 55. There is still text that
> > > > > favors
> > > > > > > new
> > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > registration.
> > > > > > > > > > > > > > > > > > > > > "When a
> > > > > > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > > > > first starts up, when it is
> > > in
> > > > > the
> > > > > > > > > INITIAL
> > > > > > > > > > > > > state, it
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > always
> > > > > > > > > > > > > > > > > > > > > > > "win"
> > > > > > > > > > > > > > > > > > > > > > > > > > broker ID conflicts.
> > > However,
> > > > > once
> > > > > > > it is
> > > > > > > > > > > > > granted a
> > > > > > > > > > > > > > > > > lease, it
> > > > > > > > > > > > > > > > > > > > > > > transitions
> > > > > > > > > > > > > > > > > > > > > > > > > > out of the INITIAL state.
> > > > > > > Thereafter,
> > > > > > > > > it may
> > > > > > > > > > > > > lose
> > > > > > > > > > > > > > > > > subsequent
> > > > > > > > > > > > > > > > > > > > > > > conflicts if
> > > > > > > > > > > > > > > > > > > > > > > > > > its broker epoch is stale.
> > > (See
> > > > > > > KIP-380
> > > > > > > > > for
> > > > > > > > > > > some
> > > > > > > > > > > > > > > > > background
> > > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > > > > epoch.)  The reason for
> > > favoring
> > > > > new
> > > > > > > > > > > processes
> > > > > > > > > > > > > is to
> > > > > > > > > > > > > > > > > > > accommodate
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > common
> > > > > > > > > > > > > > > > > > > > > > > > > > case where a process is
> > > killed
> > > > > with
> > > > > > > kill
> > > > > > > > > -9
> > > > > > > > > > > and
> > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > > > restarted.
> > > > > > > > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > > > > > > > > want it
> > > > > > > > > > > > > > > > > > > > > > > > > > to be able to reclaim its
> > > old ID
> > > > > > > quickly
> > > > > > > > > in
> > > > > > > > > > > this
> > > > > > > > > > > > > > > case."
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the reminder.  I
> > > have
> > > > > > > clarified
> > > > > > > > > the
> > > > > > > > > > > > > language
> > > > > > > > > > > > > > > > > here.
> > > > > > > > > > > > > > > > > > > > > > > Hopefully now it is clear that we
> > > don't
> > > > > > > allow
> > > > > > > > > quick
> > > > > > > > > > > > > re-use
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > IDs.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 80.1 Sounds good. Could you
> > > > > document
> > > > > > > that
> > > > > > > > > > > > > listeners
> > > > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > > required
> > > > > > > > > > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > > > > > > > now? It would also be useful
> > > to
> > > > > > > annotate
> > > > > > > > > > > other
> > > > > > > > > > > > > > > required
> > > > > > > > > > > > > > > > > > > configs.
> > > > > > > > > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > > > > > > > > > > > example, controller.connect
> > > > > should be
> > > > > > > > > > > required.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I added a note specifying that
> > > > > these
> > > > > > > are
> > > > > > > > > > > required.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 80.2 Could you list all
> > > > > deprecated
> > > > > > > > > existing
> > > > > > > > > > > > > configs?
> > > > > > > > > > > > > > > > > Another
> > > > > > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > > > > control.plane.listener.name
> > > > > since
> > > > > > > the
> > > > > > > > > > > > > controller no
> > > > > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > > > > > > sends
> > > > > > > > > > > > > > > > > > > > > > > > > > LeaderAndIsr, UpdateMetadata
> > > and
> > > > > > > > > StopReplica
> > > > > > > > > > > > > > > requests.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I added a section specifying
> > > some
> > > > > > > > > deprecated
> > > > > > > > > > > > > configs.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 83.1 It seems that the
> > > broker can
> > > > > > > > > transition
> > > > > > > > > > > from
> > > > > > > > > > > > > > > FENCED
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > RUNNING
> > > > > > > > > > > > > > > > > > > > > > > without
> > > > > > > > > > > > > > > > > > > > > > > > > > registering for a new broker
> > > > > epoch.
> > > > > > > I am
> > > > > > > > > not
> > > > > > > > > > > > > sure how
> > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > works.
> > > > > > > > > > > > > > > > > > > > > > > Once the
> > > > > > > > > > > > > > > > > > > > > > > > > > controller fences a broker,
> > > > > there is
> > > > > > > no
> > > > > > > > > need
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > keep the
> > > > > > > > > > > > > > > > > > > > > > > > > > boker epoch around. So, if
> > > the
> > > > > fenced
> > > > > > > > > > > broker's
> > > > > > > > > > > > > > > heartbeat
> > > > > > > > > > > > > > > > > > > request
> > > > > > > > > > > > > > > > > > > > > > > with the
> > > > > > > > > > > > > > > > > > > > > > > > > > existing broker epoch will be
> > > > > > > rejected,
> > > > > > > > > > > leading
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > back
> > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > > FENCED state again.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > The broker epoch refers to the
> > > > > broker
> > > > > > > > > > > registration.
> > > > > > > > > > > > > > > So we
> > > > > > > > > > > > > > > > > DO
> > > > > > > > > > > > > > > > > > > keep
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > broker epoch around even while the
> > > > > broker
> > > > > > > is
> > > > > > > > > > > fenced.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > The broker epoch changes only
> > > when
> > > > > > > there
> > > > > > > > > is a
> > > > > > > > > > > new
> > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > registration.  Fencing or
> > > unfencing the
> > > > > > > broker
> > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > change
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > epoch.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 83.5 Good point on KIP-590.
> > > Then
> > > > > > > should
> > > > > > > > > we
> > > > > > > > > > > > > expose the
> > > > > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > > > > > debugging purposes? If not,
> > > we
> > > > > should
> > > > > > > > > > > deprecate
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > controllerID
> > > > > > > > > > > > > > > > > > > > > > > field in
> > > > > > > > > > > > > > > > > > > > > > > > > > MetadataResponse?
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I think it's OK to expose it
> > > for
> > > > > now,
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > > > > > proviso
> > > > > > > > > > > > > > > > > that it
> > > > > > > > > > > > > > > > > > > > > won't
> > > > > > > > > > > > > > > > > > > > > > > be reachable by clients.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 90. We rejected the shared ID
> > > > > with
> > > > > > > just
> > > > > > > > > one
> > > > > > > > > > > > > reason
> > > > > > > > > > > > > > > "This
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > not a
> > > > > > > > > > > > > > > > > > > > > > > good idea
> > > > > > > > > > > > > > > > > > > > > > > > > > because NetworkClient
> > > assumes a
> > > > > > > single ID
> > > > > > > > > > > > > space.  So
> > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > > there is
> > > > > > > > > > > > > > > > > > > > > > > both a
> > > > > > > > > > > > > > > > > > > > > > > > > > controller 1 and a broker 1,
> > > we
> > > > > don't
> > > > > > > > > have a
> > > > > > > > > > > way
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > picking
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > "right"
> > > > > > > > > > > > > > > > > > > > > > > > > > one." This doesn't seem to
> > > be a
> > > > > > > strong
> > > > > > > > > > > reason.
> > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > > example,
> > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > > > > > > > address the NetworkClient
> > > issue
> > > > > with
> > > > > > > the
> > > > > > > > > node
> > > > > > > > > > > > > type
> > > > > > > > > > > > > > > as you
> > > > > > > > > > > > > > > > > > > pointed
> > > > > > > > > > > > > > > > > > > > > > > out or
> > > > > > > > > > > > > > > > > > > > > > > > > > using the negative value of a
> > > > > broker
> > > > > > > ID
> > > > > > > > > as
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > controller ID.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > It would require a lot of code
> > > > > changes
> > > > > > > to
> > > > > > > > > > > support
> > > > > > > > > > > > > > > multiple
> > > > > > > > > > > > > > > > > > > types of
> > > > > > > > > > > > > > > > > > > > > > > node IDs.  It's not clear to me
> > > that
> > > > > the
> > > > > > > end
> > > > > > > > > result
> > > > > > > > > > > > > would
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > better --
> > > > > > > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > > > > > > tend to think it would be worse,
> > > since
> > > > > it
> > > > > > > > > would be
> > > > > > > > > > > more
> > > > > > > > > > > > > > > > > complex.
> > > > > > > > > > > > > > > > > > > In a
> > > > > > > > > > > > > > > > > > > > > > > similar vein, using negative
> > > numbers
> > > > > seems
> > > > > > > > > > > dangerous,
> > > > > > > > > > > > > > > since we
> > > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > > > > > > negatives or -1 as "special
> > > values" in
> > > > > many
> > > > > > > > > places.
> > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > > example,
> > > > > > > > > > > > > > > > > > > -1
> > > > > > > > > > > > > > > > > > > > > often
> > > > > > > > > > > > > > > > > > > > > > > represents "no such node."
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > One important thing to keep in
> > > > > mind is
> > > > > > > > > that we
> > > > > > > > > > > > > want to
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > able
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > transition from a broker and a
> > > > > controller
> > > > > > > being
> > > > > > > > > > > > > co-located
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > them
> > > > > > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > > > > > > > longer being co-located.  This is
> > > much
> > > > > > > easier
> > > > > > > > > to do
> > > > > > > > > > > > > when
> > > > > > > > > > > > > > > they
> > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > > > > > > > IDs.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 100. In KIP-589
> > > > > > > > > > > > > > > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > > > > > > > > > > > > > > > > > > > >,
> > > > > > > > > > > > > > > > > > > > > > > > > > the broker reports all
> > > offline
> > > > > > > replicas
> > > > > > > > > due
> > > > > > > > > > > to a
> > > > > > > > > > > > > disk
> > > > > > > > > > > > > > > > > > > failure to
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > > controller. It seems this
> > > > > information
> > > > > > > > > needs
> > > > > > > > > > > to be
> > > > > > > > > > > > > > > > > persisted
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > > > > > > > > > > > log. Do we have a
> > > corresponding
> > > > > > > record
> > > > > > > > > for
> > > > > > > > > > > that?
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Hmm, I have to look into this a
> > > > > little
> > > > > > > bit
> > > > > > > > > > > more.
> > > > > > > > > > > > > We
> > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > a new
> > > > > > > > > > > > > > > > > > > > > > > record type.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 101. Currently, StopReplica
> > > > > request
> > > > > > > has 2
> > > > > > > > > > > modes,
> > > > > > > > > > > > > > > without
> > > > > > > > > > > > > > > > > > > deletion
> > > > > > > > > > > > > > > > > > > > > > > and with
> > > > > > > > > > > > > > > > > > > > > > > > > > deletion. The former is used
> > > for
> > > > > > > > > controlled
> > > > > > > > > > > > > shutdown
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > handling
> > > > > > > > > > > > > > > > > > > > > > > disk
> > > > > > > > > > > > > > > > > > > > > > > > > > failure, and causes the
> > > follower
> > > > > to
> > > > > > > > > stop. The
> > > > > > > > > > > > > latter
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > topic
> > > > > > > > > > > > > > > > > > > > > > > deletion
> > > > > > > > > > > > > > > > > > > > > > > > > > and partition reassignment,
> > > and
> > > > > > > causes
> > > > > > > > > the
> > > > > > > > > > > > > replica
> > > > > > > > > > > > > > > to be
> > > > > > > > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > > > > > > > > > > Since we
> > > > > > > > > > > > > > > > > > > > > > > > > > are deprecating StopReplica,
> > > > > could we
> > > > > > > > > > > document
> > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > triggers
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > stopping
> > > > > > > > > > > > > > > > > > > > > > > > > > of a follower and the
> > > deleting
> > > > > of a
> > > > > > > > > replica
> > > > > > > > > > > now?
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > RemoveTopic triggers
> > > deletion.  In
> > > > > > > general
> > > > > > > > > the
> > > > > > > > > > > > > > > > > functionality of
> > > > > > > > > > > > > > > > > > > > > > > StopReplica is subsumed by the
> > > metadata
> > > > > > > > > records.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 102. Should we include the
> > > > > metadata
> > > > > > > > > topic in
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > MetadataResponse?
> > > > > > > > > > > > > > > > > > > > > > > If so,
> > > > > > > > > > > > > > > > > > > > > > > > > > when it will be included and
> > > what
> > > > > > > will
> > > > > > > > > the
> > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > > response
> > > > > > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > > > > > > > like?
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > No, it won't be included in the
> > > > > > > metadata
> > > > > > > > > > > response
> > > > > > > > > > > > > sent
> > > > > > > > > > > > > > > back
> > > > > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > brokers.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 103. "The active controller
> > > > > assigns
> > > > > > > the
> > > > > > > > > > > broker a
> > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > epoch,
> > > > > > > > > > > > > > > > > > > > > > > based on
> > > > > > > > > > > > > > > > > > > > > > > > > > the latest committed offset
> > > in
> > > > > the
> > > > > > > log."
> > > > > > > > > This
> > > > > > > > > > > > > seems
> > > > > > > > > > > > > > > > > > > inaccurate
> > > > > > > > > > > > > > > > > > > > > since
> > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > > latest committed offset
> > > doesn't
> > > > > > > always
> > > > > > > > > > > advance on
> > > > > > > > > > > > > > > every
> > > > > > > > > > > > > > > > > log
> > > > > > > > > > > > > > > > > > > > > append.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Given that the new broker epoch
> > > > > won't
> > > > > > > be
> > > > > > > > > > > visible
> > > > > > > > > > > > > until
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > > > happened, I have changed this to
> > > "the
> > > > > next
> > > > > > > > > > > available
> > > > > > > > > > > > > > > offset in
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > log"
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 104. REGISTERING(1) : It says
> > > > > > > > > "Otherwise, the
> > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > moves
> > > > > > > > > > > > > > > > > > > into
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > FENCED
> > > > > > > > > > > > > > > > > > > > > > > > > > state.". It seems this
> > > should be
> > > > > > > RUNNING?
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 105. RUNNING: Should we
> > > require
> > > > > the
> > > > > > > > > broker to
> > > > > > > > > > > > > catch
> > > > > > > > > > > > > > > up
> > > > > > > > > > > > > > > > > to the
> > > > > > > > > > > > > > > > > > > > > > > metadata log
> > > > > > > > > > > > > > > > > > > > > > > > > > to get into this state?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > For 104 and 105, these sections
> > > > > have
> > > > > > > been
> > > > > > > > > > > reworked.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 23, 2020 at 1:20
> > > PM
> > > > > Colin
> > > > > > > > > McCabe
> > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > cmccabe@apache.org
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 21, 2020, at
> > > > > 05:51, Tom
> > > > > > > > > Bentley
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 19, 2020, at
> > > > > 08:59,
> > > > > > > Ron
> > > > > > > > > > > Dagostino
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Colin.  Thanks
> > > for the
> > > > > > > hard
> > > > > > > > > work
> > > > > > > > > > > on
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have some questions
> > > > > about
> > > > > > > what
> > > > > > > > > > > happens
> > > > > > > > > > > > > to a
> > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > when it
> > > > > > > > > > > > > > > > > > > > > > > becomes
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > fenced (e.g. because
> > > it
> > > > > can't
> > > > > > > > > send a
> > > > > > > > > > > > > > > heartbeat
> > > > > > > > > > > > > > > > > > > request to
> > > > > > > > > > > > > > > > > > > > > > > keep its
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > lease).  The KIP says
> > > > > "When a
> > > > > > > > > broker
> > > > > > > > > > > is
> > > > > > > > > > > > > > > fenced,
> > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > cannot
> > > > > > > > > > > > > > > > > > > > > > > process any
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > client requests.
> > > This
> > > > > > > prevents
> > > > > > > > > > > brokers
> > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > are not
> > > > > > > > > > > > > > > > > > > > > > > receiving
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > metadata updates or
> > > that
> > > > > are
> > > > > > > not
> > > > > > > > > > > > > receiving
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > processing
> > > > > > > > > > > > > > > > > > > > > > > them fast
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > enough from causing
> > > > > issues to
> > > > > > > > > > > clients."
> > > > > > > > > > > > > And
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > description of the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > FENCED(4) state it
> > > > > likewise
> > > > > > > says
> > > > > > > > > > > "While
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > state,
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > > > > > does
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > not respond to client
> > > > > > > > > requests."  It
> > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > sense
> > > > > > > > > > > > > > > > > > > that a
> > > > > > > > > > > > > > > > > > > > > > > fenced broker
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > should not accept
> > > > > producer
> > > > > > > > > requests
> > > > > > > > > > > -- I
> > > > > > > > > > > > > > > assume
> > > > > > > > > > > > > > > > > any
> > > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > requests
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > would result in
> > > > > > > > > > > > > NotLeaderOrFollowerException.
> > > > > > > > > > > > > > > > > But
> > > > > > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > > > > > about KIP-392
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (fetch from follower)
> > > > > > > consumer
> > > > > > > > > > > > > requests?  It
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > conceivable
> > > > > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > > > > > these
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > could continue.
> > > Related
> > > > > to
> > > > > > > that,
> > > > > > > > > > > would a
> > > > > > > > > > > > > > > fenced
> > > > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > > > continue to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > fetch data for
> > > partitions
> > > > > > > where
> > > > > > > > > it
> > > > > > > > > > > > > thinks it
> > > > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > > > > follower?
> > > > > > > > > > > > > > > > > > > > > > > Even if
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > it rejects consumer
> > > > > requests
> > > > > > > it
> > > > > > > > > might
> > > > > > > > > > > > > still
> > > > > > > > > > > > > > > > > continue
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > fetch as a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > follower.  Might it
> > > be
> > > > > > > helpful to
> > > > > > > > > > > clarify
> > > > > > > > > > > > > > > both
> > > > > > > > > > > > > > > > > > > decisions
> > > > > > > > > > > > > > > > > > > > > > > here?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Ron,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Good question.  I
> > > think a
> > > > > > > fenced
> > > > > > > > > broker
> > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > continue to
> > > > > > > > > > > > > > > > > > > > > > > fetch on
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > partitions it was
> > > already
> > > > > > > fetching
> > > > > > > > > > > before
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > was
> > > > > > > > > > > > > > > > > > > fenced,
> > > > > > > > > > > > > > > > > > > > > > > unless it
> > > > > > > > > > > > > > > > > > > > > > > > > > > hits a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > problem.  At that
> > > point it
> > > > > > > won't be
> > > > > > > > > > > able to
> > > > > > > > > > > > > > > > > continue,
> > > > > > > > > > > > > > > > > > > > > since it
> > > > > > > > > > > > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > the new metadata.  For
> > > > > > > example, it
> > > > > > > > > > > won't
> > > > > > > > > > > > > know
> > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > > > leadership
> > > > > > > > > > > > > > > > > > > > > > > changes
> > > > > > > > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > the partitions it's
> > > > > fetching.
> > > > > > > The
> > > > > > > > > > > > > rationale
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > continuing to
> > > > > > > > > > > > > > > > > > > > > > > fetch
> > > > > > > > > > > > > > > > > > > > > > > > > > > is to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > try to avoid
> > > disruptions as
> > > > > > > much as
> > > > > > > > > > > > > possible.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think fenced
> > > > > brokers
> > > > > > > should
> > > > > > > > > > > accept
> > > > > > > > > > > > > > > client
> > > > > > > > > > > > > > > > > > > requests.
> > > > > > > > > > > > > > > > > > > > > > > The issue
> > > > > > > > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > that the fenced broker
> > > may
> > > > > or
> > > > > > > may
> > > > > > > > > not
> > > > > > > > > > > have
> > > > > > > > > > > > > any
> > > > > > > > > > > > > > > > > data it
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > supposed to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > have.  It may or may
> > > not
> > > > > have
> > > > > > > > > applied
> > > > > > > > > > > any
> > > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > > > > > > > > changes, etc.
> > > > > > > > > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > it is supposed to have
> > > > > applied.
> > > > > > > > > So it
> > > > > > > > > > > > > could
> > > > > > > > > > > > > > > get
> > > > > > > > > > > > > > > > > pretty
> > > > > > > > > > > > > > > > > > > > > > > confusing, and
> > > > > > > > > > > > > > > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > potentially waste the
> > > > > client's
> > > > > > > > > time.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > When fenced, how would
> > > the
> > > > > broker
> > > > > > > > > reply
> > > > > > > > > > > to a
> > > > > > > > > > > > > > > client
> > > > > > > > > > > > > > > > > > > which did
> > > > > > > > > > > > > > > > > > > > > > > make a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > request?
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > The broker will respond
> > > with a
> > > > > > > > > retryable
> > > > > > > > > > > error
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > > > > > > > > Once
> > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > > > client has re-fetched its
> > > > > > > metadata, it
> > > > > > > > > > > will no
> > > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > > > > see
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > fenced broker
> > > > > > > > > > > > > > > > > > > > > > > > > > > as part of the cluster.  I
> > > > > added a
> > > > > > > > > note to
> > > > > > > > > > > the
> > > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Tom
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> 
> 
> 
> -- 
> Gwen Shapira
> Engineering Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>