You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Jun Rao <ju...@confluent.io> on 2018/09/05 22:12:47 UTC

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Hi, Lucas,

Thanks for the reply. Have you actually updated the KIP? The wiki says that
it's last updated on Aug. 22. and some of the changes that you mentioned
(#1 and #3) are not there.

Also, regarding Joel's comment on network/request idle ratio metrics, could
you comment on whether they include the new controller listener? If not, do
we need additional metrics to measure the utilization of the io thread for
the control plane?

Jun

On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks for the comments, Jun.
>
> 1. I think the answer should be no, since the "inter.broker.listener.name"
> are also used
> for replication traffic, and merging these two types of request to the
> single threaded tunnel
> would defeat the purpose of this KIP and also hurt replication throughput.
> So I think that means
> we should validate to make sure when the new config is set, it's different
> from "inter.broker.listener.name"
> or "security.inter.broker.protocol", whichever is set.
>
> 2. Normally all broker configs in a given cluster are changed at the same
> time. If there is a typo in the
> controller.listener.name and it's not available in the endpoints list, we
> could catch it, give an error
> and block restart of the first broker in the cluster. With that, we could
> keep the current behavior
> in the KIP write up that falls back to "inter.broker.listener.nam" when the
> "controller.listener.name"
> is not found during the migration phase of this KIP. Thoughts?
>
> 3. That makes sense, and I've changed it.
>
> Thanks,
> Lucas
>
> On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Lucas,
> >
> > Sorry for the delay. The new proposal looks good to me overall. A few
> minor
> > comments below.
> >
> > 1. It's possible that listener.name.for.controller is set, but set to the
> > same value as inter.broker.listener.name. In that case, should we have a
> > single network thread and the request handling thread for that listener?
> >
> > 2. Currently, the controller always picks the listener specified by
> > inter.broker.listener.name even if the listener name is not present in
> the
> > receiving broker. This KIP proposes a slightly different approach for
> > picking listener.name.for.controller only when the receiving end has the
> > listener and switches listener.name.for.controller otherwise. There are
> > some tradeoffs between the two approaches. To change the inter broker
> > listener, the former requires 2 steps: (1) adding the new listener to
> > listener list in every broker and (2) changing
> > listener.name.for.controller.
> > The latter can do both changes in 1 step. On the hand, if
> > listener.name.for.controller
> > is mis-configured, the former will report an error and the latter will
> hide
> > it (so the user may not know the misconfiguration). It seems that we
> should
> > pick one approach to handle both listener.name.for.controller and
> > inter.broker.listener.name consistently. To me, the former seems
> slightly
> > better.
> >
> > 3. To be consistent with the existing naming, should
> > listener.name.for.controller
> > be controller.listener.name?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Hi Jun and Joel,
> > >
> > > The KIP writeup has changed by quite a bit since your +1.
> > > Can you please take another review? Thanks a lot for your time!
> > >
> > > Lucas
> > >
> > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <jj...@gmail.com>
> > wrote:
> > >
> > > > +1 on the KIP.
> > > >
> > > > (I'm not sure we actually necessary to introduce the condition
> > variables
> > > > for the concern that Jun raised, but it's an implementation detail
> that
> > > we
> > > > can defer to a discussion in the PR.)
> > > >
> > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > I agree by using the conditional variables, there is no need to add
> > > such
> > > > a
> > > > > new config.
> > > > > Also thanks for approving this KIP.
> > > > >
> > > > > Lucas
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hi, Lucas,

Thanks for the updated KIP. About the name of the metric, since it's tagged
by type=RequestChannel, it's probably fine to use the proposed name. So, +1
for the KIP.

Jun

On Thu, Oct 4, 2018 at 10:31 AM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks Jun. I've changed the KIP with the suggested 2 step upgrade.
> Please take a look again when you have time.
>
> Regards,
> Lucas
>
> On Thu, Oct 4, 2018 at 10:06 AM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Lucas,
> >
> > 200. That's a valid concern. So, we can probably just keep the current
> > name.
> >
> > 201. I am thinking that you would upgrade in the same way as changing
> > inter.broker.listener.name. This requires 2 rounds of rolling restart.
> In
> > the first round, we add the controller endpoint to the listeners w/o
> > setting controller.listener.name. In the second round, every broker sets
> > controller.listener.name. At that point, the controller listener is
> ready
> > in every broker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Thanks for the further comments, Jun.
> > >
> > > 200. Currently in the code base, we have the term of "ControlBatch"
> > related
> > > to
> > > idempotent/transactional producing. Do you think it's a concern for
> > reusing
> > > the term "control"?
> > >
> > > 201. It's not clear to me how it would work by following the same
> > strategy
> > > for "controller.listener.name".
> > > Say the new controller has its "controller.listener.name" set to the
> > value
> > > "CONTROLLER", and broker 1
> > > has picked up this KIP by announcing
> > > "endpoints": [
> > >         "CONTROLLER://broker1.example.com:9091",
> > >         "INTERNAL://broker1.example.com:9092",
> > >         "EXTERNAL://host1.example.com:9093"
> > >     ],
> > >
> > > while broker2 has not picked up the change, and is announcing
> > > "endpoints": [
> > >         "INTERNAL://broker2.example.com:9092",
> > >         "EXTERNAL://host2.example.com:9093"
> > >     ],
> > > to support both broker 1 for the new behavior and broker 2 for the old
> > > behavior, it seems the controller must
> > > check their published endpoints. Am I missing something?
> > >
> > > Thanks!
> > > Lucas
> > >
> > > On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Lucas,
> > > >
> > > > Sorry for the delay. The updated wiki looks good to me overall. Just
> a
> > > > couple more minor comments.
> > > >
> > > > 200. kafka.network:name=ControllerRequestQueueSize,
> type=RequestChannel:
> > > The
> > > > name ControllerRequestQueueSize gives the impression that it's only
> for
> > > the
> > > > controller broker. Perhaps we can just rename all metrics and configs
> > > from
> > > > controller to control. This indicates that the threads and the queues
> > are
> > > > for the control requests (as oppose to data requests).
> > > >
> > > > 201. "<new controller, old broker>: In this scenario, the controller
> > will
> > > > have the "controller.listener.name" config set to a value like
> > > > "CONTROLLER", however the broker's exposed endpoints do not have an
> > entry
> > > > corresponding to the new listener name. Hence the controller should
> > > > preserve the existing behavior by determining the endpoint using
> > > > *inter-broker-listener-name *value. The end result should be the same
> > > > behavior as today." Currently, the controller makes connections based
> > on
> > > > its local inter.broker.listener.name config without checking the
> > target
> > > > broker's ZK registration. For consistency, perhaps we can just follow
> > the
> > > > same strategy for controller.listener.name. This existing behavior
> > seems
> > > > simpler to understand and has the benefit of catching inconsistent
> > > configs
> > > > across brokers.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Sorry to bother you again. Can you please take a look at the wiki
> > again
> > > > > when you have time?
> > > > >
> > > > > Thanks a lot!
> > > > > Lucas
> > > > >
> > > > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks a lot for the detailed explanation.
> > > > > > I've restored the wiki to a previous version that does not
> require
> > > > config
> > > > > > changes,
> > > > > > and keeps the current behavior with the proposed changes turned
> off
> > > by
> > > > > > default.
> > > > > > I'd appreciate it if you can review it again.
> > > > > >
> > > > > > Thanks!
> > > > > > Lucas
> > > > > >
> > > > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io>
> wrote:
> > > > > >
> > > > > >> Hi, Lucas,
> > > > > >>
> > > > > >> When upgrading to a minor release, I think the expectation is
> > that a
> > > > > user
> > > > > >> wouldn't need to make any config changes, other than the usual
> > > > > >> inter.broker.protocol. If we require other config changes during
> > an
> > > > > >> upgrade, then it's probably better to do that in a major
> release.
> > > > > >>
> > > > > >> Regarding your proposal, I think removing host/advertised_host
> in
> > > > favor
> > > > > of
> > > > > >> listeners:advertised_listeners seems useful regardless of this
> > KIP.
> > > > > >> However, that can probably wait until a major release.
> > > > > >>
> > > > > >> As for the controller listener, I am not sure if one has to set
> > it.
> > > To
> > > > > >> make
> > > > > >> a cluster healthy, one sort of have to make sure that the
> request
> > > > queue
> > > > > is
> > > > > >> never full and no request will be sitting in the request queue
> for
> > > > long.
> > > > > >> If
> > > > > >> one does that, setting the controller listener may not be
> > necessary.
> > > > On
> > > > > >> the
> > > > > >> flip side, even if one sets the controller listener, but the
> > request
> > > > > queue
> > > > > >> and the request time for the data part are still high, the
> cluster
> > > may
> > > > > >> still not be healthy. Given that we have already started the 2.1
> > > > release
> > > > > >> planning, perhaps we can start with not requiring the controller
> > > > > listener.
> > > > > >> If this is indeed something that everyone wants to set, we can
> > make
> > > > it a
> > > > > >> required config in a major release.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <
> > lucasatucla@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > @Jun Rao <ju...@confluent.io>
> > > > > >> >
> > > > > >> > I made the recent config changes after thinking about the
> > default
> > > > > >> behavior
> > > > > >> > for adopting this KIP.
> > > > > >> > I think there are basically two options:
> > > > > >> > 1. By default, the behavior proposed in this KIP is turned
> off,
> > > and
> > > > > >> > operators can turn it
> > > > > >> > on by adding the "controller.listener.name" config and
> entries
> > in
> > > > the
> > > > > >> > "listeners" and "advertised.listeners" list.
> > > > > >> > If no "controller.listener.name" is added, it'll be the *same
> > as*
> > > > > the "
> > > > > >> > inter.broker.listener.name",
> > > > > >> > and the proposed feature is effectively turned off.
> > > > > >> > This has been the assumption in the KIP writeup before.
> > > > > >> >
> > > > > >> > 2. By default, the behavior proposed in this KIP is turned on,
> > and
> > > > > >> > operators are forced to
> > > > > >> > recognize the proposed change if their "listeners" config is
> set
> > > > (this
> > > > > >> is
> > > > > >> > most likely in production environments),
> > > > > >> > by allocating a new port for controller connections, and
> adding
> > a
> > > > new
> > > > > >> > endpoint to the "listeners" config.
> > > > > >> > For cases where "listeners" is not set explicitly,
> > > > > >> > there needs to be a default value for it that includes the
> > > > controller
> > > > > >> > listener name,
> > > > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> > > > > >> >
> > > > > >> > I chose to go with option 2 since as author of this KIP,
> > > > > >> > I naturally think in the long run, it's worth the effort to
> > adopt
> > > > this
> > > > > >> > feature,
> > > > > >> > in order to prevent issues under circumstances listed in the
> > > > > motivation
> > > > > >> > section.
> > > > > >> >
> > > > > >> > 100, following the argument above, I want to enforce the
> > > separation
> > > > > >> > between controller
> > > > > >> > and data plane requests. Hence the "controller.listener.name"
> > > > should
> > > > > >> > never be the same
> > > > > >> > as the "inter.broker.listener.name", which is intended for
> data
> > > > plane
> > > > > >> > requests.
> > > > > >> >
> > > > > >> > 101, the default value for "listeners" will be
> > > > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > > > > >> > making values of "host", and "port" not being used under any
> > > config
> > > > > >> > settings.
> > > > > >> > And the default value for "advertised.listeners" will be
> derived
> > > > from
> > > > > >> > "listeners",
> > > > > >> > making the values of "advertised.host", and "advertised.port"
> > not
> > > > > being
> > > > > >> > used any more.
> > > > > >> >
> > > > > >> > 102, for upgrading, a single broker that has "listeners"
> and/or
> > > > > >> > "advertised.listeners" set,
> > > > > >> > must add a new endpoint for the CONTROLLER listener name, or
> end
> > > up
> > > > > >> using
> > > > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> > > > > >> > During rolling upgrade, in cases of <old controller> + <new
> > > broker>
> > > > or
> > > > > >> > <new controller>  + <old broker>
> > > > > >> > we still need to fall back to the current behavior. However
> > after
> > > > the
> > > > > >> > rolling upgrade is done,
> > > > > >> > it is guaranteed that the controller plane and data plane are
> > > > > separated,
> > > > > >> > given
> > > > > >> > the "controller.listener.name" must be different from "
> > > > > >> > inter.broker.listener.name".
> > > > > >> >
> > > > > >> > @Ismael Juma <is...@juma.me.uk>
> > > > > >> > Thanks for pointing that out. I did not know that.
> > > > > >> > However my question is if the argument above makes sense, and
> my
> > > > code
> > > > > >> > change
> > > > > >> > causes the configs "host", "port", "advertised.host",
> > > > > "advertised.port"
> > > > > >> to
> > > > > >> > be not used under any circumstance,
> > > > > >> > then it's no different from removing them.
> > > > > >> > Anyway if there is still a concern about removing them, is
> > there a
> > > > new
> > > > > >> > major new version
> > > > > >> > now or in the future where I can remove them?
> > > > > >> >
> > > > > >> > Thanks!
> > > > > >> > Lucas
> > > > > >> >
> > > > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <
> ismael@juma.me.uk>
> > > > > wrote:
> > > > > >> >
> > > > > >> >> To be clear, we can only remove configs in major new
> versions.
> > > > > >> Otherwise,
> > > > > >> >> we can only deprecate them.
> > > > > >> >>
> > > > > >> >> Ismael
> > > > > >> >>
> > > > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > >> >>
> > > > > >> >> > Hi, Lucas,
> > > > > >> >> >
> > > > > >> >> > For the network idlePct, your understanding is correct.
> > > > Currently,
> > > > > >> >> > networkIdlePct metric is calculated as the average of (1 -
> > > > > io-ratio)
> > > > > >> in
> > > > > >> >> the
> > > > > >> >> > selector of all network threads.
> > > > > >> >> >
> > > > > >> >> > The metrics part looks good to me in the updated KIP.
> > > > > >> >> >
> > > > > >> >> > I am not still not quite sure about the configs.
> > > > > >> >> >
> > > > > >> >> > 100. "Whenever the "controller.listener.name" is set, upon
> > > > broker
> > > > > >> >> startup,
> > > > > >> >> > we will validate its value and make sure it's different
> from
> > > the
> > > > > >> >> > *inter-broker-listener-name *value." Does that mean that
> > > > > >> >> > controller.listener.name has to be different from
> > > > > >> >> > inter.broker.listener.name?
> > > > > >> >> > That seems limiting.
> > > > > >> >> >
> > > > > >> >> > 101. The KIP says that advertised.listeners and listeners
> > will
> > > > now
> > > > > >> have
> > > > > >> >> a
> > > > > >> >> > different default value including controller. Could you
> > > document
> > > > > what
> > > > > >> >> the
> > > > > >> >> > default value looks like?
> > > > > >> >> >
> > > > > >> >> > 102. About removing the the following configs. How does
> that
> > > > affect
> > > > > >> the
> > > > > >> >> > upgrade path? Do we now expect a user to add a new config
> > when
> > > > > >> upgrading
> > > > > >> >> > from an old version to a new one?
> > > > > >> >> > host
> > > > > >> >> > port
> > > > > >> >> > advertised.host
> > > > > >> >> > advertised.port
> > > > > >> >> >
> > > > > >> >> > Thanks,
> > > > > >> >> >
> > > > > >> >> > Jun
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
> > > > lucasatucla@gmail.com>
> > > > > >> >> wrote:
> > > > > >> >> >
> > > > > >> >> > > @Jun Rao <ju...@confluent.io>
> > > > > >> >> > >
> > > > > >> >> > > One clarification, currently on the selector level, we
> have
> > > the
> > > > > >> >> > > io-wait-ratio metric.
> > > > > >> >> > > For the new controller *network* thread, we can use it
> > > directly
> > > > > for
> > > > > >> >> > > IdlePct, instead of using 1- io-ratio,
> > > > > >> >> > > so that the logic is similar to the current average
> IdlePct
> > > for
> > > > > >> >> network
> > > > > >> >> > > threads. Is that correct?
> > > > > >> >> > >
> > > > > >> >> > > I've revised the KIP by adding two new metrics for
> > measuring
> > > > the
> > > > > >> >> IdlePct
> > > > > >> >> > > for the two additional threads.
> > > > > >> >> > > Please take a look again. Thanks!
> > > > > >> >> > >
> > > > > >> >> > > Lucas
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <jun@confluent.io
> >
> > > > wrote:
> > > > > >> >> > >
> > > > > >> >> > > > Hi, Lucas,
> > > > > >> >> > > >
> > > > > >> >> > > > Thanks for the updated KIP.
> > > > > >> >> > > >
> > > > > >> >> > > > For monitoring the network thread utilization for the
> > > control
> > > > > >> >> plane, we
> > > > > >> >> > > > already have the metric io-ratio at the selector level
> > > > (idlePct
> > > > > >> is
> > > > > >> >> 1 -
> > > > > >> >> > > > io-ratio). So, we just need to give that selector a
> > > > meaningful
> > > > > >> name.
> > > > > >> >> > > >
> > > > > >> >> > > > For monitoring the io thread utilization for the
> control
> > > > plane,
> > > > > >> it's
> > > > > >> >> > > > probably useful to have a separate metric for that. The
> > > > > >> controller
> > > > > >> >> > > request
> > > > > >> >> > > > queue size may not reflect the history in a window.
> > > > > >> >> > > >
> > > > > >> >> > > > Jun
> > > > > >> >> > > >
> > > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> > > > > >> lucasatucla@gmail.com>
> > > > > >> >> > > wrote:
> > > > > >> >> > > >
> > > > > >> >> > > > > Thanks Jun for your quick response. It looks like I
> > > forgot
> > > > to
> > > > > >> >> click
> > > > > >> >> > the
> > > > > >> >> > > > > "Update" button, :)
> > > > > >> >> > > > > It's updated now.
> > > > > >> >> > > > >
> > > > > >> >> > > > > Regarding the idle ratio metrics for the additional
> > > > threads,
> > > > > I
> > > > > >> >> > > discussed
> > > > > >> >> > > > > with Joel,
> > > > > >> >> > > > > and think they are not as useful, and I added our
> > > reasoning
> > > > > in
> > > > > >> the
> > > > > >> >> > last
> > > > > >> >> > > > > paragraph of the
> > > > > >> >> > > > > "How are controller requests handled over the
> dedicated
> > > > > >> >> connections?"
> > > > > >> >> > > > > section.
> > > > > >> >> > > > > On the other hand, we don't strongly oppose adding
> them
> > > if
> > > > > you
> > > > > >> >> think
> > > > > >> >> > > they
> > > > > >> >> > > > > are necessary.
> > > > > >> >> > > > >
> > > > > >> >> > > > > Thanks,
> > > > > >> >> > > > > Lucas
> > > > > >> >> > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <
> > jun@confluent.io
> > > >
> > > > > >> wrote:
> > > > > >> >> > > > >
> > > > > >> >> > > > > > Hi, Lucas,
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Thanks for the reply. Have you actually updated the
> > > KIP?
> > > > > The
> > > > > >> >> wiki
> > > > > >> >> > > says
> > > > > >> >> > > > > that
> > > > > >> >> > > > > > it's last updated on Aug. 22. and some of the
> changes
> > > > that
> > > > > >> you
> > > > > >> >> > > > mentioned
> > > > > >> >> > > > > > (#1 and #3) are not there.
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Also, regarding Joel's comment on network/request
> > idle
> > > > > ratio
> > > > > >> >> > metrics,
> > > > > >> >> > > > > could
> > > > > >> >> > > > > > you comment on whether they include the new
> > controller
> > > > > >> >> listener? If
> > > > > >> >> > > > not,
> > > > > >> >> > > > > do
> > > > > >> >> > > > > > we need additional metrics to measure the
> utilization
> > > of
> > > > > the
> > > > > >> io
> > > > > >> >> > > thread
> > > > > >> >> > > > > for
> > > > > >> >> > > > > > the control plane?
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Jun
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> > > > > >> >> lucasatucla@gmail.com
> > > > > >> >> > >
> > > > > >> >> > > > > wrote:
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > > Thanks for the comments, Jun.
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > 1. I think the answer should be no, since the "
> > > > > >> >> > > > > > inter.broker.listener.name"
> > > > > >> >> > > > > > > are also used
> > > > > >> >> > > > > > > for replication traffic, and merging these two
> > types
> > > of
> > > > > >> >> request
> > > > > >> >> > to
> > > > > >> >> > > > the
> > > > > >> >> > > > > > > single threaded tunnel
> > > > > >> >> > > > > > > would defeat the purpose of this KIP and also
> hurt
> > > > > >> replication
> > > > > >> >> > > > > > throughput.
> > > > > >> >> > > > > > > So I think that means
> > > > > >> >> > > > > > > we should validate to make sure when the new
> config
> > > is
> > > > > set,
> > > > > >> >> it's
> > > > > >> >> > > > > > different
> > > > > >> >> > > > > > > from "inter.broker.listener.name"
> > > > > >> >> > > > > > > or "security.inter.broker.protocol", whichever
> is
> > > set.
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > 2. Normally all broker configs in a given cluster
> > are
> > > > > >> changed
> > > > > >> >> at
> > > > > >> >> > > the
> > > > > >> >> > > > > same
> > > > > >> >> > > > > > > time. If there is a typo in the
> > > > > >> >> > > > > > > controller.listener.name and it's not available
> in
> > > the
> > > > > >> >> endpoints
> > > > > >> >> > > > list,
> > > > > >> >> > > > > > we
> > > > > >> >> > > > > > > could catch it, give an error
> > > > > >> >> > > > > > > and block restart of the first broker in the
> > cluster.
> > > > > With
> > > > > >> >> that,
> > > > > >> >> > we
> > > > > >> >> > > > > could
> > > > > >> >> > > > > > > keep the current behavior
> > > > > >> >> > > > > > > in the KIP write up that falls back to
> > > > > >> >> > "inter.broker.listener.nam"
> > > > > >> >> > > > when
> > > > > >> >> > > > > > the
> > > > > >> >> > > > > > > "controller.listener.name"
> > > > > >> >> > > > > > > is not found during the migration phase of this
> > KIP.
> > > > > >> Thoughts?
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > 3. That makes sense, and I've changed it.
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > Thanks,
> > > > > >> >> > > > > > > Lucas
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> > > > > jun@confluent.io>
> > > > > >> >> > wrote:
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > > Hi, Lucas,
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Sorry for the delay. The new proposal looks
> good
> > to
> > > > me
> > > > > >> >> > overall. A
> > > > > >> >> > > > few
> > > > > >> >> > > > > > > minor
> > > > > >> >> > > > > > > > comments below.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > 1. It's possible that
> > listener.name.for.controller
> > > is
> > > > > >> set,
> > > > > >> >> but
> > > > > >> >> > > set
> > > > > >> >> > > > to
> > > > > >> >> > > > > > the
> > > > > >> >> > > > > > > > same value as inter.broker.listener.name. In
> > that
> > > > > case,
> > > > > >> >> should
> > > > > >> >> > > we
> > > > > >> >> > > > > > have a
> > > > > >> >> > > > > > > > single network thread and the request handling
> > > thread
> > > > > for
> > > > > >> >> that
> > > > > >> >> > > > > > listener?
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > 2. Currently, the controller always picks the
> > > > listener
> > > > > >> >> > specified
> > > > > >> >> > > by
> > > > > >> >> > > > > > > > inter.broker.listener.name even if the
> listener
> > > name
> > > > > is
> > > > > >> not
> > > > > >> >> > > > present
> > > > > >> >> > > > > in
> > > > > >> >> > > > > > > the
> > > > > >> >> > > > > > > > receiving broker. This KIP proposes a slightly
> > > > > different
> > > > > >> >> > approach
> > > > > >> >> > > > for
> > > > > >> >> > > > > > > > picking listener.name.for.controller only when
> > the
> > > > > >> receiving
> > > > > >> >> > end
> > > > > >> >> > > > has
> > > > > >> >> > > > > > the
> > > > > >> >> > > > > > > > listener and switches
> > listener.name.for.controller
> > > > > >> >> otherwise.
> > > > > >> >> > > There
> > > > > >> >> > > > > are
> > > > > >> >> > > > > > > > some tradeoffs between the two approaches. To
> > > change
> > > > > the
> > > > > >> >> inter
> > > > > >> >> > > > broker
> > > > > >> >> > > > > > > > listener, the former requires 2 steps: (1)
> adding
> > > the
> > > > > new
> > > > > >> >> > > listener
> > > > > >> >> > > > to
> > > > > >> >> > > > > > > > listener list in every broker and (2) changing
> > > > > >> >> > > > > > > > listener.name.for.controller.
> > > > > >> >> > > > > > > > The latter can do both changes in 1 step. On
> the
> > > > hand,
> > > > > if
> > > > > >> >> > > > > > > > listener.name.for.controller
> > > > > >> >> > > > > > > > is mis-configured, the former will report an
> > error
> > > > and
> > > > > >> the
> > > > > >> >> > latter
> > > > > >> >> > > > > will
> > > > > >> >> > > > > > > hide
> > > > > >> >> > > > > > > > it (so the user may not know the
> > misconfiguration).
> > > > It
> > > > > >> seems
> > > > > >> >> > that
> > > > > >> >> > > > we
> > > > > >> >> > > > > > > should
> > > > > >> >> > > > > > > > pick one approach to handle both
> > > > > >> >> listener.name.for.controller
> > > > > >> >> > and
> > > > > >> >> > > > > > > > inter.broker.listener.name consistently. To
> me,
> > > the
> > > > > >> former
> > > > > >> >> > seems
> > > > > >> >> > > > > > > slightly
> > > > > >> >> > > > > > > > better.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > 3. To be consistent with the existing naming,
> > > should
> > > > > >> >> > > > > > > > listener.name.for.controller
> > > > > >> >> > > > > > > > be controller.listener.name?
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Thanks,
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Jun
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > > > > >> >> > > lucasatucla@gmail.com>
> > > > > >> >> > > > > > > wrote:
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > > Hi Jun and Joel,
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > > > The KIP writeup has changed by quite a bit
> > since
> > > > your
> > > > > >> +1.
> > > > > >> >> > > > > > > > > Can you please take another review? Thanks a
> > lot
> > > > for
> > > > > >> your
> > > > > >> >> > time!
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > > > Lucas
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy
> <
> > > > > >> >> > > > jjkoshy.w@gmail.com>
> > > > > >> >> > > > > > > > wrote:
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > > > > +1 on the KIP.
> > > > > >> >> > > > > > > > > >
> > > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to
> > > introduce
> > > > > the
> > > > > >> >> > > condition
> > > > > >> >> > > > > > > > variables
> > > > > >> >> > > > > > > > > > for the concern that Jun raised, but it's
> an
> > > > > >> >> implementation
> > > > > >> >> > > > > detail
> > > > > >> >> > > > > > > that
> > > > > >> >> > > > > > > > > we
> > > > > >> >> > > > > > > > > > can defer to a discussion in the PR.)
> > > > > >> >> > > > > > > > > >
> > > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas
> Wang
> > <
> > > > > >> >> > > > > > lucasatucla@gmail.com>
> > > > > >> >> > > > > > > > > > wrote:
> > > > > >> >> > > > > > > > > >
> > > > > >> >> > > > > > > > > > > Hi Jun,
> > > > > >> >> > > > > > > > > > >
> > > > > >> >> > > > > > > > > > > I agree by using the conditional
> variables,
> > > > there
> > > > > >> is
> > > > > >> >> no
> > > > > >> >> > > need
> > > > > >> >> > > > to
> > > > > >> >> > > > > > add
> > > > > >> >> > > > > > > > > such
> > > > > >> >> > > > > > > > > > a
> > > > > >> >> > > > > > > > > > > new config.
> > > > > >> >> > > > > > > > > > > Also thanks for approving this KIP.
> > > > > >> >> > > > > > > > > > >
> > > > > >> >> > > > > > > > > > > Lucas
> > > > > >> >> > > > > > > > > > >
> > > > > >> >> > > > > > > > > >
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Joel Koshy <jj...@gmail.com>.
+1
Thanks for the updated KIP.

On Tue, Oct 9, 2018 at 3:28 PM Lucas Wang <lu...@gmail.com> wrote:

> Thanks Jun, I've updated the KIP with the new names.
>
> Hi Joel, Becket, Dong, Ismael,
> Since you've reviewed this KIP in the past, can you please review it again?
> Thanks a lot!
>
> Lucas
>
> On Mon, Oct 8, 2018 at 6:10 PM Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Lucas,
>>
>> Yes, the new names sound good to me.
>>
>> Thanks,
>>
>> Jun
>>
>> On Fri, Oct 5, 2018 at 1:12 PM, Lucas Wang <lu...@gmail.com> wrote:
>>
>> > Thanks for the suggestion, Ismael. I like it.
>> >
>> > Jun,
>> > I'm excited to get the +1, thanks a lot!
>> > Meanwhile what do you feel about renaming the metrics and config to
>> >
>> > ControlPlaneRequestQueueSize
>> >
>> > ControlPlaneNetworkProcessorIdlePercent
>> >
>> > ControlPlaneRequestHandlerIdlePercent
>> >
>> > control.plane.listener.name
>> >
>> > ?
>> >
>> >
>> > Thanks,
>> >
>> > Lucas
>> >
>> > On Thu, Oct 4, 2018 at 11:38 AM Ismael Juma <is...@gmail.com> wrote:
>> >
>> > > Have we considered control plane if we think control by itself is
>> > > ambiguous? I agree with the original concern that "controller" may be
>> > > confusing for something that affects all brokers.
>> > >
>> > > Ismael
>> > >
>> > >
>> > > On 4 Oct 2018 11:08 am, "Lucas Wang" <lu...@gmail.com> wrote:
>> > >
>> > > Thanks Jun. I've changed the KIP with the suggested 2 step upgrade.
>> > > Please take a look again when you have time.
>> > >
>> > > Regards,
>> > > Lucas
>> > >
>> > >
>> > > On Thu, Oct 4, 2018 at 10:06 AM Jun Rao <ju...@confluent.io> wrote:
>> > >
>> > > > Hi, Lucas,
>> > > >
>> > > > 200. That's a valid concern. So, we can probably just keep the
>> current
>> > > > name.
>> > > >
>> > > > 201. I am thinking that you would upgrade in the same way as
>> changing
>> > > > inter.broker.listener.name. This requires 2 rounds of rolling
>> restart.
>> > > In
>> > > > the first round, we add the controller endpoint to the listeners w/o
>> > > > setting controller.listener.name. In the second round, every broker
>> > sets
>> > > > controller.listener.name. At that point, the controller listener is
>> > > ready
>> > > > in every broker.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun
>> > > >
>> > > > On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lu...@gmail.com>
>> > > wrote:
>> > > >
>> > > > > Thanks for the further comments, Jun.
>> > > > >
>> > > > > 200. Currently in the code base, we have the term of
>> "ControlBatch"
>> > > > related
>> > > > > to
>> > > > > idempotent/transactional producing. Do you think it's a concern
>> for
>> > > > reusing
>> > > > > the term "control"?
>> > > > >
>> > > > > 201. It's not clear to me how it would work by following the same
>> > > > strategy
>> > > > > for "controller.listener.name".
>> > > > > Say the new controller has its "controller.listener.name" set to
>> the
>> > > > value
>> > > > > "CONTROLLER", and broker 1
>> > > > > has picked up this KIP by announcing
>> > > > > "endpoints": [
>> > > > >         "CONTROLLER://broker1.example.com:9091",
>> > > > >         "INTERNAL://broker1.example.com:9092",
>> > > > >         "EXTERNAL://host1.example.com:9093"
>> > > > >     ],
>> > > > >
>> > > > > while broker2 has not picked up the change, and is announcing
>> > > > > "endpoints": [
>> > > > >         "INTERNAL://broker2.example.com:9092",
>> > > > >         "EXTERNAL://host2.example.com:9093"
>> > > > >     ],
>> > > > > to support both broker 1 for the new behavior and broker 2 for the
>> > old
>> > > > > behavior, it seems the controller must
>> > > > > check their published endpoints. Am I missing something?
>> > > > >
>> > > > > Thanks!
>> > > > > Lucas
>> > > > >
>> > > > > On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:
>> > > > >
>> > > > > > Hi, Lucas,
>> > > > > >
>> > > > > > Sorry for the delay. The updated wiki looks good to me overall.
>> > Just
>> > > a
>> > > > > > couple more minor comments.
>> > > > > >
>> > > > > > 200.
>> > > kafka.network:name=ControllerRequestQueueSize,type=RequestChannel:
>> > > > > The
>> > > > > > name ControllerRequestQueueSize gives the impression that it's
>> only
>> > > for
>> > > > > the
>> > > > > > controller broker. Perhaps we can just rename all metrics and
>> > configs
>> > > > > from
>> > > > > > controller to control. This indicates that the threads and the
>> > queues
>> > > > are
>> > > > > > for the control requests (as oppose to data requests).
>> > > > > >
>> > > > > > 201. "<new controller, old broker>: In this scenario, the
>> > controller
>> > > > will
>> > > > > > have the "controller.listener.name" config set to a value like
>> > > > > > "CONTROLLER", however the broker's exposed endpoints do not
>> have an
>> > > > entry
>> > > > > > corresponding to the new listener name. Hence the controller
>> should
>> > > > > > preserve the existing behavior by determining the endpoint using
>> > > > > > *inter-broker-listener-name *value. The end result should be the
>> > same
>> > > > > > behavior as today." Currently, the controller makes connections
>> > based
>> > > > on
>> > > > > > its local inter.broker.listener.name config without checking
>> the
>> > > > target
>> > > > > > broker's ZK registration. For consistency, perhaps we can just
>> > follow
>> > > > the
>> > > > > > same strategy for controller.listener.name. This existing
>> behavior
>> > > > seems
>> > > > > > simpler to understand and has the benefit of catching
>> inconsistent
>> > > > > configs
>> > > > > > across brokers.
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Jun
>> > > > > >
>> > > > > > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <
>> lucasatucla@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi Jun,
>> > > > > > >
>> > > > > > > Sorry to bother you again. Can you please take a look at the
>> wiki
>> > > > again
>> > > > > > > when you have time?
>> > > > > > >
>> > > > > > > Thanks a lot!
>> > > > > > > Lucas
>> > > > > > >
>> > > > > > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <
>> > lucasatucla@gmail.com>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Jun,
>> > > > > > > >
>> > > > > > > > Thanks a lot for the detailed explanation.
>> > > > > > > > I've restored the wiki to a previous version that does not
>> > > require
>> > > > > > config
>> > > > > > > > changes,
>> > > > > > > > and keeps the current behavior with the proposed changes
>> turned
>> > > off
>> > > > > by
>> > > > > > > > default.
>> > > > > > > > I'd appreciate it if you can review it again.
>> > > > > > > >
>> > > > > > > > Thanks!
>> > > > > > > > Lucas
>> > > > > > > >
>> > > > > > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io>
>> > > wrote:
>> > > > > > > >
>> > > > > > > >> Hi, Lucas,
>> > > > > > > >>
>> > > > > > > >> When upgrading to a minor release, I think the expectation
>> is
>> > > > that a
>> > > > > > > user
>> > > > > > > >> wouldn't need to make any config changes, other than the
>> usual
>> > > > > > > >> inter.broker.protocol. If we require other config changes
>> > during
>> > > > an
>> > > > > > > >> upgrade, then it's probably better to do that in a major
>> > > release.
>> > > > > > > >>
>> > > > > > > >> Regarding your proposal, I think removing
>> host/advertised_host
>> > > in
>> > > > > > favor
>> > > > > > > of
>> > > > > > > >> listeners:advertised_listeners seems useful regardless of
>> this
>> > > > KIP.
>> > > > > > > >> However, that can probably wait until a major release.
>> > > > > > > >>
>> > > > > > > >> As for the controller listener, I am not sure if one has to
>> > set
>> > > > it.
>> > > > > To
>> > > > > > > >> make
>> > > > > > > >> a cluster healthy, one sort of have to make sure that the
>> > > request
>> > > > > > queue
>> > > > > > > is
>> > > > > > > >> never full and no request will be sitting in the request
>> queue
>> > > for
>> > > > > > long.
>> > > > > > > >> If
>> > > > > > > >> one does that, setting the controller listener may not be
>> > > > necessary.
>> > > > > > On
>> > > > > > > >> the
>> > > > > > > >> flip side, even if one sets the controller listener, but
>> the
>> > > > request
>> > > > > > > queue
>> > > > > > > >> and the request time for the data part are still high, the
>> > > cluster
>> > > > > may
>> > > > > > > >> still not be healthy. Given that we have already started
>> the
>> > 2.1
>> > > > > > release
>> > > > > > > >> planning, perhaps we can start with not requiring the
>> > controller
>> > > > > > > listener.
>> > > > > > > >> If this is indeed something that everyone wants to set, we
>> can
>> > > > make
>> > > > > > it a
>> > > > > > > >> required config in a major release.
>> > > > > > > >>
>> > > > > > > >> Thanks,
>> > > > > > > >>
>> > > > > > > >> Jun
>> > > > > > > >>
>> > > > > > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <
>> > > > lucasatucla@gmail.com>
>> > > > > > > >> wrote:
>> > > > > > > >>
>> > > > > > > >> > @Jun Rao <ju...@confluent.io>
>> > > > > > > >> >
>> > > > > > > >> > I made the recent config changes after thinking about the
>> > > > default
>> > > > > > > >> behavior
>> > > > > > > >> > for adopting this KIP.
>> > > > > > > >> > I think there are basically two options:
>> > > > > > > >> > 1. By default, the behavior proposed in this KIP is
>> turned
>> > > off,
>> > > > > and
>> > > > > > > >> > operators can turn it
>> > > > > > > >> > on by adding the "controller.listener.name" config and
>> > > entries
>> > > > in
>> > > > > > the
>> > > > > > > >> > "listeners" and "advertised.listeners" list.
>> > > > > > > >> > If no "controller.listener.name" is added, it'll be the
>> > *same
>> > > > as*
>> > > > > > > the "
>> > > > > > > >> > inter.broker.listener.name",
>> > > > > > > >> > and the proposed feature is effectively turned off.
>> > > > > > > >> > This has been the assumption in the KIP writeup before.
>> > > > > > > >> >
>> > > > > > > >> > 2. By default, the behavior proposed in this KIP is
>> turned
>> > on,
>> > > > and
>> > > > > > > >> > operators are forced to
>> > > > > > > >> > recognize the proposed change if their "listeners"
>> config is
>> > > set
>> > > > > > (this
>> > > > > > > >> is
>> > > > > > > >> > most likely in production environments),
>> > > > > > > >> > by allocating a new port for controller connections, and
>> > > adding
>> > > > a
>> > > > > > new
>> > > > > > > >> > endpoint to the "listeners" config.
>> > > > > > > >> > For cases where "listeners" is not set explicitly,
>> > > > > > > >> > there needs to be a default value for it that includes
>> the
>> > > > > > controller
>> > > > > > > >> > listener name,
>> > > > > > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
>> > > > > > > >> >
>> > > > > > > >> > I chose to go with option 2 since as author of this KIP,
>> > > > > > > >> > I naturally think in the long run, it's worth the effort
>> to
>> > > > adopt
>> > > > > > this
>> > > > > > > >> > feature,
>> > > > > > > >> > in order to prevent issues under circumstances listed in
>> the
>> > > > > > > motivation
>> > > > > > > >> > section.
>> > > > > > > >> >
>> > > > > > > >> > 100, following the argument above, I want to enforce the
>> > > > > separation
>> > > > > > > >> > between controller
>> > > > > > > >> > and data plane requests. Hence the "
>> > controller.listener.name"
>> > > > > > should
>> > > > > > > >> > never be the same
>> > > > > > > >> > as the "inter.broker.listener.name", which is intended
>> for
>> > > data
>> > > > > > plane
>> > > > > > > >> > requests.
>> > > > > > > >> >
>> > > > > > > >> > 101, the default value for "listeners" will be
>> > > > > > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
>> > > > > > > >> > making values of "host", and "port" not being used under
>> any
>> > > > > config
>> > > > > > > >> > settings.
>> > > > > > > >> > And the default value for "advertised.listeners" will be
>> > > derived
>> > > > > > from
>> > > > > > > >> > "listeners",
>> > > > > > > >> > making the values of "advertised.host", and
>> > "advertised.port"
>> > > > not
>> > > > > > > being
>> > > > > > > >> > used any more.
>> > > > > > > >> >
>> > > > > > > >> > 102, for upgrading, a single broker that has "listeners"
>> > > and/or
>> > > > > > > >> > "advertised.listeners" set,
>> > > > > > > >> > must add a new endpoint for the CONTROLLER listener
>> name, or
>> > > end
>> > > > > up
>> > > > > > > >> using
>> > > > > > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER:
>> > //:9091".
>> > > > > > > >> > During rolling upgrade, in cases of <old controller> +
>> <new
>> > > > > broker>
>> > > > > > or
>> > > > > > > >> > <new controller>  + <old broker>
>> > > > > > > >> > we still need to fall back to the current behavior.
>> However
>> > > > after
>> > > > > > the
>> > > > > > > >> > rolling upgrade is done,
>> > > > > > > >> > it is guaranteed that the controller plane and data plane
>> > are
>> > > > > > > separated,
>> > > > > > > >> > given
>> > > > > > > >> > the "controller.listener.name" must be different from "
>> > > > > > > >> > inter.broker.listener.name".
>> > > > > > > >> >
>> > > > > > > >> > @Ismael Juma <is...@juma.me.uk>
>> > > > > > > >> > Thanks for pointing that out. I did not know that.
>> > > > > > > >> > However my question is if the argument above makes sense,
>> > and
>> > > my
>> > > > > > code
>> > > > > > > >> > change
>> > > > > > > >> > causes the configs "host", "port", "advertised.host",
>> > > > > > > "advertised.port"
>> > > > > > > >> to
>> > > > > > > >> > be not used under any circumstance,
>> > > > > > > >> > then it's no different from removing them.
>> > > > > > > >> > Anyway if there is still a concern about removing them,
>> is
>> > > > there a
>> > > > > > new
>> > > > > > > >> > major new version
>> > > > > > > >> > now or in the future where I can remove them?
>> > > > > > > >> >
>> > > > > > > >> > Thanks!
>> > > > > > > >> > Lucas
>> > > > > > > >> >
>> > > > > > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <
>> > > ismael@juma.me.uk>
>> > > > > > > wrote:
>> > > > > > > >> >
>> > > > > > > >> >> To be clear, we can only remove configs in major new
>> > > versions.
>> > > > > > > >> Otherwise,
>> > > > > > > >> >> we can only deprecate them.
>> > > > > > > >> >>
>> > > > > > > >> >> Ismael
>> > > > > > > >> >>
>> > > > > > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <
>> jun@confluent.io
>> > >
>> > > > > wrote:
>> > > > > > > >> >>
>> > > > > > > >> >> > Hi, Lucas,
>> > > > > > > >> >> >
>> > > > > > > >> >> > For the network idlePct, your understanding is
>> correct.
>> > > > > > Currently,
>> > > > > > > >> >> > networkIdlePct metric is calculated as the average of
>> (1
>> > -
>> > > > > > > io-ratio)
>> > > > > > > >> in
>> > > > > > > >> >> the
>> > > > > > > >> >> > selector of all network threads.
>> > > > > > > >> >> >
>> > > > > > > >> >> > The metrics part looks good to me in the updated KIP.
>> > > > > > > >> >> >
>> > > > > > > >> >> > I am not still not quite sure about the configs.
>> > > > > > > >> >> >
>> > > > > > > >> >> > 100. "Whenever the "controller.listener.name" is set,
>> > upon
>> > > > > > broker
>> > > > > > > >> >> startup,
>> > > > > > > >> >> > we will validate its value and make sure it's
>> different
>> > > from
>> > > > > the
>> > > > > > > >> >> > *inter-broker-listener-name *value." Does that mean
>> that
>> > > > > > > >> >> > controller.listener.name has to be different from
>> > > > > > > >> >> > inter.broker.listener.name?
>> > > > > > > >> >> > That seems limiting.
>> > > > > > > >> >> >
>> > > > > > > >> >> > 101. The KIP says that advertised.listeners and
>> listeners
>> > > > will
>> > > > > > now
>> > > > > > > >> have
>> > > > > > > >> >> a
>> > > > > > > >> >> > different default value including controller. Could
>> you
>> > > > > document
>> > > > > > > what
>> > > > > > > >> >> the
>> > > > > > > >> >> > default value looks like?
>> > > > > > > >> >> >
>> > > > > > > >> >> > 102. About removing the the following configs. How
>> does
>> > > that
>> > > > > > affect
>> > > > > > > >> the
>> > > > > > > >> >> > upgrade path? Do we now expect a user to add a new
>> config
>> > > > when
>> > > > > > > >> upgrading
>> > > > > > > >> >> > from an old version to a new one?
>> > > > > > > >> >> > host
>> > > > > > > >> >> > port
>> > > > > > > >> >> > advertised.host
>> > > > > > > >> >> > advertised.port
>> > > > > > > >> >> >
>> > > > > > > >> >> > Thanks,
>> > > > > > > >> >> >
>> > > > > > > >> >> > Jun
>> > > > > > > >> >> >
>> > > > > > > >> >> >
>> > > > > > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
>> > > > > > lucasatucla@gmail.com>
>> > > > > > > >> >> wrote:
>> > > > > > > >> >> >
>> > > > > > > >> >> > > @Jun Rao <ju...@confluent.io>
>> > > > > > > >> >> > >
>> > > > > > > >> >> > > One clarification, currently on the selector level,
>> we
>> > > have
>> > > > > the
>> > > > > > > >> >> > > io-wait-ratio metric.
>> > > > > > > >> >> > > For the new controller *network* thread, we can use
>> it
>> > > > > directly
>> > > > > > > for
>> > > > > > > >> >> > > IdlePct, instead of using 1- io-ratio,
>> > > > > > > >> >> > > so that the logic is similar to the current average
>> > > IdlePct
>> > > > > for
>> > > > > > > >> >> network
>> > > > > > > >> >> > > threads. Is that correct?
>> > > > > > > >> >> > >
>> > > > > > > >> >> > > I've revised the KIP by adding two new metrics for
>> > > > measuring
>> > > > > > the
>> > > > > > > >> >> IdlePct
>> > > > > > > >> >> > > for the two additional threads.
>> > > > > > > >> >> > > Please take a look again. Thanks!
>> > > > > > > >> >> > >
>> > > > > > > >> >> > > Lucas
>> > > > > > > >> >> > >
>> > > > > > > >> >> > >
>> > > > > > > >> >> > >
>> > > > > > > >> >> > >
>> > > > > > > >> >> > >
>> > > > > > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <
>> > jun@confluent.io
>> > > >
>> > > > > > wrote:
>> > > > > > > >> >> > >
>> > > > > > > >> >> > > > Hi, Lucas,
>> > > > > > > >> >> > > >
>> > > > > > > >> >> > > > Thanks for the updated KIP.
>> > > > > > > >> >> > > >
>> > > > > > > >> >> > > > For monitoring the network thread utilization for
>> the
>> > > > > control
>> > > > > > > >> >> plane, we
>> > > > > > > >> >> > > > already have the metric io-ratio at the selector
>> > level
>> > > > > > (idlePct
>> > > > > > > >> is
>> > > > > > > >> >> 1 -
>> > > > > > > >> >> > > > io-ratio). So, we just need to give that selector
>> a
>> > > > > > meaningful
>> > > > > > > >> name.
>> > > > > > > >> >> > > >
>> > > > > > > >> >> > > > For monitoring the io thread utilization for the
>> > > control
>> > > > > > plane,
>> > > > > > > >> it's
>> > > > > > > >> >> > > > probably useful to have a separate metric for
>> that.
>> > The
>> > > > > > > >> controller
>> > > > > > > >> >> > > request
>> > > > > > > >> >> > > > queue size may not reflect the history in a
>> window.
>> > > > > > > >> >> > > >
>> > > > > > > >> >> > > > Jun
>> > > > > > > >> >> > > >
>> > > > > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
>> > > > > > > >> lucasatucla@gmail.com>
>> > > > > > > >> >> > > wrote:
>> > > > > > > >> >> > > >
>> > > > > > > >> >> > > > > Thanks Jun for your quick response. It looks
>> like I
>> > > > > forgot
>> > > > > > to
>> > > > > > > >> >> click
>> > > > > > > >> >> > the
>> > > > > > > >> >> > > > > "Update" button, :)
>> > > > > > > >> >> > > > > It's updated now.
>> > > > > > > >> >> > > > >
>> > > > > > > >> >> > > > > Regarding the idle ratio metrics for the
>> additional
>> > > > > > threads,
>> > > > > > > I
>> > > > > > > >> >> > > discussed
>> > > > > > > >> >> > > > > with Joel,
>> > > > > > > >> >> > > > > and think they are not as useful, and I added
>> our
>> > > > > reasoning
>> > > > > > > in
>> > > > > > > >> the
>> > > > > > > >> >> > last
>> > > > > > > >> >> > > > > paragraph of the
>> > > > > > > >> >> > > > > "How are controller requests handled over the
>> > > dedicated
>> > > > > > > >> >> connections?"
>> > > > > > > >> >> > > > > section.
>> > > > > > > >> >> > > > > On the other hand, we don't strongly oppose
>> adding
>> > > them
>> > > > > if
>> > > > > > > you
>> > > > > > > >> >> think
>> > > > > > > >> >> > > they
>> > > > > > > >> >> > > > > are necessary.
>> > > > > > > >> >> > > > >
>> > > > > > > >> >> > > > > Thanks,
>> > > > > > > >> >> > > > > Lucas
>> > > > > > > >> >> > > > >
>> > > > > > > >> >> > > > >
>> > > > > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <
>> > > > jun@confluent.io
>> > > > > >
>> > > > > > > >> wrote:
>> > > > > > > >> >> > > > >
>> > > > > > > >> >> > > > > > Hi, Lucas,
>> > > > > > > >> >> > > > > >
>> > > > > > > >> >> > > > > > Thanks for the reply. Have you actually
>> updated
>> > the
>> > > > > KIP?
>> > > > > > > The
>> > > > > > > >> >> wiki
>> > > > > > > >> >> > > says
>> > > > > > > >> >> > > > > that
>> > > > > > > >> >> > > > > > it's last updated on Aug. 22. and some of the
>> > > changes
>> > > > > > that
>> > > > > > > >> you
>> > > > > > > >> >> > > > mentioned
>> > > > > > > >> >> > > > > > (#1 and #3) are not there.
>> > > > > > > >> >> > > > > >
>> > > > > > > >> >> > > > > > Also, regarding Joel's comment on
>> network/request
>> > > > idle
>> > > > > > > ratio
>> > > > > > > >> >> > metrics,
>> > > > > > > >> >> > > > > could
>> > > > > > > >> >> > > > > > you comment on whether they include the new
>> > > > controller
>> > > > > > > >> >> listener? If
>> > > > > > > >> >> > > > not,
>> > > > > > > >> >> > > > > do
>> > > > > > > >> >> > > > > > we need additional metrics to measure the
>> > > utilization
>> > > > > of
>> > > > > > > the
>> > > > > > > >> io
>> > > > > > > >> >> > > thread
>> > > > > > > >> >> > > > > for
>> > > > > > > >> >> > > > > > the control plane?
>> > > > > > > >> >> > > > > >
>> > > > > > > >> >> > > > > > Jun
>> > > > > > > >> >> > > > > >
>> > > > > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
>> > > > > > > >> >> lucasatucla@gmail.com
>> > > > > > > >> >> > >
>> > > > > > > >> >> > > > > wrote:
>> > > > > > > >> >> > > > > >
>> > > > > > > >> >> > > > > > > Thanks for the comments, Jun.
>> > > > > > > >> >> > > > > > >
>> > > > > > > >> >> > > > > > > 1. I think the answer should be no, since
>> the "
>> > > > > > > >> >> > > > > > inter.broker.listener.name"
>> > > > > > > >> >> > > > > > > are also used
>> > > > > > > >> >> > > > > > > for replication traffic, and merging these
>> two
>> > > > types
>> > > > > of
>> > > > > > > >> >> request
>> > > > > > > >> >> > to
>> > > > > > > >> >> > > > the
>> > > > > > > >> >> > > > > > > single threaded tunnel
>> > > > > > > >> >> > > > > > > would defeat the purpose of this KIP and
>> also
>> > > hurt
>> > > > > > > >> replication
>> > > > > > > >> >> > > > > > throughput.
>> > > > > > > >> >> > > > > > > So I think that means
>> > > > > > > >> >> > > > > > > we should validate to make sure when the new
>> > > config
>> > > > > is
>> > > > > > > set,
>> > > > > > > >> >> it's
>> > > > > > > >> >> > > > > > different
>> > > > > > > >> >> > > > > > > from "inter.broker.listener.name"
>> > > > > > > >> >> > > > > > > or "security.inter.broker.protocol",
>> > whichever is
>> > > > > set.
>> > > > > > > >> >> > > > > > >
>> > > > > > > >> >> > > > > > > 2. Normally all broker configs in a given
>> > cluster
>> > > > are
>> > > > > > > >> changed
>> > > > > > > >> >> at
>> > > > > > > >> >> > > the
>> > > > > > > >> >> > > > > same
>> > > > > > > >> >> > > > > > > time. If there is a typo in the
>> > > > > > > >> >> > > > > > > controller.listener.name and it's not
>> > available
>> > > in
>> > > > > the
>> > > > > > > >> >> endpoints
>> > > > > > > >> >> > > > list,
>> > > > > > > >> >> > > > > > we
>> > > > > > > >> >> > > > > > > could catch it, give an error
>> > > > > > > >> >> > > > > > > and block restart of the first broker in the
>> > > > cluster.
>> > > > > > > With
>> > > > > > > >> >> that,
>> > > > > > > >> >> > we
>> > > > > > > >> >> > > > > could
>> > > > > > > >> >> > > > > > > keep the current behavior
>> > > > > > > >> >> > > > > > > in the KIP write up that falls back to
>> > > > > > > >> >> > "inter.broker.listener.nam"
>> > > > > > > >> >> > > > when
>> > > > > > > >> >> > > > > > the
>> > > > > > > >> >> > > > > > > "controller.listener.name"
>> > > > > > > >> >> > > > > > > is not found during the migration phase of
>> this
>> > > > KIP.
>> > > > > > > >> Thoughts?
>> > > > > > > >> >> > > > > > >
>> > > > > > > >> >> > > > > > > 3. That makes sense, and I've changed it.
>> > > > > > > >> >> > > > > > >
>> > > > > > > >> >> > > > > > > Thanks,
>> > > > > > > >> >> > > > > > > Lucas
>> > > > > > > >> >> > > > > > >
>> > > > > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
>> > > > > > > jun@confluent.io>
>> > > > > > > >> >> > wrote:
>> > > > > > > >> >> > > > > > >
>> > > > > > > >> >> > > > > > > > Hi, Lucas,
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > > Sorry for the delay. The new proposal
>> looks
>> > > good
>> > > > to
>> > > > > > me
>> > > > > > > >> >> > overall. A
>> > > > > > > >> >> > > > few
>> > > > > > > >> >> > > > > > > minor
>> > > > > > > >> >> > > > > > > > comments below.
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > > 1. It's possible that
>> > > > listener.name.for.controller
>> > > > > is
>> > > > > > > >> set,
>> > > > > > > >> >> but
>> > > > > > > >> >> > > set
>> > > > > > > >> >> > > > to
>> > > > > > > >> >> > > > > > the
>> > > > > > > >> >> > > > > > > > same value as inter.broker.listener.name.
>> In
>> > > > that
>> > > > > > > case,
>> > > > > > > >> >> should
>> > > > > > > >> >> > > we
>> > > > > > > >> >> > > > > > have a
>> > > > > > > >> >> > > > > > > > single network thread and the request
>> > handling
>> > > > > thread
>> > > > > > > for
>> > > > > > > >> >> that
>> > > > > > > >> >> > > > > > listener?
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > > 2. Currently, the controller always picks
>> the
>> > > > > > listener
>> > > > > > > >> >> > specified
>> > > > > > > >> >> > > by
>> > > > > > > >> >> > > > > > > > inter.broker.listener.name even if the
>> > > listener
>> > > > > name
>> > > > > > > is
>> > > > > > > >> not
>> > > > > > > >> >> > > > present
>> > > > > > > >> >> > > > > in
>> > > > > > > >> >> > > > > > > the
>> > > > > > > >> >> > > > > > > > receiving broker. This KIP proposes a
>> > slightly
>> > > > > > > different
>> > > > > > > >> >> > approach
>> > > > > > > >> >> > > > for
>> > > > > > > >> >> > > > > > > > picking listener.name.for.controller only
>> > when
>> > > > the
>> > > > > > > >> receiving
>> > > > > > > >> >> > end
>> > > > > > > >> >> > > > has
>> > > > > > > >> >> > > > > > the
>> > > > > > > >> >> > > > > > > > listener and switches
>> > > > listener.name.for.controller
>> > > > > > > >> >> otherwise.
>> > > > > > > >> >> > > There
>> > > > > > > >> >> > > > > are
>> > > > > > > >> >> > > > > > > > some tradeoffs between the two
>> approaches. To
>> > > > > change
>> > > > > > > the
>> > > > > > > >> >> inter
>> > > > > > > >> >> > > > broker
>> > > > > > > >> >> > > > > > > > listener, the former requires 2 steps: (1)
>> > > adding
>> > > > > the
>> > > > > > > new
>> > > > > > > >> >> > > listener
>> > > > > > > >> >> > > > to
>> > > > > > > >> >> > > > > > > > listener list in every broker and (2)
>> > changing
>> > > > > > > >> >> > > > > > > > listener.name.for.controller.
>> > > > > > > >> >> > > > > > > > The latter can do both changes in 1 step.
>> On
>> > > the
>> > > > > > hand,
>> > > > > > > if
>> > > > > > > >> >> > > > > > > > listener.name.for.controller
>> > > > > > > >> >> > > > > > > > is mis-configured, the former will report
>> an
>> > > > error
>> > > > > > and
>> > > > > > > >> the
>> > > > > > > >> >> > latter
>> > > > > > > >> >> > > > > will
>> > > > > > > >> >> > > > > > > hide
>> > > > > > > >> >> > > > > > > > it (so the user may not know the
>> > > > misconfiguration).
>> > > > > > It
>> > > > > > > >> seems
>> > > > > > > >> >> > that
>> > > > > > > >> >> > > > we
>> > > > > > > >> >> > > > > > > should
>> > > > > > > >> >> > > > > > > > pick one approach to handle both
>> > > > > > > >> >> listener.name.for.controller
>> > > > > > > >> >> > and
>> > > > > > > >> >> > > > > > > > inter.broker.listener.name consistently.
>> To
>> > > me,
>> > > > > the
>> > > > > > > >> former
>> > > > > > > >> >> > seems
>> > > > > > > >> >> > > > > > > slightly
>> > > > > > > >> >> > > > > > > > better.
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > > 3. To be consistent with the existing
>> naming,
>> > > > > should
>> > > > > > > >> >> > > > > > > > listener.name.for.controller
>> > > > > > > >> >> > > > > > > > be controller.listener.name?
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > > Thanks,
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > > Jun
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas
>> Wang <
>> > > > > > > >> >> > > lucasatucla@gmail.com>
>> > > > > > > >> >> > > > > > > wrote:
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > > > > Hi Jun and Joel,
>> > > > > > > >> >> > > > > > > > >
>> > > > > > > >> >> > > > > > > > > The KIP writeup has changed by quite a
>> bit
>> > > > since
>> > > > > > your
>> > > > > > > >> +1.
>> > > > > > > >> >> > > > > > > > > Can you please take another review?
>> Thanks
>> > a
>> > > > lot
>> > > > > > for
>> > > > > > > >> your
>> > > > > > > >> >> > time!
>> > > > > > > >> >> > > > > > > > >
>> > > > > > > >> >> > > > > > > > > Lucas
>> > > > > > > >> >> > > > > > > > >
>> > > > > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel
>> > Koshy
>> > > <
>> > > > > > > >> >> > > > jjkoshy.w@gmail.com>
>> > > > > > > >> >> > > > > > > > wrote:
>> > > > > > > >> >> > > > > > > > >
>> > > > > > > >> >> > > > > > > > > > +1 on the KIP.
>> > > > > > > >> >> > > > > > > > > >
>> > > > > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to
>> > > > > introduce
>> > > > > > > the
>> > > > > > > >> >> > > condition
>> > > > > > > >> >> > > > > > > > variables
>> > > > > > > >> >> > > > > > > > > > for the concern that Jun raised, but
>> it's
>> > > an
>> > > > > > > >> >> implementation
>> > > > > > > >> >> > > > > detail
>> > > > > > > >> >> > > > > > > that
>> > > > > > > >> >> > > > > > > > > we
>> > > > > > > >> >> > > > > > > > > > can defer to a discussion in the PR.)
>> > > > > > > >> >> > > > > > > > > >
>> > > > > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM,
>> Lucas
>> > > Wang
>> > > > <
>> > > > > > > >> >> > > > > > lucasatucla@gmail.com>
>> > > > > > > >> >> > > > > > > > > > wrote:
>> > > > > > > >> >> > > > > > > > > >
>> > > > > > > >> >> > > > > > > > > > > Hi Jun,
>> > > > > > > >> >> > > > > > > > > > >
>> > > > > > > >> >> > > > > > > > > > > I agree by using the conditional
>> > > variables,
>> > > > > > there
>> > > > > > > >> is
>> > > > > > > >> >> no
>> > > > > > > >> >> > > need
>> > > > > > > >> >> > > > to
>> > > > > > > >> >> > > > > > add
>> > > > > > > >> >> > > > > > > > > such
>> > > > > > > >> >> > > > > > > > > > a
>> > > > > > > >> >> > > > > > > > > > > new config.
>> > > > > > > >> >> > > > > > > > > > > Also thanks for approving this KIP.
>> > > > > > > >> >> > > > > > > > > > >
>> > > > > > > >> >> > > > > > > > > > > Lucas
>> > > > > > > >> >> > > > > > > > > > >
>> > > > > > > >> >> > > > > > > > > >
>> > > > > > > >> >> > > > > > > > >
>> > > > > > > >> >> > > > > > > >
>> > > > > > > >> >> > > > > > >
>> > > > > > > >> >> > > > > >
>> > > > > > > >> >> > > > >
>> > > > > > > >> >> > > >
>> > > > > > > >> >> > >
>> > > > > > > >> >> >
>> > > > > > > >> >>
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks Jun, I've updated the KIP with the new names.

Hi Joel, Becket, Dong, Ismael,
Since you've reviewed this KIP in the past, can you please review it again?
Thanks a lot!

Lucas

On Mon, Oct 8, 2018 at 6:10 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas,
>
> Yes, the new names sound good to me.
>
> Thanks,
>
> Jun
>
> On Fri, Oct 5, 2018 at 1:12 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for the suggestion, Ismael. I like it.
> >
> > Jun,
> > I'm excited to get the +1, thanks a lot!
> > Meanwhile what do you feel about renaming the metrics and config to
> >
> > ControlPlaneRequestQueueSize
> >
> > ControlPlaneNetworkProcessorIdlePercent
> >
> > ControlPlaneRequestHandlerIdlePercent
> >
> > control.plane.listener.name
> >
> > ?
> >
> >
> > Thanks,
> >
> > Lucas
> >
> > On Thu, Oct 4, 2018 at 11:38 AM Ismael Juma <is...@gmail.com> wrote:
> >
> > > Have we considered control plane if we think control by itself is
> > > ambiguous? I agree with the original concern that "controller" may be
> > > confusing for something that affects all brokers.
> > >
> > > Ismael
> > >
> > >
> > > On 4 Oct 2018 11:08 am, "Lucas Wang" <lu...@gmail.com> wrote:
> > >
> > > Thanks Jun. I've changed the KIP with the suggested 2 step upgrade.
> > > Please take a look again when you have time.
> > >
> > > Regards,
> > > Lucas
> > >
> > >
> > > On Thu, Oct 4, 2018 at 10:06 AM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Lucas,
> > > >
> > > > 200. That's a valid concern. So, we can probably just keep the
> current
> > > > name.
> > > >
> > > > 201. I am thinking that you would upgrade in the same way as changing
> > > > inter.broker.listener.name. This requires 2 rounds of rolling
> restart.
> > > In
> > > > the first round, we add the controller endpoint to the listeners w/o
> > > > setting controller.listener.name. In the second round, every broker
> > sets
> > > > controller.listener.name. At that point, the controller listener is
> > > ready
> > > > in every broker.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks for the further comments, Jun.
> > > > >
> > > > > 200. Currently in the code base, we have the term of "ControlBatch"
> > > > related
> > > > > to
> > > > > idempotent/transactional producing. Do you think it's a concern for
> > > > reusing
> > > > > the term "control"?
> > > > >
> > > > > 201. It's not clear to me how it would work by following the same
> > > > strategy
> > > > > for "controller.listener.name".
> > > > > Say the new controller has its "controller.listener.name" set to
> the
> > > > value
> > > > > "CONTROLLER", and broker 1
> > > > > has picked up this KIP by announcing
> > > > > "endpoints": [
> > > > >         "CONTROLLER://broker1.example.com:9091",
> > > > >         "INTERNAL://broker1.example.com:9092",
> > > > >         "EXTERNAL://host1.example.com:9093"
> > > > >     ],
> > > > >
> > > > > while broker2 has not picked up the change, and is announcing
> > > > > "endpoints": [
> > > > >         "INTERNAL://broker2.example.com:9092",
> > > > >         "EXTERNAL://host2.example.com:9093"
> > > > >     ],
> > > > > to support both broker 1 for the new behavior and broker 2 for the
> > old
> > > > > behavior, it seems the controller must
> > > > > check their published endpoints. Am I missing something?
> > > > >
> > > > > Thanks!
> > > > > Lucas
> > > > >
> > > > > On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Lucas,
> > > > > >
> > > > > > Sorry for the delay. The updated wiki looks good to me overall.
> > Just
> > > a
> > > > > > couple more minor comments.
> > > > > >
> > > > > > 200.
> > > kafka.network:name=ControllerRequestQueueSize,type=RequestChannel:
> > > > > The
> > > > > > name ControllerRequestQueueSize gives the impression that it's
> only
> > > for
> > > > > the
> > > > > > controller broker. Perhaps we can just rename all metrics and
> > configs
> > > > > from
> > > > > > controller to control. This indicates that the threads and the
> > queues
> > > > are
> > > > > > for the control requests (as oppose to data requests).
> > > > > >
> > > > > > 201. "<new controller, old broker>: In this scenario, the
> > controller
> > > > will
> > > > > > have the "controller.listener.name" config set to a value like
> > > > > > "CONTROLLER", however the broker's exposed endpoints do not have
> an
> > > > entry
> > > > > > corresponding to the new listener name. Hence the controller
> should
> > > > > > preserve the existing behavior by determining the endpoint using
> > > > > > *inter-broker-listener-name *value. The end result should be the
> > same
> > > > > > behavior as today." Currently, the controller makes connections
> > based
> > > > on
> > > > > > its local inter.broker.listener.name config without checking the
> > > > target
> > > > > > broker's ZK registration. For consistency, perhaps we can just
> > follow
> > > > the
> > > > > > same strategy for controller.listener.name. This existing
> behavior
> > > > seems
> > > > > > simpler to understand and has the benefit of catching
> inconsistent
> > > > > configs
> > > > > > across brokers.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Sorry to bother you again. Can you please take a look at the
> wiki
> > > > again
> > > > > > > when you have time?
> > > > > > >
> > > > > > > Thanks a lot!
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <
> > lucasatucla@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > Thanks a lot for the detailed explanation.
> > > > > > > > I've restored the wiki to a previous version that does not
> > > require
> > > > > > config
> > > > > > > > changes,
> > > > > > > > and keeps the current behavior with the proposed changes
> turned
> > > off
> > > > > by
> > > > > > > > default.
> > > > > > > > I'd appreciate it if you can review it again.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > > > >
> > > > > > > >> Hi, Lucas,
> > > > > > > >>
> > > > > > > >> When upgrading to a minor release, I think the expectation
> is
> > > > that a
> > > > > > > user
> > > > > > > >> wouldn't need to make any config changes, other than the
> usual
> > > > > > > >> inter.broker.protocol. If we require other config changes
> > during
> > > > an
> > > > > > > >> upgrade, then it's probably better to do that in a major
> > > release.
> > > > > > > >>
> > > > > > > >> Regarding your proposal, I think removing
> host/advertised_host
> > > in
> > > > > > favor
> > > > > > > of
> > > > > > > >> listeners:advertised_listeners seems useful regardless of
> this
> > > > KIP.
> > > > > > > >> However, that can probably wait until a major release.
> > > > > > > >>
> > > > > > > >> As for the controller listener, I am not sure if one has to
> > set
> > > > it.
> > > > > To
> > > > > > > >> make
> > > > > > > >> a cluster healthy, one sort of have to make sure that the
> > > request
> > > > > > queue
> > > > > > > is
> > > > > > > >> never full and no request will be sitting in the request
> queue
> > > for
> > > > > > long.
> > > > > > > >> If
> > > > > > > >> one does that, setting the controller listener may not be
> > > > necessary.
> > > > > > On
> > > > > > > >> the
> > > > > > > >> flip side, even if one sets the controller listener, but the
> > > > request
> > > > > > > queue
> > > > > > > >> and the request time for the data part are still high, the
> > > cluster
> > > > > may
> > > > > > > >> still not be healthy. Given that we have already started the
> > 2.1
> > > > > > release
> > > > > > > >> planning, perhaps we can start with not requiring the
> > controller
> > > > > > > listener.
> > > > > > > >> If this is indeed something that everyone wants to set, we
> can
> > > > make
> > > > > > it a
> > > > > > > >> required config in a major release.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Jun
> > > > > > > >>
> > > > > > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <
> > > > lucasatucla@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > @Jun Rao <ju...@confluent.io>
> > > > > > > >> >
> > > > > > > >> > I made the recent config changes after thinking about the
> > > > default
> > > > > > > >> behavior
> > > > > > > >> > for adopting this KIP.
> > > > > > > >> > I think there are basically two options:
> > > > > > > >> > 1. By default, the behavior proposed in this KIP is turned
> > > off,
> > > > > and
> > > > > > > >> > operators can turn it
> > > > > > > >> > on by adding the "controller.listener.name" config and
> > > entries
> > > > in
> > > > > > the
> > > > > > > >> > "listeners" and "advertised.listeners" list.
> > > > > > > >> > If no "controller.listener.name" is added, it'll be the
> > *same
> > > > as*
> > > > > > > the "
> > > > > > > >> > inter.broker.listener.name",
> > > > > > > >> > and the proposed feature is effectively turned off.
> > > > > > > >> > This has been the assumption in the KIP writeup before.
> > > > > > > >> >
> > > > > > > >> > 2. By default, the behavior proposed in this KIP is turned
> > on,
> > > > and
> > > > > > > >> > operators are forced to
> > > > > > > >> > recognize the proposed change if their "listeners" config
> is
> > > set
> > > > > > (this
> > > > > > > >> is
> > > > > > > >> > most likely in production environments),
> > > > > > > >> > by allocating a new port for controller connections, and
> > > adding
> > > > a
> > > > > > new
> > > > > > > >> > endpoint to the "listeners" config.
> > > > > > > >> > For cases where "listeners" is not set explicitly,
> > > > > > > >> > there needs to be a default value for it that includes the
> > > > > > controller
> > > > > > > >> > listener name,
> > > > > > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> > > > > > > >> >
> > > > > > > >> > I chose to go with option 2 since as author of this KIP,
> > > > > > > >> > I naturally think in the long run, it's worth the effort
> to
> > > > adopt
> > > > > > this
> > > > > > > >> > feature,
> > > > > > > >> > in order to prevent issues under circumstances listed in
> the
> > > > > > > motivation
> > > > > > > >> > section.
> > > > > > > >> >
> > > > > > > >> > 100, following the argument above, I want to enforce the
> > > > > separation
> > > > > > > >> > between controller
> > > > > > > >> > and data plane requests. Hence the "
> > controller.listener.name"
> > > > > > should
> > > > > > > >> > never be the same
> > > > > > > >> > as the "inter.broker.listener.name", which is intended
> for
> > > data
> > > > > > plane
> > > > > > > >> > requests.
> > > > > > > >> >
> > > > > > > >> > 101, the default value for "listeners" will be
> > > > > > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > > > > > > >> > making values of "host", and "port" not being used under
> any
> > > > > config
> > > > > > > >> > settings.
> > > > > > > >> > And the default value for "advertised.listeners" will be
> > > derived
> > > > > > from
> > > > > > > >> > "listeners",
> > > > > > > >> > making the values of "advertised.host", and
> > "advertised.port"
> > > > not
> > > > > > > being
> > > > > > > >> > used any more.
> > > > > > > >> >
> > > > > > > >> > 102, for upgrading, a single broker that has "listeners"
> > > and/or
> > > > > > > >> > "advertised.listeners" set,
> > > > > > > >> > must add a new endpoint for the CONTROLLER listener name,
> or
> > > end
> > > > > up
> > > > > > > >> using
> > > > > > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER:
> > //:9091".
> > > > > > > >> > During rolling upgrade, in cases of <old controller> +
> <new
> > > > > broker>
> > > > > > or
> > > > > > > >> > <new controller>  + <old broker>
> > > > > > > >> > we still need to fall back to the current behavior.
> However
> > > > after
> > > > > > the
> > > > > > > >> > rolling upgrade is done,
> > > > > > > >> > it is guaranteed that the controller plane and data plane
> > are
> > > > > > > separated,
> > > > > > > >> > given
> > > > > > > >> > the "controller.listener.name" must be different from "
> > > > > > > >> > inter.broker.listener.name".
> > > > > > > >> >
> > > > > > > >> > @Ismael Juma <is...@juma.me.uk>
> > > > > > > >> > Thanks for pointing that out. I did not know that.
> > > > > > > >> > However my question is if the argument above makes sense,
> > and
> > > my
> > > > > > code
> > > > > > > >> > change
> > > > > > > >> > causes the configs "host", "port", "advertised.host",
> > > > > > > "advertised.port"
> > > > > > > >> to
> > > > > > > >> > be not used under any circumstance,
> > > > > > > >> > then it's no different from removing them.
> > > > > > > >> > Anyway if there is still a concern about removing them, is
> > > > there a
> > > > > > new
> > > > > > > >> > major new version
> > > > > > > >> > now or in the future where I can remove them?
> > > > > > > >> >
> > > > > > > >> > Thanks!
> > > > > > > >> > Lucas
> > > > > > > >> >
> > > > > > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <
> > > ismael@juma.me.uk>
> > > > > > > wrote:
> > > > > > > >> >
> > > > > > > >> >> To be clear, we can only remove configs in major new
> > > versions.
> > > > > > > >> Otherwise,
> > > > > > > >> >> we can only deprecate them.
> > > > > > > >> >>
> > > > > > > >> >> Ismael
> > > > > > > >> >>
> > > > > > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <
> jun@confluent.io
> > >
> > > > > wrote:
> > > > > > > >> >>
> > > > > > > >> >> > Hi, Lucas,
> > > > > > > >> >> >
> > > > > > > >> >> > For the network idlePct, your understanding is correct.
> > > > > > Currently,
> > > > > > > >> >> > networkIdlePct metric is calculated as the average of
> (1
> > -
> > > > > > > io-ratio)
> > > > > > > >> in
> > > > > > > >> >> the
> > > > > > > >> >> > selector of all network threads.
> > > > > > > >> >> >
> > > > > > > >> >> > The metrics part looks good to me in the updated KIP.
> > > > > > > >> >> >
> > > > > > > >> >> > I am not still not quite sure about the configs.
> > > > > > > >> >> >
> > > > > > > >> >> > 100. "Whenever the "controller.listener.name" is set,
> > upon
> > > > > > broker
> > > > > > > >> >> startup,
> > > > > > > >> >> > we will validate its value and make sure it's different
> > > from
> > > > > the
> > > > > > > >> >> > *inter-broker-listener-name *value." Does that mean
> that
> > > > > > > >> >> > controller.listener.name has to be different from
> > > > > > > >> >> > inter.broker.listener.name?
> > > > > > > >> >> > That seems limiting.
> > > > > > > >> >> >
> > > > > > > >> >> > 101. The KIP says that advertised.listeners and
> listeners
> > > > will
> > > > > > now
> > > > > > > >> have
> > > > > > > >> >> a
> > > > > > > >> >> > different default value including controller. Could you
> > > > > document
> > > > > > > what
> > > > > > > >> >> the
> > > > > > > >> >> > default value looks like?
> > > > > > > >> >> >
> > > > > > > >> >> > 102. About removing the the following configs. How does
> > > that
> > > > > > affect
> > > > > > > >> the
> > > > > > > >> >> > upgrade path? Do we now expect a user to add a new
> config
> > > > when
> > > > > > > >> upgrading
> > > > > > > >> >> > from an old version to a new one?
> > > > > > > >> >> > host
> > > > > > > >> >> > port
> > > > > > > >> >> > advertised.host
> > > > > > > >> >> > advertised.port
> > > > > > > >> >> >
> > > > > > > >> >> > Thanks,
> > > > > > > >> >> >
> > > > > > > >> >> > Jun
> > > > > > > >> >> >
> > > > > > > >> >> >
> > > > > > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com>
> > > > > > > >> >> wrote:
> > > > > > > >> >> >
> > > > > > > >> >> > > @Jun Rao <ju...@confluent.io>
> > > > > > > >> >> > >
> > > > > > > >> >> > > One clarification, currently on the selector level,
> we
> > > have
> > > > > the
> > > > > > > >> >> > > io-wait-ratio metric.
> > > > > > > >> >> > > For the new controller *network* thread, we can use
> it
> > > > > directly
> > > > > > > for
> > > > > > > >> >> > > IdlePct, instead of using 1- io-ratio,
> > > > > > > >> >> > > so that the logic is similar to the current average
> > > IdlePct
> > > > > for
> > > > > > > >> >> network
> > > > > > > >> >> > > threads. Is that correct?
> > > > > > > >> >> > >
> > > > > > > >> >> > > I've revised the KIP by adding two new metrics for
> > > > measuring
> > > > > > the
> > > > > > > >> >> IdlePct
> > > > > > > >> >> > > for the two additional threads.
> > > > > > > >> >> > > Please take a look again. Thanks!
> > > > > > > >> >> > >
> > > > > > > >> >> > > Lucas
> > > > > > > >> >> > >
> > > > > > > >> >> > >
> > > > > > > >> >> > >
> > > > > > > >> >> > >
> > > > > > > >> >> > >
> > > > > > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <
> > jun@confluent.io
> > > >
> > > > > > wrote:
> > > > > > > >> >> > >
> > > > > > > >> >> > > > Hi, Lucas,
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > Thanks for the updated KIP.
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > For monitoring the network thread utilization for
> the
> > > > > control
> > > > > > > >> >> plane, we
> > > > > > > >> >> > > > already have the metric io-ratio at the selector
> > level
> > > > > > (idlePct
> > > > > > > >> is
> > > > > > > >> >> 1 -
> > > > > > > >> >> > > > io-ratio). So, we just need to give that selector a
> > > > > > meaningful
> > > > > > > >> name.
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > For monitoring the io thread utilization for the
> > > control
> > > > > > plane,
> > > > > > > >> it's
> > > > > > > >> >> > > > probably useful to have a separate metric for that.
> > The
> > > > > > > >> controller
> > > > > > > >> >> > > request
> > > > > > > >> >> > > > queue size may not reflect the history in a window.
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > Jun
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> > > > > > > >> lucasatucla@gmail.com>
> > > > > > > >> >> > > wrote:
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > > Thanks Jun for your quick response. It looks
> like I
> > > > > forgot
> > > > > > to
> > > > > > > >> >> click
> > > > > > > >> >> > the
> > > > > > > >> >> > > > > "Update" button, :)
> > > > > > > >> >> > > > > It's updated now.
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > Regarding the idle ratio metrics for the
> additional
> > > > > > threads,
> > > > > > > I
> > > > > > > >> >> > > discussed
> > > > > > > >> >> > > > > with Joel,
> > > > > > > >> >> > > > > and think they are not as useful, and I added our
> > > > > reasoning
> > > > > > > in
> > > > > > > >> the
> > > > > > > >> >> > last
> > > > > > > >> >> > > > > paragraph of the
> > > > > > > >> >> > > > > "How are controller requests handled over the
> > > dedicated
> > > > > > > >> >> connections?"
> > > > > > > >> >> > > > > section.
> > > > > > > >> >> > > > > On the other hand, we don't strongly oppose
> adding
> > > them
> > > > > if
> > > > > > > you
> > > > > > > >> >> think
> > > > > > > >> >> > > they
> > > > > > > >> >> > > > > are necessary.
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > Thanks,
> > > > > > > >> >> > > > > Lucas
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <
> > > > jun@confluent.io
> > > > > >
> > > > > > > >> wrote:
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > > Hi, Lucas,
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > Thanks for the reply. Have you actually updated
> > the
> > > > > KIP?
> > > > > > > The
> > > > > > > >> >> wiki
> > > > > > > >> >> > > says
> > > > > > > >> >> > > > > that
> > > > > > > >> >> > > > > > it's last updated on Aug. 22. and some of the
> > > changes
> > > > > > that
> > > > > > > >> you
> > > > > > > >> >> > > > mentioned
> > > > > > > >> >> > > > > > (#1 and #3) are not there.
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > Also, regarding Joel's comment on
> network/request
> > > > idle
> > > > > > > ratio
> > > > > > > >> >> > metrics,
> > > > > > > >> >> > > > > could
> > > > > > > >> >> > > > > > you comment on whether they include the new
> > > > controller
> > > > > > > >> >> listener? If
> > > > > > > >> >> > > > not,
> > > > > > > >> >> > > > > do
> > > > > > > >> >> > > > > > we need additional metrics to measure the
> > > utilization
> > > > > of
> > > > > > > the
> > > > > > > >> io
> > > > > > > >> >> > > thread
> > > > > > > >> >> > > > > for
> > > > > > > >> >> > > > > > the control plane?
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > Jun
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> > > > > > > >> >> lucasatucla@gmail.com
> > > > > > > >> >> > >
> > > > > > > >> >> > > > > wrote:
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > > Thanks for the comments, Jun.
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > 1. I think the answer should be no, since
> the "
> > > > > > > >> >> > > > > > inter.broker.listener.name"
> > > > > > > >> >> > > > > > > are also used
> > > > > > > >> >> > > > > > > for replication traffic, and merging these
> two
> > > > types
> > > > > of
> > > > > > > >> >> request
> > > > > > > >> >> > to
> > > > > > > >> >> > > > the
> > > > > > > >> >> > > > > > > single threaded tunnel
> > > > > > > >> >> > > > > > > would defeat the purpose of this KIP and also
> > > hurt
> > > > > > > >> replication
> > > > > > > >> >> > > > > > throughput.
> > > > > > > >> >> > > > > > > So I think that means
> > > > > > > >> >> > > > > > > we should validate to make sure when the new
> > > config
> > > > > is
> > > > > > > set,
> > > > > > > >> >> it's
> > > > > > > >> >> > > > > > different
> > > > > > > >> >> > > > > > > from "inter.broker.listener.name"
> > > > > > > >> >> > > > > > > or "security.inter.broker.protocol",
> > whichever is
> > > > > set.
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > 2. Normally all broker configs in a given
> > cluster
> > > > are
> > > > > > > >> changed
> > > > > > > >> >> at
> > > > > > > >> >> > > the
> > > > > > > >> >> > > > > same
> > > > > > > >> >> > > > > > > time. If there is a typo in the
> > > > > > > >> >> > > > > > > controller.listener.name and it's not
> > available
> > > in
> > > > > the
> > > > > > > >> >> endpoints
> > > > > > > >> >> > > > list,
> > > > > > > >> >> > > > > > we
> > > > > > > >> >> > > > > > > could catch it, give an error
> > > > > > > >> >> > > > > > > and block restart of the first broker in the
> > > > cluster.
> > > > > > > With
> > > > > > > >> >> that,
> > > > > > > >> >> > we
> > > > > > > >> >> > > > > could
> > > > > > > >> >> > > > > > > keep the current behavior
> > > > > > > >> >> > > > > > > in the KIP write up that falls back to
> > > > > > > >> >> > "inter.broker.listener.nam"
> > > > > > > >> >> > > > when
> > > > > > > >> >> > > > > > the
> > > > > > > >> >> > > > > > > "controller.listener.name"
> > > > > > > >> >> > > > > > > is not found during the migration phase of
> this
> > > > KIP.
> > > > > > > >> Thoughts?
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > 3. That makes sense, and I've changed it.
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > Thanks,
> > > > > > > >> >> > > > > > > Lucas
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> > > > > > > jun@confluent.io>
> > > > > > > >> >> > wrote:
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > > Hi, Lucas,
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > Sorry for the delay. The new proposal looks
> > > good
> > > > to
> > > > > > me
> > > > > > > >> >> > overall. A
> > > > > > > >> >> > > > few
> > > > > > > >> >> > > > > > > minor
> > > > > > > >> >> > > > > > > > comments below.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > 1. It's possible that
> > > > listener.name.for.controller
> > > > > is
> > > > > > > >> set,
> > > > > > > >> >> but
> > > > > > > >> >> > > set
> > > > > > > >> >> > > > to
> > > > > > > >> >> > > > > > the
> > > > > > > >> >> > > > > > > > same value as inter.broker.listener.name.
> In
> > > > that
> > > > > > > case,
> > > > > > > >> >> should
> > > > > > > >> >> > > we
> > > > > > > >> >> > > > > > have a
> > > > > > > >> >> > > > > > > > single network thread and the request
> > handling
> > > > > thread
> > > > > > > for
> > > > > > > >> >> that
> > > > > > > >> >> > > > > > listener?
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > 2. Currently, the controller always picks
> the
> > > > > > listener
> > > > > > > >> >> > specified
> > > > > > > >> >> > > by
> > > > > > > >> >> > > > > > > > inter.broker.listener.name even if the
> > > listener
> > > > > name
> > > > > > > is
> > > > > > > >> not
> > > > > > > >> >> > > > present
> > > > > > > >> >> > > > > in
> > > > > > > >> >> > > > > > > the
> > > > > > > >> >> > > > > > > > receiving broker. This KIP proposes a
> > slightly
> > > > > > > different
> > > > > > > >> >> > approach
> > > > > > > >> >> > > > for
> > > > > > > >> >> > > > > > > > picking listener.name.for.controller only
> > when
> > > > the
> > > > > > > >> receiving
> > > > > > > >> >> > end
> > > > > > > >> >> > > > has
> > > > > > > >> >> > > > > > the
> > > > > > > >> >> > > > > > > > listener and switches
> > > > listener.name.for.controller
> > > > > > > >> >> otherwise.
> > > > > > > >> >> > > There
> > > > > > > >> >> > > > > are
> > > > > > > >> >> > > > > > > > some tradeoffs between the two approaches.
> To
> > > > > change
> > > > > > > the
> > > > > > > >> >> inter
> > > > > > > >> >> > > > broker
> > > > > > > >> >> > > > > > > > listener, the former requires 2 steps: (1)
> > > adding
> > > > > the
> > > > > > > new
> > > > > > > >> >> > > listener
> > > > > > > >> >> > > > to
> > > > > > > >> >> > > > > > > > listener list in every broker and (2)
> > changing
> > > > > > > >> >> > > > > > > > listener.name.for.controller.
> > > > > > > >> >> > > > > > > > The latter can do both changes in 1 step.
> On
> > > the
> > > > > > hand,
> > > > > > > if
> > > > > > > >> >> > > > > > > > listener.name.for.controller
> > > > > > > >> >> > > > > > > > is mis-configured, the former will report
> an
> > > > error
> > > > > > and
> > > > > > > >> the
> > > > > > > >> >> > latter
> > > > > > > >> >> > > > > will
> > > > > > > >> >> > > > > > > hide
> > > > > > > >> >> > > > > > > > it (so the user may not know the
> > > > misconfiguration).
> > > > > > It
> > > > > > > >> seems
> > > > > > > >> >> > that
> > > > > > > >> >> > > > we
> > > > > > > >> >> > > > > > > should
> > > > > > > >> >> > > > > > > > pick one approach to handle both
> > > > > > > >> >> listener.name.for.controller
> > > > > > > >> >> > and
> > > > > > > >> >> > > > > > > > inter.broker.listener.name consistently.
> To
> > > me,
> > > > > the
> > > > > > > >> former
> > > > > > > >> >> > seems
> > > > > > > >> >> > > > > > > slightly
> > > > > > > >> >> > > > > > > > better.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > 3. To be consistent with the existing
> naming,
> > > > > should
> > > > > > > >> >> > > > > > > > listener.name.for.controller
> > > > > > > >> >> > > > > > > > be controller.listener.name?
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > Thanks,
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > Jun
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang
> <
> > > > > > > >> >> > > lucasatucla@gmail.com>
> > > > > > > >> >> > > > > > > wrote:
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > > Hi Jun and Joel,
> > > > > > > >> >> > > > > > > > >
> > > > > > > >> >> > > > > > > > > The KIP writeup has changed by quite a
> bit
> > > > since
> > > > > > your
> > > > > > > >> +1.
> > > > > > > >> >> > > > > > > > > Can you please take another review?
> Thanks
> > a
> > > > lot
> > > > > > for
> > > > > > > >> your
> > > > > > > >> >> > time!
> > > > > > > >> >> > > > > > > > >
> > > > > > > >> >> > > > > > > > > Lucas
> > > > > > > >> >> > > > > > > > >
> > > > > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel
> > Koshy
> > > <
> > > > > > > >> >> > > > jjkoshy.w@gmail.com>
> > > > > > > >> >> > > > > > > > wrote:
> > > > > > > >> >> > > > > > > > >
> > > > > > > >> >> > > > > > > > > > +1 on the KIP.
> > > > > > > >> >> > > > > > > > > >
> > > > > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to
> > > > > introduce
> > > > > > > the
> > > > > > > >> >> > > condition
> > > > > > > >> >> > > > > > > > variables
> > > > > > > >> >> > > > > > > > > > for the concern that Jun raised, but
> it's
> > > an
> > > > > > > >> >> implementation
> > > > > > > >> >> > > > > detail
> > > > > > > >> >> > > > > > > that
> > > > > > > >> >> > > > > > > > > we
> > > > > > > >> >> > > > > > > > > > can defer to a discussion in the PR.)
> > > > > > > >> >> > > > > > > > > >
> > > > > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas
> > > Wang
> > > > <
> > > > > > > >> >> > > > > > lucasatucla@gmail.com>
> > > > > > > >> >> > > > > > > > > > wrote:
> > > > > > > >> >> > > > > > > > > >
> > > > > > > >> >> > > > > > > > > > > Hi Jun,
> > > > > > > >> >> > > > > > > > > > >
> > > > > > > >> >> > > > > > > > > > > I agree by using the conditional
> > > variables,
> > > > > > there
> > > > > > > >> is
> > > > > > > >> >> no
> > > > > > > >> >> > > need
> > > > > > > >> >> > > > to
> > > > > > > >> >> > > > > > add
> > > > > > > >> >> > > > > > > > > such
> > > > > > > >> >> > > > > > > > > > a
> > > > > > > >> >> > > > > > > > > > > new config.
> > > > > > > >> >> > > > > > > > > > > Also thanks for approving this KIP.
> > > > > > > >> >> > > > > > > > > > >
> > > > > > > >> >> > > > > > > > > > > Lucas
> > > > > > > >> >> > > > > > > > > > >
> > > > > > > >> >> > > > > > > > > >
> > > > > > > >> >> > > > > > > > >
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > >
> > > > > > > >> >> > >
> > > > > > > >> >> >
> > > > > > > >> >>
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hi, Lucas,

Yes, the new names sound good to me.

Thanks,

Jun

On Fri, Oct 5, 2018 at 1:12 PM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks for the suggestion, Ismael. I like it.
>
> Jun,
> I'm excited to get the +1, thanks a lot!
> Meanwhile what do you feel about renaming the metrics and config to
>
> ControlPlaneRequestQueueSize
>
> ControlPlaneNetworkProcessorIdlePercent
>
> ControlPlaneRequestHandlerIdlePercent
>
> control.plane.listener.name
>
> ?
>
>
> Thanks,
>
> Lucas
>
> On Thu, Oct 4, 2018 at 11:38 AM Ismael Juma <is...@gmail.com> wrote:
>
> > Have we considered control plane if we think control by itself is
> > ambiguous? I agree with the original concern that "controller" may be
> > confusing for something that affects all brokers.
> >
> > Ismael
> >
> >
> > On 4 Oct 2018 11:08 am, "Lucas Wang" <lu...@gmail.com> wrote:
> >
> > Thanks Jun. I've changed the KIP with the suggested 2 step upgrade.
> > Please take a look again when you have time.
> >
> > Regards,
> > Lucas
> >
> >
> > On Thu, Oct 4, 2018 at 10:06 AM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Lucas,
> > >
> > > 200. That's a valid concern. So, we can probably just keep the current
> > > name.
> > >
> > > 201. I am thinking that you would upgrade in the same way as changing
> > > inter.broker.listener.name. This requires 2 rounds of rolling restart.
> > In
> > > the first round, we add the controller endpoint to the listeners w/o
> > > setting controller.listener.name. In the second round, every broker
> sets
> > > controller.listener.name. At that point, the controller listener is
> > ready
> > > in every broker.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Thanks for the further comments, Jun.
> > > >
> > > > 200. Currently in the code base, we have the term of "ControlBatch"
> > > related
> > > > to
> > > > idempotent/transactional producing. Do you think it's a concern for
> > > reusing
> > > > the term "control"?
> > > >
> > > > 201. It's not clear to me how it would work by following the same
> > > strategy
> > > > for "controller.listener.name".
> > > > Say the new controller has its "controller.listener.name" set to the
> > > value
> > > > "CONTROLLER", and broker 1
> > > > has picked up this KIP by announcing
> > > > "endpoints": [
> > > >         "CONTROLLER://broker1.example.com:9091",
> > > >         "INTERNAL://broker1.example.com:9092",
> > > >         "EXTERNAL://host1.example.com:9093"
> > > >     ],
> > > >
> > > > while broker2 has not picked up the change, and is announcing
> > > > "endpoints": [
> > > >         "INTERNAL://broker2.example.com:9092",
> > > >         "EXTERNAL://host2.example.com:9093"
> > > >     ],
> > > > to support both broker 1 for the new behavior and broker 2 for the
> old
> > > > behavior, it seems the controller must
> > > > check their published endpoints. Am I missing something?
> > > >
> > > > Thanks!
> > > > Lucas
> > > >
> > > > On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Lucas,
> > > > >
> > > > > Sorry for the delay. The updated wiki looks good to me overall.
> Just
> > a
> > > > > couple more minor comments.
> > > > >
> > > > > 200.
> > kafka.network:name=ControllerRequestQueueSize,type=RequestChannel:
> > > > The
> > > > > name ControllerRequestQueueSize gives the impression that it's only
> > for
> > > > the
> > > > > controller broker. Perhaps we can just rename all metrics and
> configs
> > > > from
> > > > > controller to control. This indicates that the threads and the
> queues
> > > are
> > > > > for the control requests (as oppose to data requests).
> > > > >
> > > > > 201. "<new controller, old broker>: In this scenario, the
> controller
> > > will
> > > > > have the "controller.listener.name" config set to a value like
> > > > > "CONTROLLER", however the broker's exposed endpoints do not have an
> > > entry
> > > > > corresponding to the new listener name. Hence the controller should
> > > > > preserve the existing behavior by determining the endpoint using
> > > > > *inter-broker-listener-name *value. The end result should be the
> same
> > > > > behavior as today." Currently, the controller makes connections
> based
> > > on
> > > > > its local inter.broker.listener.name config without checking the
> > > target
> > > > > broker's ZK registration. For consistency, perhaps we can just
> follow
> > > the
> > > > > same strategy for controller.listener.name. This existing behavior
> > > seems
> > > > > simpler to understand and has the benefit of catching inconsistent
> > > > configs
> > > > > across brokers.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Sorry to bother you again. Can you please take a look at the wiki
> > > again
> > > > > > when you have time?
> > > > > >
> > > > > > Thanks a lot!
> > > > > > Lucas
> > > > > >
> > > > > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <
> lucasatucla@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Thanks a lot for the detailed explanation.
> > > > > > > I've restored the wiki to a previous version that does not
> > require
> > > > > config
> > > > > > > changes,
> > > > > > > and keeps the current behavior with the proposed changes turned
> > off
> > > > by
> > > > > > > default.
> > > > > > > I'd appreciate it if you can review it again.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > > > >
> > > > > > >> Hi, Lucas,
> > > > > > >>
> > > > > > >> When upgrading to a minor release, I think the expectation is
> > > that a
> > > > > > user
> > > > > > >> wouldn't need to make any config changes, other than the usual
> > > > > > >> inter.broker.protocol. If we require other config changes
> during
> > > an
> > > > > > >> upgrade, then it's probably better to do that in a major
> > release.
> > > > > > >>
> > > > > > >> Regarding your proposal, I think removing host/advertised_host
> > in
> > > > > favor
> > > > > > of
> > > > > > >> listeners:advertised_listeners seems useful regardless of this
> > > KIP.
> > > > > > >> However, that can probably wait until a major release.
> > > > > > >>
> > > > > > >> As for the controller listener, I am not sure if one has to
> set
> > > it.
> > > > To
> > > > > > >> make
> > > > > > >> a cluster healthy, one sort of have to make sure that the
> > request
> > > > > queue
> > > > > > is
> > > > > > >> never full and no request will be sitting in the request queue
> > for
> > > > > long.
> > > > > > >> If
> > > > > > >> one does that, setting the controller listener may not be
> > > necessary.
> > > > > On
> > > > > > >> the
> > > > > > >> flip side, even if one sets the controller listener, but the
> > > request
> > > > > > queue
> > > > > > >> and the request time for the data part are still high, the
> > cluster
> > > > may
> > > > > > >> still not be healthy. Given that we have already started the
> 2.1
> > > > > release
> > > > > > >> planning, perhaps we can start with not requiring the
> controller
> > > > > > listener.
> > > > > > >> If this is indeed something that everyone wants to set, we can
> > > make
> > > > > it a
> > > > > > >> required config in a major release.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Jun
> > > > > > >>
> > > > > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > @Jun Rao <ju...@confluent.io>
> > > > > > >> >
> > > > > > >> > I made the recent config changes after thinking about the
> > > default
> > > > > > >> behavior
> > > > > > >> > for adopting this KIP.
> > > > > > >> > I think there are basically two options:
> > > > > > >> > 1. By default, the behavior proposed in this KIP is turned
> > off,
> > > > and
> > > > > > >> > operators can turn it
> > > > > > >> > on by adding the "controller.listener.name" config and
> > entries
> > > in
> > > > > the
> > > > > > >> > "listeners" and "advertised.listeners" list.
> > > > > > >> > If no "controller.listener.name" is added, it'll be the
> *same
> > > as*
> > > > > > the "
> > > > > > >> > inter.broker.listener.name",
> > > > > > >> > and the proposed feature is effectively turned off.
> > > > > > >> > This has been the assumption in the KIP writeup before.
> > > > > > >> >
> > > > > > >> > 2. By default, the behavior proposed in this KIP is turned
> on,
> > > and
> > > > > > >> > operators are forced to
> > > > > > >> > recognize the proposed change if their "listeners" config is
> > set
> > > > > (this
> > > > > > >> is
> > > > > > >> > most likely in production environments),
> > > > > > >> > by allocating a new port for controller connections, and
> > adding
> > > a
> > > > > new
> > > > > > >> > endpoint to the "listeners" config.
> > > > > > >> > For cases where "listeners" is not set explicitly,
> > > > > > >> > there needs to be a default value for it that includes the
> > > > > controller
> > > > > > >> > listener name,
> > > > > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> > > > > > >> >
> > > > > > >> > I chose to go with option 2 since as author of this KIP,
> > > > > > >> > I naturally think in the long run, it's worth the effort to
> > > adopt
> > > > > this
> > > > > > >> > feature,
> > > > > > >> > in order to prevent issues under circumstances listed in the
> > > > > > motivation
> > > > > > >> > section.
> > > > > > >> >
> > > > > > >> > 100, following the argument above, I want to enforce the
> > > > separation
> > > > > > >> > between controller
> > > > > > >> > and data plane requests. Hence the "
> controller.listener.name"
> > > > > should
> > > > > > >> > never be the same
> > > > > > >> > as the "inter.broker.listener.name", which is intended for
> > data
> > > > > plane
> > > > > > >> > requests.
> > > > > > >> >
> > > > > > >> > 101, the default value for "listeners" will be
> > > > > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > > > > > >> > making values of "host", and "port" not being used under any
> > > > config
> > > > > > >> > settings.
> > > > > > >> > And the default value for "advertised.listeners" will be
> > derived
> > > > > from
> > > > > > >> > "listeners",
> > > > > > >> > making the values of "advertised.host", and
> "advertised.port"
> > > not
> > > > > > being
> > > > > > >> > used any more.
> > > > > > >> >
> > > > > > >> > 102, for upgrading, a single broker that has "listeners"
> > and/or
> > > > > > >> > "advertised.listeners" set,
> > > > > > >> > must add a new endpoint for the CONTROLLER listener name, or
> > end
> > > > up
> > > > > > >> using
> > > > > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER:
> //:9091".
> > > > > > >> > During rolling upgrade, in cases of <old controller> + <new
> > > > broker>
> > > > > or
> > > > > > >> > <new controller>  + <old broker>
> > > > > > >> > we still need to fall back to the current behavior. However
> > > after
> > > > > the
> > > > > > >> > rolling upgrade is done,
> > > > > > >> > it is guaranteed that the controller plane and data plane
> are
> > > > > > separated,
> > > > > > >> > given
> > > > > > >> > the "controller.listener.name" must be different from "
> > > > > > >> > inter.broker.listener.name".
> > > > > > >> >
> > > > > > >> > @Ismael Juma <is...@juma.me.uk>
> > > > > > >> > Thanks for pointing that out. I did not know that.
> > > > > > >> > However my question is if the argument above makes sense,
> and
> > my
> > > > > code
> > > > > > >> > change
> > > > > > >> > causes the configs "host", "port", "advertised.host",
> > > > > > "advertised.port"
> > > > > > >> to
> > > > > > >> > be not used under any circumstance,
> > > > > > >> > then it's no different from removing them.
> > > > > > >> > Anyway if there is still a concern about removing them, is
> > > there a
> > > > > new
> > > > > > >> > major new version
> > > > > > >> > now or in the future where I can remove them?
> > > > > > >> >
> > > > > > >> > Thanks!
> > > > > > >> > Lucas
> > > > > > >> >
> > > > > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <
> > ismael@juma.me.uk>
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> >> To be clear, we can only remove configs in major new
> > versions.
> > > > > > >> Otherwise,
> > > > > > >> >> we can only deprecate them.
> > > > > > >> >>
> > > > > > >> >> Ismael
> > > > > > >> >>
> > > > > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <jun@confluent.io
> >
> > > > wrote:
> > > > > > >> >>
> > > > > > >> >> > Hi, Lucas,
> > > > > > >> >> >
> > > > > > >> >> > For the network idlePct, your understanding is correct.
> > > > > Currently,
> > > > > > >> >> > networkIdlePct metric is calculated as the average of (1
> -
> > > > > > io-ratio)
> > > > > > >> in
> > > > > > >> >> the
> > > > > > >> >> > selector of all network threads.
> > > > > > >> >> >
> > > > > > >> >> > The metrics part looks good to me in the updated KIP.
> > > > > > >> >> >
> > > > > > >> >> > I am not still not quite sure about the configs.
> > > > > > >> >> >
> > > > > > >> >> > 100. "Whenever the "controller.listener.name" is set,
> upon
> > > > > broker
> > > > > > >> >> startup,
> > > > > > >> >> > we will validate its value and make sure it's different
> > from
> > > > the
> > > > > > >> >> > *inter-broker-listener-name *value." Does that mean that
> > > > > > >> >> > controller.listener.name has to be different from
> > > > > > >> >> > inter.broker.listener.name?
> > > > > > >> >> > That seems limiting.
> > > > > > >> >> >
> > > > > > >> >> > 101. The KIP says that advertised.listeners and listeners
> > > will
> > > > > now
> > > > > > >> have
> > > > > > >> >> a
> > > > > > >> >> > different default value including controller. Could you
> > > > document
> > > > > > what
> > > > > > >> >> the
> > > > > > >> >> > default value looks like?
> > > > > > >> >> >
> > > > > > >> >> > 102. About removing the the following configs. How does
> > that
> > > > > affect
> > > > > > >> the
> > > > > > >> >> > upgrade path? Do we now expect a user to add a new config
> > > when
> > > > > > >> upgrading
> > > > > > >> >> > from an old version to a new one?
> > > > > > >> >> > host
> > > > > > >> >> > port
> > > > > > >> >> > advertised.host
> > > > > > >> >> > advertised.port
> > > > > > >> >> >
> > > > > > >> >> > Thanks,
> > > > > > >> >> >
> > > > > > >> >> > Jun
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com>
> > > > > > >> >> wrote:
> > > > > > >> >> >
> > > > > > >> >> > > @Jun Rao <ju...@confluent.io>
> > > > > > >> >> > >
> > > > > > >> >> > > One clarification, currently on the selector level, we
> > have
> > > > the
> > > > > > >> >> > > io-wait-ratio metric.
> > > > > > >> >> > > For the new controller *network* thread, we can use it
> > > > directly
> > > > > > for
> > > > > > >> >> > > IdlePct, instead of using 1- io-ratio,
> > > > > > >> >> > > so that the logic is similar to the current average
> > IdlePct
> > > > for
> > > > > > >> >> network
> > > > > > >> >> > > threads. Is that correct?
> > > > > > >> >> > >
> > > > > > >> >> > > I've revised the KIP by adding two new metrics for
> > > measuring
> > > > > the
> > > > > > >> >> IdlePct
> > > > > > >> >> > > for the two additional threads.
> > > > > > >> >> > > Please take a look again. Thanks!
> > > > > > >> >> > >
> > > > > > >> >> > > Lucas
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <
> jun@confluent.io
> > >
> > > > > wrote:
> > > > > > >> >> > >
> > > > > > >> >> > > > Hi, Lucas,
> > > > > > >> >> > > >
> > > > > > >> >> > > > Thanks for the updated KIP.
> > > > > > >> >> > > >
> > > > > > >> >> > > > For monitoring the network thread utilization for the
> > > > control
> > > > > > >> >> plane, we
> > > > > > >> >> > > > already have the metric io-ratio at the selector
> level
> > > > > (idlePct
> > > > > > >> is
> > > > > > >> >> 1 -
> > > > > > >> >> > > > io-ratio). So, we just need to give that selector a
> > > > > meaningful
> > > > > > >> name.
> > > > > > >> >> > > >
> > > > > > >> >> > > > For monitoring the io thread utilization for the
> > control
> > > > > plane,
> > > > > > >> it's
> > > > > > >> >> > > > probably useful to have a separate metric for that.
> The
> > > > > > >> controller
> > > > > > >> >> > > request
> > > > > > >> >> > > > queue size may not reflect the history in a window.
> > > > > > >> >> > > >
> > > > > > >> >> > > > Jun
> > > > > > >> >> > > >
> > > > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> > > > > > >> lucasatucla@gmail.com>
> > > > > > >> >> > > wrote:
> > > > > > >> >> > > >
> > > > > > >> >> > > > > Thanks Jun for your quick response. It looks like I
> > > > forgot
> > > > > to
> > > > > > >> >> click
> > > > > > >> >> > the
> > > > > > >> >> > > > > "Update" button, :)
> > > > > > >> >> > > > > It's updated now.
> > > > > > >> >> > > > >
> > > > > > >> >> > > > > Regarding the idle ratio metrics for the additional
> > > > > threads,
> > > > > > I
> > > > > > >> >> > > discussed
> > > > > > >> >> > > > > with Joel,
> > > > > > >> >> > > > > and think they are not as useful, and I added our
> > > > reasoning
> > > > > > in
> > > > > > >> the
> > > > > > >> >> > last
> > > > > > >> >> > > > > paragraph of the
> > > > > > >> >> > > > > "How are controller requests handled over the
> > dedicated
> > > > > > >> >> connections?"
> > > > > > >> >> > > > > section.
> > > > > > >> >> > > > > On the other hand, we don't strongly oppose adding
> > them
> > > > if
> > > > > > you
> > > > > > >> >> think
> > > > > > >> >> > > they
> > > > > > >> >> > > > > are necessary.
> > > > > > >> >> > > > >
> > > > > > >> >> > > > > Thanks,
> > > > > > >> >> > > > > Lucas
> > > > > > >> >> > > > >
> > > > > > >> >> > > > >
> > > > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <
> > > jun@confluent.io
> > > > >
> > > > > > >> wrote:
> > > > > > >> >> > > > >
> > > > > > >> >> > > > > > Hi, Lucas,
> > > > > > >> >> > > > > >
> > > > > > >> >> > > > > > Thanks for the reply. Have you actually updated
> the
> > > > KIP?
> > > > > > The
> > > > > > >> >> wiki
> > > > > > >> >> > > says
> > > > > > >> >> > > > > that
> > > > > > >> >> > > > > > it's last updated on Aug. 22. and some of the
> > changes
> > > > > that
> > > > > > >> you
> > > > > > >> >> > > > mentioned
> > > > > > >> >> > > > > > (#1 and #3) are not there.
> > > > > > >> >> > > > > >
> > > > > > >> >> > > > > > Also, regarding Joel's comment on network/request
> > > idle
> > > > > > ratio
> > > > > > >> >> > metrics,
> > > > > > >> >> > > > > could
> > > > > > >> >> > > > > > you comment on whether they include the new
> > > controller
> > > > > > >> >> listener? If
> > > > > > >> >> > > > not,
> > > > > > >> >> > > > > do
> > > > > > >> >> > > > > > we need additional metrics to measure the
> > utilization
> > > > of
> > > > > > the
> > > > > > >> io
> > > > > > >> >> > > thread
> > > > > > >> >> > > > > for
> > > > > > >> >> > > > > > the control plane?
> > > > > > >> >> > > > > >
> > > > > > >> >> > > > > > Jun
> > > > > > >> >> > > > > >
> > > > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> > > > > > >> >> lucasatucla@gmail.com
> > > > > > >> >> > >
> > > > > > >> >> > > > > wrote:
> > > > > > >> >> > > > > >
> > > > > > >> >> > > > > > > Thanks for the comments, Jun.
> > > > > > >> >> > > > > > >
> > > > > > >> >> > > > > > > 1. I think the answer should be no, since the "
> > > > > > >> >> > > > > > inter.broker.listener.name"
> > > > > > >> >> > > > > > > are also used
> > > > > > >> >> > > > > > > for replication traffic, and merging these two
> > > types
> > > > of
> > > > > > >> >> request
> > > > > > >> >> > to
> > > > > > >> >> > > > the
> > > > > > >> >> > > > > > > single threaded tunnel
> > > > > > >> >> > > > > > > would defeat the purpose of this KIP and also
> > hurt
> > > > > > >> replication
> > > > > > >> >> > > > > > throughput.
> > > > > > >> >> > > > > > > So I think that means
> > > > > > >> >> > > > > > > we should validate to make sure when the new
> > config
> > > > is
> > > > > > set,
> > > > > > >> >> it's
> > > > > > >> >> > > > > > different
> > > > > > >> >> > > > > > > from "inter.broker.listener.name"
> > > > > > >> >> > > > > > > or "security.inter.broker.protocol",
> whichever is
> > > > set.
> > > > > > >> >> > > > > > >
> > > > > > >> >> > > > > > > 2. Normally all broker configs in a given
> cluster
> > > are
> > > > > > >> changed
> > > > > > >> >> at
> > > > > > >> >> > > the
> > > > > > >> >> > > > > same
> > > > > > >> >> > > > > > > time. If there is a typo in the
> > > > > > >> >> > > > > > > controller.listener.name and it's not
> available
> > in
> > > > the
> > > > > > >> >> endpoints
> > > > > > >> >> > > > list,
> > > > > > >> >> > > > > > we
> > > > > > >> >> > > > > > > could catch it, give an error
> > > > > > >> >> > > > > > > and block restart of the first broker in the
> > > cluster.
> > > > > > With
> > > > > > >> >> that,
> > > > > > >> >> > we
> > > > > > >> >> > > > > could
> > > > > > >> >> > > > > > > keep the current behavior
> > > > > > >> >> > > > > > > in the KIP write up that falls back to
> > > > > > >> >> > "inter.broker.listener.nam"
> > > > > > >> >> > > > when
> > > > > > >> >> > > > > > the
> > > > > > >> >> > > > > > > "controller.listener.name"
> > > > > > >> >> > > > > > > is not found during the migration phase of this
> > > KIP.
> > > > > > >> Thoughts?
> > > > > > >> >> > > > > > >
> > > > > > >> >> > > > > > > 3. That makes sense, and I've changed it.
> > > > > > >> >> > > > > > >
> > > > > > >> >> > > > > > > Thanks,
> > > > > > >> >> > > > > > > Lucas
> > > > > > >> >> > > > > > >
> > > > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> > > > > > jun@confluent.io>
> > > > > > >> >> > wrote:
> > > > > > >> >> > > > > > >
> > > > > > >> >> > > > > > > > Hi, Lucas,
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > > Sorry for the delay. The new proposal looks
> > good
> > > to
> > > > > me
> > > > > > >> >> > overall. A
> > > > > > >> >> > > > few
> > > > > > >> >> > > > > > > minor
> > > > > > >> >> > > > > > > > comments below.
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > > 1. It's possible that
> > > listener.name.for.controller
> > > > is
> > > > > > >> set,
> > > > > > >> >> but
> > > > > > >> >> > > set
> > > > > > >> >> > > > to
> > > > > > >> >> > > > > > the
> > > > > > >> >> > > > > > > > same value as inter.broker.listener.name. In
> > > that
> > > > > > case,
> > > > > > >> >> should
> > > > > > >> >> > > we
> > > > > > >> >> > > > > > have a
> > > > > > >> >> > > > > > > > single network thread and the request
> handling
> > > > thread
> > > > > > for
> > > > > > >> >> that
> > > > > > >> >> > > > > > listener?
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > > 2. Currently, the controller always picks the
> > > > > listener
> > > > > > >> >> > specified
> > > > > > >> >> > > by
> > > > > > >> >> > > > > > > > inter.broker.listener.name even if the
> > listener
> > > > name
> > > > > > is
> > > > > > >> not
> > > > > > >> >> > > > present
> > > > > > >> >> > > > > in
> > > > > > >> >> > > > > > > the
> > > > > > >> >> > > > > > > > receiving broker. This KIP proposes a
> slightly
> > > > > > different
> > > > > > >> >> > approach
> > > > > > >> >> > > > for
> > > > > > >> >> > > > > > > > picking listener.name.for.controller only
> when
> > > the
> > > > > > >> receiving
> > > > > > >> >> > end
> > > > > > >> >> > > > has
> > > > > > >> >> > > > > > the
> > > > > > >> >> > > > > > > > listener and switches
> > > listener.name.for.controller
> > > > > > >> >> otherwise.
> > > > > > >> >> > > There
> > > > > > >> >> > > > > are
> > > > > > >> >> > > > > > > > some tradeoffs between the two approaches. To
> > > > change
> > > > > > the
> > > > > > >> >> inter
> > > > > > >> >> > > > broker
> > > > > > >> >> > > > > > > > listener, the former requires 2 steps: (1)
> > adding
> > > > the
> > > > > > new
> > > > > > >> >> > > listener
> > > > > > >> >> > > > to
> > > > > > >> >> > > > > > > > listener list in every broker and (2)
> changing
> > > > > > >> >> > > > > > > > listener.name.for.controller.
> > > > > > >> >> > > > > > > > The latter can do both changes in 1 step. On
> > the
> > > > > hand,
> > > > > > if
> > > > > > >> >> > > > > > > > listener.name.for.controller
> > > > > > >> >> > > > > > > > is mis-configured, the former will report an
> > > error
> > > > > and
> > > > > > >> the
> > > > > > >> >> > latter
> > > > > > >> >> > > > > will
> > > > > > >> >> > > > > > > hide
> > > > > > >> >> > > > > > > > it (so the user may not know the
> > > misconfiguration).
> > > > > It
> > > > > > >> seems
> > > > > > >> >> > that
> > > > > > >> >> > > > we
> > > > > > >> >> > > > > > > should
> > > > > > >> >> > > > > > > > pick one approach to handle both
> > > > > > >> >> listener.name.for.controller
> > > > > > >> >> > and
> > > > > > >> >> > > > > > > > inter.broker.listener.name consistently. To
> > me,
> > > > the
> > > > > > >> former
> > > > > > >> >> > seems
> > > > > > >> >> > > > > > > slightly
> > > > > > >> >> > > > > > > > better.
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > > 3. To be consistent with the existing naming,
> > > > should
> > > > > > >> >> > > > > > > > listener.name.for.controller
> > > > > > >> >> > > > > > > > be controller.listener.name?
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > > Thanks,
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > > Jun
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > > > > > >> >> > > lucasatucla@gmail.com>
> > > > > > >> >> > > > > > > wrote:
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > > > > Hi Jun and Joel,
> > > > > > >> >> > > > > > > > >
> > > > > > >> >> > > > > > > > > The KIP writeup has changed by quite a bit
> > > since
> > > > > your
> > > > > > >> +1.
> > > > > > >> >> > > > > > > > > Can you please take another review? Thanks
> a
> > > lot
> > > > > for
> > > > > > >> your
> > > > > > >> >> > time!
> > > > > > >> >> > > > > > > > >
> > > > > > >> >> > > > > > > > > Lucas
> > > > > > >> >> > > > > > > > >
> > > > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel
> Koshy
> > <
> > > > > > >> >> > > > jjkoshy.w@gmail.com>
> > > > > > >> >> > > > > > > > wrote:
> > > > > > >> >> > > > > > > > >
> > > > > > >> >> > > > > > > > > > +1 on the KIP.
> > > > > > >> >> > > > > > > > > >
> > > > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to
> > > > introduce
> > > > > > the
> > > > > > >> >> > > condition
> > > > > > >> >> > > > > > > > variables
> > > > > > >> >> > > > > > > > > > for the concern that Jun raised, but it's
> > an
> > > > > > >> >> implementation
> > > > > > >> >> > > > > detail
> > > > > > >> >> > > > > > > that
> > > > > > >> >> > > > > > > > > we
> > > > > > >> >> > > > > > > > > > can defer to a discussion in the PR.)
> > > > > > >> >> > > > > > > > > >
> > > > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas
> > Wang
> > > <
> > > > > > >> >> > > > > > lucasatucla@gmail.com>
> > > > > > >> >> > > > > > > > > > wrote:
> > > > > > >> >> > > > > > > > > >
> > > > > > >> >> > > > > > > > > > > Hi Jun,
> > > > > > >> >> > > > > > > > > > >
> > > > > > >> >> > > > > > > > > > > I agree by using the conditional
> > variables,
> > > > > there
> > > > > > >> is
> > > > > > >> >> no
> > > > > > >> >> > > need
> > > > > > >> >> > > > to
> > > > > > >> >> > > > > > add
> > > > > > >> >> > > > > > > > > such
> > > > > > >> >> > > > > > > > > > a
> > > > > > >> >> > > > > > > > > > > new config.
> > > > > > >> >> > > > > > > > > > > Also thanks for approving this KIP.
> > > > > > >> >> > > > > > > > > > >
> > > > > > >> >> > > > > > > > > > > Lucas
> > > > > > >> >> > > > > > > > > > >
> > > > > > >> >> > > > > > > > > >
> > > > > > >> >> > > > > > > > >
> > > > > > >> >> > > > > > > >
> > > > > > >> >> > > > > > >
> > > > > > >> >> > > > > >
> > > > > > >> >> > > > >
> > > > > > >> >> > > >
> > > > > > >> >> > >
> > > > > > >> >> >
> > > > > > >> >>
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks for the suggestion, Ismael. I like it.

Jun,
I'm excited to get the +1, thanks a lot!
Meanwhile what do you feel about renaming the metrics and config to

ControlPlaneRequestQueueSize

ControlPlaneNetworkProcessorIdlePercent

ControlPlaneRequestHandlerIdlePercent

control.plane.listener.name

?


Thanks,

Lucas

On Thu, Oct 4, 2018 at 11:38 AM Ismael Juma <is...@gmail.com> wrote:

> Have we considered control plane if we think control by itself is
> ambiguous? I agree with the original concern that "controller" may be
> confusing for something that affects all brokers.
>
> Ismael
>
>
> On 4 Oct 2018 11:08 am, "Lucas Wang" <lu...@gmail.com> wrote:
>
> Thanks Jun. I've changed the KIP with the suggested 2 step upgrade.
> Please take a look again when you have time.
>
> Regards,
> Lucas
>
>
> On Thu, Oct 4, 2018 at 10:06 AM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Lucas,
> >
> > 200. That's a valid concern. So, we can probably just keep the current
> > name.
> >
> > 201. I am thinking that you would upgrade in the same way as changing
> > inter.broker.listener.name. This requires 2 rounds of rolling restart.
> In
> > the first round, we add the controller endpoint to the listeners w/o
> > setting controller.listener.name. In the second round, every broker sets
> > controller.listener.name. At that point, the controller listener is
> ready
> > in every broker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Thanks for the further comments, Jun.
> > >
> > > 200. Currently in the code base, we have the term of "ControlBatch"
> > related
> > > to
> > > idempotent/transactional producing. Do you think it's a concern for
> > reusing
> > > the term "control"?
> > >
> > > 201. It's not clear to me how it would work by following the same
> > strategy
> > > for "controller.listener.name".
> > > Say the new controller has its "controller.listener.name" set to the
> > value
> > > "CONTROLLER", and broker 1
> > > has picked up this KIP by announcing
> > > "endpoints": [
> > >         "CONTROLLER://broker1.example.com:9091",
> > >         "INTERNAL://broker1.example.com:9092",
> > >         "EXTERNAL://host1.example.com:9093"
> > >     ],
> > >
> > > while broker2 has not picked up the change, and is announcing
> > > "endpoints": [
> > >         "INTERNAL://broker2.example.com:9092",
> > >         "EXTERNAL://host2.example.com:9093"
> > >     ],
> > > to support both broker 1 for the new behavior and broker 2 for the old
> > > behavior, it seems the controller must
> > > check their published endpoints. Am I missing something?
> > >
> > > Thanks!
> > > Lucas
> > >
> > > On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Lucas,
> > > >
> > > > Sorry for the delay. The updated wiki looks good to me overall. Just
> a
> > > > couple more minor comments.
> > > >
> > > > 200.
> kafka.network:name=ControllerRequestQueueSize,type=RequestChannel:
> > > The
> > > > name ControllerRequestQueueSize gives the impression that it's only
> for
> > > the
> > > > controller broker. Perhaps we can just rename all metrics and configs
> > > from
> > > > controller to control. This indicates that the threads and the queues
> > are
> > > > for the control requests (as oppose to data requests).
> > > >
> > > > 201. "<new controller, old broker>: In this scenario, the controller
> > will
> > > > have the "controller.listener.name" config set to a value like
> > > > "CONTROLLER", however the broker's exposed endpoints do not have an
> > entry
> > > > corresponding to the new listener name. Hence the controller should
> > > > preserve the existing behavior by determining the endpoint using
> > > > *inter-broker-listener-name *value. The end result should be the same
> > > > behavior as today." Currently, the controller makes connections based
> > on
> > > > its local inter.broker.listener.name config without checking the
> > target
> > > > broker's ZK registration. For consistency, perhaps we can just follow
> > the
> > > > same strategy for controller.listener.name. This existing behavior
> > seems
> > > > simpler to understand and has the benefit of catching inconsistent
> > > configs
> > > > across brokers.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Sorry to bother you again. Can you please take a look at the wiki
> > again
> > > > > when you have time?
> > > > >
> > > > > Thanks a lot!
> > > > > Lucas
> > > > >
> > > > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks a lot for the detailed explanation.
> > > > > > I've restored the wiki to a previous version that does not
> require
> > > > config
> > > > > > changes,
> > > > > > and keeps the current behavior with the proposed changes turned
> off
> > > by
> > > > > > default.
> > > > > > I'd appreciate it if you can review it again.
> > > > > >
> > > > > > Thanks!
> > > > > > Lucas
> > > > > >
> > > > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io>
> wrote:
> > > > > >
> > > > > >> Hi, Lucas,
> > > > > >>
> > > > > >> When upgrading to a minor release, I think the expectation is
> > that a
> > > > > user
> > > > > >> wouldn't need to make any config changes, other than the usual
> > > > > >> inter.broker.protocol. If we require other config changes during
> > an
> > > > > >> upgrade, then it's probably better to do that in a major
> release.
> > > > > >>
> > > > > >> Regarding your proposal, I think removing host/advertised_host
> in
> > > > favor
> > > > > of
> > > > > >> listeners:advertised_listeners seems useful regardless of this
> > KIP.
> > > > > >> However, that can probably wait until a major release.
> > > > > >>
> > > > > >> As for the controller listener, I am not sure if one has to set
> > it.
> > > To
> > > > > >> make
> > > > > >> a cluster healthy, one sort of have to make sure that the
> request
> > > > queue
> > > > > is
> > > > > >> never full and no request will be sitting in the request queue
> for
> > > > long.
> > > > > >> If
> > > > > >> one does that, setting the controller listener may not be
> > necessary.
> > > > On
> > > > > >> the
> > > > > >> flip side, even if one sets the controller listener, but the
> > request
> > > > > queue
> > > > > >> and the request time for the data part are still high, the
> cluster
> > > may
> > > > > >> still not be healthy. Given that we have already started the 2.1
> > > > release
> > > > > >> planning, perhaps we can start with not requiring the controller
> > > > > listener.
> > > > > >> If this is indeed something that everyone wants to set, we can
> > make
> > > > it a
> > > > > >> required config in a major release.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <
> > lucasatucla@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > @Jun Rao <ju...@confluent.io>
> > > > > >> >
> > > > > >> > I made the recent config changes after thinking about the
> > default
> > > > > >> behavior
> > > > > >> > for adopting this KIP.
> > > > > >> > I think there are basically two options:
> > > > > >> > 1. By default, the behavior proposed in this KIP is turned
> off,
> > > and
> > > > > >> > operators can turn it
> > > > > >> > on by adding the "controller.listener.name" config and
> entries
> > in
> > > > the
> > > > > >> > "listeners" and "advertised.listeners" list.
> > > > > >> > If no "controller.listener.name" is added, it'll be the *same
> > as*
> > > > > the "
> > > > > >> > inter.broker.listener.name",
> > > > > >> > and the proposed feature is effectively turned off.
> > > > > >> > This has been the assumption in the KIP writeup before.
> > > > > >> >
> > > > > >> > 2. By default, the behavior proposed in this KIP is turned on,
> > and
> > > > > >> > operators are forced to
> > > > > >> > recognize the proposed change if their "listeners" config is
> set
> > > > (this
> > > > > >> is
> > > > > >> > most likely in production environments),
> > > > > >> > by allocating a new port for controller connections, and
> adding
> > a
> > > > new
> > > > > >> > endpoint to the "listeners" config.
> > > > > >> > For cases where "listeners" is not set explicitly,
> > > > > >> > there needs to be a default value for it that includes the
> > > > controller
> > > > > >> > listener name,
> > > > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> > > > > >> >
> > > > > >> > I chose to go with option 2 since as author of this KIP,
> > > > > >> > I naturally think in the long run, it's worth the effort to
> > adopt
> > > > this
> > > > > >> > feature,
> > > > > >> > in order to prevent issues under circumstances listed in the
> > > > > motivation
> > > > > >> > section.
> > > > > >> >
> > > > > >> > 100, following the argument above, I want to enforce the
> > > separation
> > > > > >> > between controller
> > > > > >> > and data plane requests. Hence the "controller.listener.name"
> > > > should
> > > > > >> > never be the same
> > > > > >> > as the "inter.broker.listener.name", which is intended for
> data
> > > > plane
> > > > > >> > requests.
> > > > > >> >
> > > > > >> > 101, the default value for "listeners" will be
> > > > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > > > > >> > making values of "host", and "port" not being used under any
> > > config
> > > > > >> > settings.
> > > > > >> > And the default value for "advertised.listeners" will be
> derived
> > > > from
> > > > > >> > "listeners",
> > > > > >> > making the values of "advertised.host", and "advertised.port"
> > not
> > > > > being
> > > > > >> > used any more.
> > > > > >> >
> > > > > >> > 102, for upgrading, a single broker that has "listeners"
> and/or
> > > > > >> > "advertised.listeners" set,
> > > > > >> > must add a new endpoint for the CONTROLLER listener name, or
> end
> > > up
> > > > > >> using
> > > > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> > > > > >> > During rolling upgrade, in cases of <old controller> + <new
> > > broker>
> > > > or
> > > > > >> > <new controller>  + <old broker>
> > > > > >> > we still need to fall back to the current behavior. However
> > after
> > > > the
> > > > > >> > rolling upgrade is done,
> > > > > >> > it is guaranteed that the controller plane and data plane are
> > > > > separated,
> > > > > >> > given
> > > > > >> > the "controller.listener.name" must be different from "
> > > > > >> > inter.broker.listener.name".
> > > > > >> >
> > > > > >> > @Ismael Juma <is...@juma.me.uk>
> > > > > >> > Thanks for pointing that out. I did not know that.
> > > > > >> > However my question is if the argument above makes sense, and
> my
> > > > code
> > > > > >> > change
> > > > > >> > causes the configs "host", "port", "advertised.host",
> > > > > "advertised.port"
> > > > > >> to
> > > > > >> > be not used under any circumstance,
> > > > > >> > then it's no different from removing them.
> > > > > >> > Anyway if there is still a concern about removing them, is
> > there a
> > > > new
> > > > > >> > major new version
> > > > > >> > now or in the future where I can remove them?
> > > > > >> >
> > > > > >> > Thanks!
> > > > > >> > Lucas
> > > > > >> >
> > > > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <
> ismael@juma.me.uk>
> > > > > wrote:
> > > > > >> >
> > > > > >> >> To be clear, we can only remove configs in major new
> versions.
> > > > > >> Otherwise,
> > > > > >> >> we can only deprecate them.
> > > > > >> >>
> > > > > >> >> Ismael
> > > > > >> >>
> > > > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > >> >>
> > > > > >> >> > Hi, Lucas,
> > > > > >> >> >
> > > > > >> >> > For the network idlePct, your understanding is correct.
> > > > Currently,
> > > > > >> >> > networkIdlePct metric is calculated as the average of (1 -
> > > > > io-ratio)
> > > > > >> in
> > > > > >> >> the
> > > > > >> >> > selector of all network threads.
> > > > > >> >> >
> > > > > >> >> > The metrics part looks good to me in the updated KIP.
> > > > > >> >> >
> > > > > >> >> > I am not still not quite sure about the configs.
> > > > > >> >> >
> > > > > >> >> > 100. "Whenever the "controller.listener.name" is set, upon
> > > > broker
> > > > > >> >> startup,
> > > > > >> >> > we will validate its value and make sure it's different
> from
> > > the
> > > > > >> >> > *inter-broker-listener-name *value." Does that mean that
> > > > > >> >> > controller.listener.name has to be different from
> > > > > >> >> > inter.broker.listener.name?
> > > > > >> >> > That seems limiting.
> > > > > >> >> >
> > > > > >> >> > 101. The KIP says that advertised.listeners and listeners
> > will
> > > > now
> > > > > >> have
> > > > > >> >> a
> > > > > >> >> > different default value including controller. Could you
> > > document
> > > > > what
> > > > > >> >> the
> > > > > >> >> > default value looks like?
> > > > > >> >> >
> > > > > >> >> > 102. About removing the the following configs. How does
> that
> > > > affect
> > > > > >> the
> > > > > >> >> > upgrade path? Do we now expect a user to add a new config
> > when
> > > > > >> upgrading
> > > > > >> >> > from an old version to a new one?
> > > > > >> >> > host
> > > > > >> >> > port
> > > > > >> >> > advertised.host
> > > > > >> >> > advertised.port
> > > > > >> >> >
> > > > > >> >> > Thanks,
> > > > > >> >> >
> > > > > >> >> > Jun
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
> > > > lucasatucla@gmail.com>
> > > > > >> >> wrote:
> > > > > >> >> >
> > > > > >> >> > > @Jun Rao <ju...@confluent.io>
> > > > > >> >> > >
> > > > > >> >> > > One clarification, currently on the selector level, we
> have
> > > the
> > > > > >> >> > > io-wait-ratio metric.
> > > > > >> >> > > For the new controller *network* thread, we can use it
> > > directly
> > > > > for
> > > > > >> >> > > IdlePct, instead of using 1- io-ratio,
> > > > > >> >> > > so that the logic is similar to the current average
> IdlePct
> > > for
> > > > > >> >> network
> > > > > >> >> > > threads. Is that correct?
> > > > > >> >> > >
> > > > > >> >> > > I've revised the KIP by adding two new metrics for
> > measuring
> > > > the
> > > > > >> >> IdlePct
> > > > > >> >> > > for the two additional threads.
> > > > > >> >> > > Please take a look again. Thanks!
> > > > > >> >> > >
> > > > > >> >> > > Lucas
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <jun@confluent.io
> >
> > > > wrote:
> > > > > >> >> > >
> > > > > >> >> > > > Hi, Lucas,
> > > > > >> >> > > >
> > > > > >> >> > > > Thanks for the updated KIP.
> > > > > >> >> > > >
> > > > > >> >> > > > For monitoring the network thread utilization for the
> > > control
> > > > > >> >> plane, we
> > > > > >> >> > > > already have the metric io-ratio at the selector level
> > > > (idlePct
> > > > > >> is
> > > > > >> >> 1 -
> > > > > >> >> > > > io-ratio). So, we just need to give that selector a
> > > > meaningful
> > > > > >> name.
> > > > > >> >> > > >
> > > > > >> >> > > > For monitoring the io thread utilization for the
> control
> > > > plane,
> > > > > >> it's
> > > > > >> >> > > > probably useful to have a separate metric for that. The
> > > > > >> controller
> > > > > >> >> > > request
> > > > > >> >> > > > queue size may not reflect the history in a window.
> > > > > >> >> > > >
> > > > > >> >> > > > Jun
> > > > > >> >> > > >
> > > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> > > > > >> lucasatucla@gmail.com>
> > > > > >> >> > > wrote:
> > > > > >> >> > > >
> > > > > >> >> > > > > Thanks Jun for your quick response. It looks like I
> > > forgot
> > > > to
> > > > > >> >> click
> > > > > >> >> > the
> > > > > >> >> > > > > "Update" button, :)
> > > > > >> >> > > > > It's updated now.
> > > > > >> >> > > > >
> > > > > >> >> > > > > Regarding the idle ratio metrics for the additional
> > > > threads,
> > > > > I
> > > > > >> >> > > discussed
> > > > > >> >> > > > > with Joel,
> > > > > >> >> > > > > and think they are not as useful, and I added our
> > > reasoning
> > > > > in
> > > > > >> the
> > > > > >> >> > last
> > > > > >> >> > > > > paragraph of the
> > > > > >> >> > > > > "How are controller requests handled over the
> dedicated
> > > > > >> >> connections?"
> > > > > >> >> > > > > section.
> > > > > >> >> > > > > On the other hand, we don't strongly oppose adding
> them
> > > if
> > > > > you
> > > > > >> >> think
> > > > > >> >> > > they
> > > > > >> >> > > > > are necessary.
> > > > > >> >> > > > >
> > > > > >> >> > > > > Thanks,
> > > > > >> >> > > > > Lucas
> > > > > >> >> > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <
> > jun@confluent.io
> > > >
> > > > > >> wrote:
> > > > > >> >> > > > >
> > > > > >> >> > > > > > Hi, Lucas,
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Thanks for the reply. Have you actually updated the
> > > KIP?
> > > > > The
> > > > > >> >> wiki
> > > > > >> >> > > says
> > > > > >> >> > > > > that
> > > > > >> >> > > > > > it's last updated on Aug. 22. and some of the
> changes
> > > > that
> > > > > >> you
> > > > > >> >> > > > mentioned
> > > > > >> >> > > > > > (#1 and #3) are not there.
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Also, regarding Joel's comment on network/request
> > idle
> > > > > ratio
> > > > > >> >> > metrics,
> > > > > >> >> > > > > could
> > > > > >> >> > > > > > you comment on whether they include the new
> > controller
> > > > > >> >> listener? If
> > > > > >> >> > > > not,
> > > > > >> >> > > > > do
> > > > > >> >> > > > > > we need additional metrics to measure the
> utilization
> > > of
> > > > > the
> > > > > >> io
> > > > > >> >> > > thread
> > > > > >> >> > > > > for
> > > > > >> >> > > > > > the control plane?
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Jun
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> > > > > >> >> lucasatucla@gmail.com
> > > > > >> >> > >
> > > > > >> >> > > > > wrote:
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > > Thanks for the comments, Jun.
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > 1. I think the answer should be no, since the "
> > > > > >> >> > > > > > inter.broker.listener.name"
> > > > > >> >> > > > > > > are also used
> > > > > >> >> > > > > > > for replication traffic, and merging these two
> > types
> > > of
> > > > > >> >> request
> > > > > >> >> > to
> > > > > >> >> > > > the
> > > > > >> >> > > > > > > single threaded tunnel
> > > > > >> >> > > > > > > would defeat the purpose of this KIP and also
> hurt
> > > > > >> replication
> > > > > >> >> > > > > > throughput.
> > > > > >> >> > > > > > > So I think that means
> > > > > >> >> > > > > > > we should validate to make sure when the new
> config
> > > is
> > > > > set,
> > > > > >> >> it's
> > > > > >> >> > > > > > different
> > > > > >> >> > > > > > > from "inter.broker.listener.name"
> > > > > >> >> > > > > > > or "security.inter.broker.protocol", whichever is
> > > set.
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > 2. Normally all broker configs in a given cluster
> > are
> > > > > >> changed
> > > > > >> >> at
> > > > > >> >> > > the
> > > > > >> >> > > > > same
> > > > > >> >> > > > > > > time. If there is a typo in the
> > > > > >> >> > > > > > > controller.listener.name and it's not available
> in
> > > the
> > > > > >> >> endpoints
> > > > > >> >> > > > list,
> > > > > >> >> > > > > > we
> > > > > >> >> > > > > > > could catch it, give an error
> > > > > >> >> > > > > > > and block restart of the first broker in the
> > cluster.
> > > > > With
> > > > > >> >> that,
> > > > > >> >> > we
> > > > > >> >> > > > > could
> > > > > >> >> > > > > > > keep the current behavior
> > > > > >> >> > > > > > > in the KIP write up that falls back to
> > > > > >> >> > "inter.broker.listener.nam"
> > > > > >> >> > > > when
> > > > > >> >> > > > > > the
> > > > > >> >> > > > > > > "controller.listener.name"
> > > > > >> >> > > > > > > is not found during the migration phase of this
> > KIP.
> > > > > >> Thoughts?
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > 3. That makes sense, and I've changed it.
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > Thanks,
> > > > > >> >> > > > > > > Lucas
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> > > > > jun@confluent.io>
> > > > > >> >> > wrote:
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > > Hi, Lucas,
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Sorry for the delay. The new proposal looks
> good
> > to
> > > > me
> > > > > >> >> > overall. A
> > > > > >> >> > > > few
> > > > > >> >> > > > > > > minor
> > > > > >> >> > > > > > > > comments below.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > 1. It's possible that
> > listener.name.for.controller
> > > is
> > > > > >> set,
> > > > > >> >> but
> > > > > >> >> > > set
> > > > > >> >> > > > to
> > > > > >> >> > > > > > the
> > > > > >> >> > > > > > > > same value as inter.broker.listener.name. In
> > that
> > > > > case,
> > > > > >> >> should
> > > > > >> >> > > we
> > > > > >> >> > > > > > have a
> > > > > >> >> > > > > > > > single network thread and the request handling
> > > thread
> > > > > for
> > > > > >> >> that
> > > > > >> >> > > > > > listener?
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > 2. Currently, the controller always picks the
> > > > listener
> > > > > >> >> > specified
> > > > > >> >> > > by
> > > > > >> >> > > > > > > > inter.broker.listener.name even if the
> listener
> > > name
> > > > > is
> > > > > >> not
> > > > > >> >> > > > present
> > > > > >> >> > > > > in
> > > > > >> >> > > > > > > the
> > > > > >> >> > > > > > > > receiving broker. This KIP proposes a slightly
> > > > > different
> > > > > >> >> > approach
> > > > > >> >> > > > for
> > > > > >> >> > > > > > > > picking listener.name.for.controller only when
> > the
> > > > > >> receiving
> > > > > >> >> > end
> > > > > >> >> > > > has
> > > > > >> >> > > > > > the
> > > > > >> >> > > > > > > > listener and switches
> > listener.name.for.controller
> > > > > >> >> otherwise.
> > > > > >> >> > > There
> > > > > >> >> > > > > are
> > > > > >> >> > > > > > > > some tradeoffs between the two approaches. To
> > > change
> > > > > the
> > > > > >> >> inter
> > > > > >> >> > > > broker
> > > > > >> >> > > > > > > > listener, the former requires 2 steps: (1)
> adding
> > > the
> > > > > new
> > > > > >> >> > > listener
> > > > > >> >> > > > to
> > > > > >> >> > > > > > > > listener list in every broker and (2) changing
> > > > > >> >> > > > > > > > listener.name.for.controller.
> > > > > >> >> > > > > > > > The latter can do both changes in 1 step. On
> the
> > > > hand,
> > > > > if
> > > > > >> >> > > > > > > > listener.name.for.controller
> > > > > >> >> > > > > > > > is mis-configured, the former will report an
> > error
> > > > and
> > > > > >> the
> > > > > >> >> > latter
> > > > > >> >> > > > > will
> > > > > >> >> > > > > > > hide
> > > > > >> >> > > > > > > > it (so the user may not know the
> > misconfiguration).
> > > > It
> > > > > >> seems
> > > > > >> >> > that
> > > > > >> >> > > > we
> > > > > >> >> > > > > > > should
> > > > > >> >> > > > > > > > pick one approach to handle both
> > > > > >> >> listener.name.for.controller
> > > > > >> >> > and
> > > > > >> >> > > > > > > > inter.broker.listener.name consistently. To
> me,
> > > the
> > > > > >> former
> > > > > >> >> > seems
> > > > > >> >> > > > > > > slightly
> > > > > >> >> > > > > > > > better.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > 3. To be consistent with the existing naming,
> > > should
> > > > > >> >> > > > > > > > listener.name.for.controller
> > > > > >> >> > > > > > > > be controller.listener.name?
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Thanks,
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Jun
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > > > > >> >> > > lucasatucla@gmail.com>
> > > > > >> >> > > > > > > wrote:
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > > Hi Jun and Joel,
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > > > The KIP writeup has changed by quite a bit
> > since
> > > > your
> > > > > >> +1.
> > > > > >> >> > > > > > > > > Can you please take another review? Thanks a
> > lot
> > > > for
> > > > > >> your
> > > > > >> >> > time!
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > > > Lucas
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy
> <
> > > > > >> >> > > > jjkoshy.w@gmail.com>
> > > > > >> >> > > > > > > > wrote:
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > > > > +1 on the KIP.
> > > > > >> >> > > > > > > > > >
> > > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to
> > > introduce
> > > > > the
> > > > > >> >> > > condition
> > > > > >> >> > > > > > > > variables
> > > > > >> >> > > > > > > > > > for the concern that Jun raised, but it's
> an
> > > > > >> >> implementation
> > > > > >> >> > > > > detail
> > > > > >> >> > > > > > > that
> > > > > >> >> > > > > > > > > we
> > > > > >> >> > > > > > > > > > can defer to a discussion in the PR.)
> > > > > >> >> > > > > > > > > >
> > > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas
> Wang
> > <
> > > > > >> >> > > > > > lucasatucla@gmail.com>
> > > > > >> >> > > > > > > > > > wrote:
> > > > > >> >> > > > > > > > > >
> > > > > >> >> > > > > > > > > > > Hi Jun,
> > > > > >> >> > > > > > > > > > >
> > > > > >> >> > > > > > > > > > > I agree by using the conditional
> variables,
> > > > there
> > > > > >> is
> > > > > >> >> no
> > > > > >> >> > > need
> > > > > >> >> > > > to
> > > > > >> >> > > > > > add
> > > > > >> >> > > > > > > > > such
> > > > > >> >> > > > > > > > > > a
> > > > > >> >> > > > > > > > > > > new config.
> > > > > >> >> > > > > > > > > > > Also thanks for approving this KIP.
> > > > > >> >> > > > > > > > > > >
> > > > > >> >> > > > > > > > > > > Lucas
> > > > > >> >> > > > > > > > > > >
> > > > > >> >> > > > > > > > > >
> > > > > >> >> > > > > > > > >
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Ismael Juma <is...@gmail.com>.
Have we considered control plane if we think control by itself is
ambiguous? I agree with the original concern that "controller" may be
confusing for something that affects all brokers.

Ismael


On 4 Oct 2018 11:08 am, "Lucas Wang" <lu...@gmail.com> wrote:

Thanks Jun. I've changed the KIP with the suggested 2 step upgrade.
Please take a look again when you have time.

Regards,
Lucas


On Thu, Oct 4, 2018 at 10:06 AM Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas,
>
> 200. That's a valid concern. So, we can probably just keep the current
> name.
>
> 201. I am thinking that you would upgrade in the same way as changing
> inter.broker.listener.name. This requires 2 rounds of rolling restart. In
> the first round, we add the controller endpoint to the listeners w/o
> setting controller.listener.name. In the second round, every broker sets
> controller.listener.name. At that point, the controller listener is ready
> in every broker.
>
> Thanks,
>
> Jun
>
> On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for the further comments, Jun.
> >
> > 200. Currently in the code base, we have the term of "ControlBatch"
> related
> > to
> > idempotent/transactional producing. Do you think it's a concern for
> reusing
> > the term "control"?
> >
> > 201. It's not clear to me how it would work by following the same
> strategy
> > for "controller.listener.name".
> > Say the new controller has its "controller.listener.name" set to the
> value
> > "CONTROLLER", and broker 1
> > has picked up this KIP by announcing
> > "endpoints": [
> >         "CONTROLLER://broker1.example.com:9091",
> >         "INTERNAL://broker1.example.com:9092",
> >         "EXTERNAL://host1.example.com:9093"
> >     ],
> >
> > while broker2 has not picked up the change, and is announcing
> > "endpoints": [
> >         "INTERNAL://broker2.example.com:9092",
> >         "EXTERNAL://host2.example.com:9093"
> >     ],
> > to support both broker 1 for the new behavior and broker 2 for the old
> > behavior, it seems the controller must
> > check their published endpoints. Am I missing something?
> >
> > Thanks!
> > Lucas
> >
> > On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Lucas,
> > >
> > > Sorry for the delay. The updated wiki looks good to me overall. Just a
> > > couple more minor comments.
> > >
> > > 200.
kafka.network:name=ControllerRequestQueueSize,type=RequestChannel:
> > The
> > > name ControllerRequestQueueSize gives the impression that it's only
for
> > the
> > > controller broker. Perhaps we can just rename all metrics and configs
> > from
> > > controller to control. This indicates that the threads and the queues
> are
> > > for the control requests (as oppose to data requests).
> > >
> > > 201. "<new controller, old broker>: In this scenario, the controller
> will
> > > have the "controller.listener.name" config set to a value like
> > > "CONTROLLER", however the broker's exposed endpoints do not have an
> entry
> > > corresponding to the new listener name. Hence the controller should
> > > preserve the existing behavior by determining the endpoint using
> > > *inter-broker-listener-name *value. The end result should be the same
> > > behavior as today." Currently, the controller makes connections based
> on
> > > its local inter.broker.listener.name config without checking the
> target
> > > broker's ZK registration. For consistency, perhaps we can just follow
> the
> > > same strategy for controller.listener.name. This existing behavior
> seems
> > > simpler to understand and has the benefit of catching inconsistent
> > configs
> > > across brokers.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Sorry to bother you again. Can you please take a look at the wiki
> again
> > > > when you have time?
> > > >
> > > > Thanks a lot!
> > > > Lucas
> > > >
> > > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks a lot for the detailed explanation.
> > > > > I've restored the wiki to a previous version that does not require
> > > config
> > > > > changes,
> > > > > and keeps the current behavior with the proposed changes turned
off
> > by
> > > > > default.
> > > > > I'd appreciate it if you can review it again.
> > > > >
> > > > > Thanks!
> > > > > Lucas
> > > > >
> > > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > >> Hi, Lucas,
> > > > >>
> > > > >> When upgrading to a minor release, I think the expectation is
> that a
> > > > user
> > > > >> wouldn't need to make any config changes, other than the usual
> > > > >> inter.broker.protocol. If we require other config changes during
> an
> > > > >> upgrade, then it's probably better to do that in a major release.
> > > > >>
> > > > >> Regarding your proposal, I think removing host/advertised_host in
> > > favor
> > > > of
> > > > >> listeners:advertised_listeners seems useful regardless of this
> KIP.
> > > > >> However, that can probably wait until a major release.
> > > > >>
> > > > >> As for the controller listener, I am not sure if one has to set
> it.
> > To
> > > > >> make
> > > > >> a cluster healthy, one sort of have to make sure that the request
> > > queue
> > > > is
> > > > >> never full and no request will be sitting in the request queue
for
> > > long.
> > > > >> If
> > > > >> one does that, setting the controller listener may not be
> necessary.
> > > On
> > > > >> the
> > > > >> flip side, even if one sets the controller listener, but the
> request
> > > > queue
> > > > >> and the request time for the data part are still high, the
cluster
> > may
> > > > >> still not be healthy. Given that we have already started the 2.1
> > > release
> > > > >> planning, perhaps we can start with not requiring the controller
> > > > listener.
> > > > >> If this is indeed something that everyone wants to set, we can
> make
> > > it a
> > > > >> required config in a major release.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > @Jun Rao <ju...@confluent.io>
> > > > >> >
> > > > >> > I made the recent config changes after thinking about the
> default
> > > > >> behavior
> > > > >> > for adopting this KIP.
> > > > >> > I think there are basically two options:
> > > > >> > 1. By default, the behavior proposed in this KIP is turned off,
> > and
> > > > >> > operators can turn it
> > > > >> > on by adding the "controller.listener.name" config and entries
> in
> > > the
> > > > >> > "listeners" and "advertised.listeners" list.
> > > > >> > If no "controller.listener.name" is added, it'll be the *same
> as*
> > > > the "
> > > > >> > inter.broker.listener.name",
> > > > >> > and the proposed feature is effectively turned off.
> > > > >> > This has been the assumption in the KIP writeup before.
> > > > >> >
> > > > >> > 2. By default, the behavior proposed in this KIP is turned on,
> and
> > > > >> > operators are forced to
> > > > >> > recognize the proposed change if their "listeners" config is
set
> > > (this
> > > > >> is
> > > > >> > most likely in production environments),
> > > > >> > by allocating a new port for controller connections, and adding
> a
> > > new
> > > > >> > endpoint to the "listeners" config.
> > > > >> > For cases where "listeners" is not set explicitly,
> > > > >> > there needs to be a default value for it that includes the
> > > controller
> > > > >> > listener name,
> > > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> > > > >> >
> > > > >> > I chose to go with option 2 since as author of this KIP,
> > > > >> > I naturally think in the long run, it's worth the effort to
> adopt
> > > this
> > > > >> > feature,
> > > > >> > in order to prevent issues under circumstances listed in the
> > > > motivation
> > > > >> > section.
> > > > >> >
> > > > >> > 100, following the argument above, I want to enforce the
> > separation
> > > > >> > between controller
> > > > >> > and data plane requests. Hence the "controller.listener.name"
> > > should
> > > > >> > never be the same
> > > > >> > as the "inter.broker.listener.name", which is intended for data
> > > plane
> > > > >> > requests.
> > > > >> >
> > > > >> > 101, the default value for "listeners" will be
> > > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > > > >> > making values of "host", and "port" not being used under any
> > config
> > > > >> > settings.
> > > > >> > And the default value for "advertised.listeners" will be
derived
> > > from
> > > > >> > "listeners",
> > > > >> > making the values of "advertised.host", and "advertised.port"
> not
> > > > being
> > > > >> > used any more.
> > > > >> >
> > > > >> > 102, for upgrading, a single broker that has "listeners" and/or
> > > > >> > "advertised.listeners" set,
> > > > >> > must add a new endpoint for the CONTROLLER listener name, or
end
> > up
> > > > >> using
> > > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> > > > >> > During rolling upgrade, in cases of <old controller> + <new
> > broker>
> > > or
> > > > >> > <new controller>  + <old broker>
> > > > >> > we still need to fall back to the current behavior. However
> after
> > > the
> > > > >> > rolling upgrade is done,
> > > > >> > it is guaranteed that the controller plane and data plane are
> > > > separated,
> > > > >> > given
> > > > >> > the "controller.listener.name" must be different from "
> > > > >> > inter.broker.listener.name".
> > > > >> >
> > > > >> > @Ismael Juma <is...@juma.me.uk>
> > > > >> > Thanks for pointing that out. I did not know that.
> > > > >> > However my question is if the argument above makes sense, and
my
> > > code
> > > > >> > change
> > > > >> > causes the configs "host", "port", "advertised.host",
> > > > "advertised.port"
> > > > >> to
> > > > >> > be not used under any circumstance,
> > > > >> > then it's no different from removing them.
> > > > >> > Anyway if there is still a concern about removing them, is
> there a
> > > new
> > > > >> > major new version
> > > > >> > now or in the future where I can remove them?
> > > > >> >
> > > > >> > Thanks!
> > > > >> > Lucas
> > > > >> >
> > > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk>
> > > > wrote:
> > > > >> >
> > > > >> >> To be clear, we can only remove configs in major new versions.
> > > > >> Otherwise,
> > > > >> >> we can only deprecate them.
> > > > >> >>
> > > > >> >> Ismael
> > > > >> >>
> > > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > >> >>
> > > > >> >> > Hi, Lucas,
> > > > >> >> >
> > > > >> >> > For the network idlePct, your understanding is correct.
> > > Currently,
> > > > >> >> > networkIdlePct metric is calculated as the average of (1 -
> > > > io-ratio)
> > > > >> in
> > > > >> >> the
> > > > >> >> > selector of all network threads.
> > > > >> >> >
> > > > >> >> > The metrics part looks good to me in the updated KIP.
> > > > >> >> >
> > > > >> >> > I am not still not quite sure about the configs.
> > > > >> >> >
> > > > >> >> > 100. "Whenever the "controller.listener.name" is set, upon
> > > broker
> > > > >> >> startup,
> > > > >> >> > we will validate its value and make sure it's different from
> > the
> > > > >> >> > *inter-broker-listener-name *value." Does that mean that
> > > > >> >> > controller.listener.name has to be different from
> > > > >> >> > inter.broker.listener.name?
> > > > >> >> > That seems limiting.
> > > > >> >> >
> > > > >> >> > 101. The KIP says that advertised.listeners and listeners
> will
> > > now
> > > > >> have
> > > > >> >> a
> > > > >> >> > different default value including controller. Could you
> > document
> > > > what
> > > > >> >> the
> > > > >> >> > default value looks like?
> > > > >> >> >
> > > > >> >> > 102. About removing the the following configs. How does that
> > > affect
> > > > >> the
> > > > >> >> > upgrade path? Do we now expect a user to add a new config
> when
> > > > >> upgrading
> > > > >> >> > from an old version to a new one?
> > > > >> >> > host
> > > > >> >> > port
> > > > >> >> > advertised.host
> > > > >> >> > advertised.port
> > > > >> >> >
> > > > >> >> > Thanks,
> > > > >> >> >
> > > > >> >> > Jun
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > > >> >> wrote:
> > > > >> >> >
> > > > >> >> > > @Jun Rao <ju...@confluent.io>
> > > > >> >> > >
> > > > >> >> > > One clarification, currently on the selector level, we
have
> > the
> > > > >> >> > > io-wait-ratio metric.
> > > > >> >> > > For the new controller *network* thread, we can use it
> > directly
> > > > for
> > > > >> >> > > IdlePct, instead of using 1- io-ratio,
> > > > >> >> > > so that the logic is similar to the current average
IdlePct
> > for
> > > > >> >> network
> > > > >> >> > > threads. Is that correct?
> > > > >> >> > >
> > > > >> >> > > I've revised the KIP by adding two new metrics for
> measuring
> > > the
> > > > >> >> IdlePct
> > > > >> >> > > for the two additional threads.
> > > > >> >> > > Please take a look again. Thanks!
> > > > >> >> > >
> > > > >> >> > > Lucas
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > >> >> > >
> > > > >> >> > > > Hi, Lucas,
> > > > >> >> > > >
> > > > >> >> > > > Thanks for the updated KIP.
> > > > >> >> > > >
> > > > >> >> > > > For monitoring the network thread utilization for the
> > control
> > > > >> >> plane, we
> > > > >> >> > > > already have the metric io-ratio at the selector level
> > > (idlePct
> > > > >> is
> > > > >> >> 1 -
> > > > >> >> > > > io-ratio). So, we just need to give that selector a
> > > meaningful
> > > > >> name.
> > > > >> >> > > >
> > > > >> >> > > > For monitoring the io thread utilization for the control
> > > plane,
> > > > >> it's
> > > > >> >> > > > probably useful to have a separate metric for that. The
> > > > >> controller
> > > > >> >> > > request
> > > > >> >> > > > queue size may not reflect the history in a window.
> > > > >> >> > > >
> > > > >> >> > > > Jun
> > > > >> >> > > >
> > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> > > > >> lucasatucla@gmail.com>
> > > > >> >> > > wrote:
> > > > >> >> > > >
> > > > >> >> > > > > Thanks Jun for your quick response. It looks like I
> > forgot
> > > to
> > > > >> >> click
> > > > >> >> > the
> > > > >> >> > > > > "Update" button, :)
> > > > >> >> > > > > It's updated now.
> > > > >> >> > > > >
> > > > >> >> > > > > Regarding the idle ratio metrics for the additional
> > > threads,
> > > > I
> > > > >> >> > > discussed
> > > > >> >> > > > > with Joel,
> > > > >> >> > > > > and think they are not as useful, and I added our
> > reasoning
> > > > in
> > > > >> the
> > > > >> >> > last
> > > > >> >> > > > > paragraph of the
> > > > >> >> > > > > "How are controller requests handled over the
dedicated
> > > > >> >> connections?"
> > > > >> >> > > > > section.
> > > > >> >> > > > > On the other hand, we don't strongly oppose adding
them
> > if
> > > > you
> > > > >> >> think
> > > > >> >> > > they
> > > > >> >> > > > > are necessary.
> > > > >> >> > > > >
> > > > >> >> > > > > Thanks,
> > > > >> >> > > > > Lucas
> > > > >> >> > > > >
> > > > >> >> > > > >
> > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <
> jun@confluent.io
> > >
> > > > >> wrote:
> > > > >> >> > > > >
> > > > >> >> > > > > > Hi, Lucas,
> > > > >> >> > > > > >
> > > > >> >> > > > > > Thanks for the reply. Have you actually updated the
> > KIP?
> > > > The
> > > > >> >> wiki
> > > > >> >> > > says
> > > > >> >> > > > > that
> > > > >> >> > > > > > it's last updated on Aug. 22. and some of the
changes
> > > that
> > > > >> you
> > > > >> >> > > > mentioned
> > > > >> >> > > > > > (#1 and #3) are not there.
> > > > >> >> > > > > >
> > > > >> >> > > > > > Also, regarding Joel's comment on network/request
> idle
> > > > ratio
> > > > >> >> > metrics,
> > > > >> >> > > > > could
> > > > >> >> > > > > > you comment on whether they include the new
> controller
> > > > >> >> listener? If
> > > > >> >> > > > not,
> > > > >> >> > > > > do
> > > > >> >> > > > > > we need additional metrics to measure the
utilization
> > of
> > > > the
> > > > >> io
> > > > >> >> > > thread
> > > > >> >> > > > > for
> > > > >> >> > > > > > the control plane?
> > > > >> >> > > > > >
> > > > >> >> > > > > > Jun
> > > > >> >> > > > > >
> > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> > > > >> >> lucasatucla@gmail.com
> > > > >> >> > >
> > > > >> >> > > > > wrote:
> > > > >> >> > > > > >
> > > > >> >> > > > > > > Thanks for the comments, Jun.
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > 1. I think the answer should be no, since the "
> > > > >> >> > > > > > inter.broker.listener.name"
> > > > >> >> > > > > > > are also used
> > > > >> >> > > > > > > for replication traffic, and merging these two
> types
> > of
> > > > >> >> request
> > > > >> >> > to
> > > > >> >> > > > the
> > > > >> >> > > > > > > single threaded tunnel
> > > > >> >> > > > > > > would defeat the purpose of this KIP and also hurt
> > > > >> replication
> > > > >> >> > > > > > throughput.
> > > > >> >> > > > > > > So I think that means
> > > > >> >> > > > > > > we should validate to make sure when the new
config
> > is
> > > > set,
> > > > >> >> it's
> > > > >> >> > > > > > different
> > > > >> >> > > > > > > from "inter.broker.listener.name"
> > > > >> >> > > > > > > or "security.inter.broker.protocol", whichever is
> > set.
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > 2. Normally all broker configs in a given cluster
> are
> > > > >> changed
> > > > >> >> at
> > > > >> >> > > the
> > > > >> >> > > > > same
> > > > >> >> > > > > > > time. If there is a typo in the
> > > > >> >> > > > > > > controller.listener.name and it's not available in
> > the
> > > > >> >> endpoints
> > > > >> >> > > > list,
> > > > >> >> > > > > > we
> > > > >> >> > > > > > > could catch it, give an error
> > > > >> >> > > > > > > and block restart of the first broker in the
> cluster.
> > > > With
> > > > >> >> that,
> > > > >> >> > we
> > > > >> >> > > > > could
> > > > >> >> > > > > > > keep the current behavior
> > > > >> >> > > > > > > in the KIP write up that falls back to
> > > > >> >> > "inter.broker.listener.nam"
> > > > >> >> > > > when
> > > > >> >> > > > > > the
> > > > >> >> > > > > > > "controller.listener.name"
> > > > >> >> > > > > > > is not found during the migration phase of this
> KIP.
> > > > >> Thoughts?
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > 3. That makes sense, and I've changed it.
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > Thanks,
> > > > >> >> > > > > > > Lucas
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> > > > jun@confluent.io>
> > > > >> >> > wrote:
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > > Hi, Lucas,
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > Sorry for the delay. The new proposal looks good
> to
> > > me
> > > > >> >> > overall. A
> > > > >> >> > > > few
> > > > >> >> > > > > > > minor
> > > > >> >> > > > > > > > comments below.
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > 1. It's possible that
> listener.name.for.controller
> > is
> > > > >> set,
> > > > >> >> but
> > > > >> >> > > set
> > > > >> >> > > > to
> > > > >> >> > > > > > the
> > > > >> >> > > > > > > > same value as inter.broker.listener.name. In
> that
> > > > case,
> > > > >> >> should
> > > > >> >> > > we
> > > > >> >> > > > > > have a
> > > > >> >> > > > > > > > single network thread and the request handling
> > thread
> > > > for
> > > > >> >> that
> > > > >> >> > > > > > listener?
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > 2. Currently, the controller always picks the
> > > listener
> > > > >> >> > specified
> > > > >> >> > > by
> > > > >> >> > > > > > > > inter.broker.listener.name even if the listener
> > name
> > > > is
> > > > >> not
> > > > >> >> > > > present
> > > > >> >> > > > > in
> > > > >> >> > > > > > > the
> > > > >> >> > > > > > > > receiving broker. This KIP proposes a slightly
> > > > different
> > > > >> >> > approach
> > > > >> >> > > > for
> > > > >> >> > > > > > > > picking listener.name.for.controller only when
> the
> > > > >> receiving
> > > > >> >> > end
> > > > >> >> > > > has
> > > > >> >> > > > > > the
> > > > >> >> > > > > > > > listener and switches
> listener.name.for.controller
> > > > >> >> otherwise.
> > > > >> >> > > There
> > > > >> >> > > > > are
> > > > >> >> > > > > > > > some tradeoffs between the two approaches. To
> > change
> > > > the
> > > > >> >> inter
> > > > >> >> > > > broker
> > > > >> >> > > > > > > > listener, the former requires 2 steps: (1)
adding
> > the
> > > > new
> > > > >> >> > > listener
> > > > >> >> > > > to
> > > > >> >> > > > > > > > listener list in every broker and (2) changing
> > > > >> >> > > > > > > > listener.name.for.controller.
> > > > >> >> > > > > > > > The latter can do both changes in 1 step. On the
> > > hand,
> > > > if
> > > > >> >> > > > > > > > listener.name.for.controller
> > > > >> >> > > > > > > > is mis-configured, the former will report an
> error
> > > and
> > > > >> the
> > > > >> >> > latter
> > > > >> >> > > > > will
> > > > >> >> > > > > > > hide
> > > > >> >> > > > > > > > it (so the user may not know the
> misconfiguration).
> > > It
> > > > >> seems
> > > > >> >> > that
> > > > >> >> > > > we
> > > > >> >> > > > > > > should
> > > > >> >> > > > > > > > pick one approach to handle both
> > > > >> >> listener.name.for.controller
> > > > >> >> > and
> > > > >> >> > > > > > > > inter.broker.listener.name consistently. To me,
> > the
> > > > >> former
> > > > >> >> > seems
> > > > >> >> > > > > > > slightly
> > > > >> >> > > > > > > > better.
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > 3. To be consistent with the existing naming,
> > should
> > > > >> >> > > > > > > > listener.name.for.controller
> > > > >> >> > > > > > > > be controller.listener.name?
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > Thanks,
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > Jun
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > > > >> >> > > lucasatucla@gmail.com>
> > > > >> >> > > > > > > wrote:
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > > Hi Jun and Joel,
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > The KIP writeup has changed by quite a bit
> since
> > > your
> > > > >> +1.
> > > > >> >> > > > > > > > > Can you please take another review? Thanks a
> lot
> > > for
> > > > >> your
> > > > >> >> > time!
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > Lucas
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> > > > >> >> > > > jjkoshy.w@gmail.com>
> > > > >> >> > > > > > > > wrote:
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > > +1 on the KIP.
> > > > >> >> > > > > > > > > >
> > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to
> > introduce
> > > > the
> > > > >> >> > > condition
> > > > >> >> > > > > > > > variables
> > > > >> >> > > > > > > > > > for the concern that Jun raised, but it's an
> > > > >> >> implementation
> > > > >> >> > > > > detail
> > > > >> >> > > > > > > that
> > > > >> >> > > > > > > > > we
> > > > >> >> > > > > > > > > > can defer to a discussion in the PR.)
> > > > >> >> > > > > > > > > >
> > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang
> <
> > > > >> >> > > > > > lucasatucla@gmail.com>
> > > > >> >> > > > > > > > > > wrote:
> > > > >> >> > > > > > > > > >
> > > > >> >> > > > > > > > > > > Hi Jun,
> > > > >> >> > > > > > > > > > >
> > > > >> >> > > > > > > > > > > I agree by using the conditional
variables,
> > > there
> > > > >> is
> > > > >> >> no
> > > > >> >> > > need
> > > > >> >> > > > to
> > > > >> >> > > > > > add
> > > > >> >> > > > > > > > > such
> > > > >> >> > > > > > > > > > a
> > > > >> >> > > > > > > > > > > new config.
> > > > >> >> > > > > > > > > > > Also thanks for approving this KIP.
> > > > >> >> > > > > > > > > > >
> > > > >> >> > > > > > > > > > > Lucas
> > > > >> >> > > > > > > > > > >
> > > > >> >> > > > > > > > > >
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > >
> > > > >> >> > > > > >
> > > > >> >> > > > >
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> >
> > > > >> >>
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks Jun. I've changed the KIP with the suggested 2 step upgrade.
Please take a look again when you have time.

Regards,
Lucas

On Thu, Oct 4, 2018 at 10:06 AM Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas,
>
> 200. That's a valid concern. So, we can probably just keep the current
> name.
>
> 201. I am thinking that you would upgrade in the same way as changing
> inter.broker.listener.name. This requires 2 rounds of rolling restart. In
> the first round, we add the controller endpoint to the listeners w/o
> setting controller.listener.name. In the second round, every broker sets
> controller.listener.name. At that point, the controller listener is ready
> in every broker.
>
> Thanks,
>
> Jun
>
> On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for the further comments, Jun.
> >
> > 200. Currently in the code base, we have the term of "ControlBatch"
> related
> > to
> > idempotent/transactional producing. Do you think it's a concern for
> reusing
> > the term "control"?
> >
> > 201. It's not clear to me how it would work by following the same
> strategy
> > for "controller.listener.name".
> > Say the new controller has its "controller.listener.name" set to the
> value
> > "CONTROLLER", and broker 1
> > has picked up this KIP by announcing
> > "endpoints": [
> >         "CONTROLLER://broker1.example.com:9091",
> >         "INTERNAL://broker1.example.com:9092",
> >         "EXTERNAL://host1.example.com:9093"
> >     ],
> >
> > while broker2 has not picked up the change, and is announcing
> > "endpoints": [
> >         "INTERNAL://broker2.example.com:9092",
> >         "EXTERNAL://host2.example.com:9093"
> >     ],
> > to support both broker 1 for the new behavior and broker 2 for the old
> > behavior, it seems the controller must
> > check their published endpoints. Am I missing something?
> >
> > Thanks!
> > Lucas
> >
> > On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Lucas,
> > >
> > > Sorry for the delay. The updated wiki looks good to me overall. Just a
> > > couple more minor comments.
> > >
> > > 200. kafka.network:name=ControllerRequestQueueSize,type=RequestChannel:
> > The
> > > name ControllerRequestQueueSize gives the impression that it's only for
> > the
> > > controller broker. Perhaps we can just rename all metrics and configs
> > from
> > > controller to control. This indicates that the threads and the queues
> are
> > > for the control requests (as oppose to data requests).
> > >
> > > 201. "<new controller, old broker>: In this scenario, the controller
> will
> > > have the "controller.listener.name" config set to a value like
> > > "CONTROLLER", however the broker's exposed endpoints do not have an
> entry
> > > corresponding to the new listener name. Hence the controller should
> > > preserve the existing behavior by determining the endpoint using
> > > *inter-broker-listener-name *value. The end result should be the same
> > > behavior as today." Currently, the controller makes connections based
> on
> > > its local inter.broker.listener.name config without checking the
> target
> > > broker's ZK registration. For consistency, perhaps we can just follow
> the
> > > same strategy for controller.listener.name. This existing behavior
> seems
> > > simpler to understand and has the benefit of catching inconsistent
> > configs
> > > across brokers.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Sorry to bother you again. Can you please take a look at the wiki
> again
> > > > when you have time?
> > > >
> > > > Thanks a lot!
> > > > Lucas
> > > >
> > > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks a lot for the detailed explanation.
> > > > > I've restored the wiki to a previous version that does not require
> > > config
> > > > > changes,
> > > > > and keeps the current behavior with the proposed changes turned off
> > by
> > > > > default.
> > > > > I'd appreciate it if you can review it again.
> > > > >
> > > > > Thanks!
> > > > > Lucas
> > > > >
> > > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > >> Hi, Lucas,
> > > > >>
> > > > >> When upgrading to a minor release, I think the expectation is
> that a
> > > > user
> > > > >> wouldn't need to make any config changes, other than the usual
> > > > >> inter.broker.protocol. If we require other config changes during
> an
> > > > >> upgrade, then it's probably better to do that in a major release.
> > > > >>
> > > > >> Regarding your proposal, I think removing host/advertised_host in
> > > favor
> > > > of
> > > > >> listeners:advertised_listeners seems useful regardless of this
> KIP.
> > > > >> However, that can probably wait until a major release.
> > > > >>
> > > > >> As for the controller listener, I am not sure if one has to set
> it.
> > To
> > > > >> make
> > > > >> a cluster healthy, one sort of have to make sure that the request
> > > queue
> > > > is
> > > > >> never full and no request will be sitting in the request queue for
> > > long.
> > > > >> If
> > > > >> one does that, setting the controller listener may not be
> necessary.
> > > On
> > > > >> the
> > > > >> flip side, even if one sets the controller listener, but the
> request
> > > > queue
> > > > >> and the request time for the data part are still high, the cluster
> > may
> > > > >> still not be healthy. Given that we have already started the 2.1
> > > release
> > > > >> planning, perhaps we can start with not requiring the controller
> > > > listener.
> > > > >> If this is indeed something that everyone wants to set, we can
> make
> > > it a
> > > > >> required config in a major release.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > @Jun Rao <ju...@confluent.io>
> > > > >> >
> > > > >> > I made the recent config changes after thinking about the
> default
> > > > >> behavior
> > > > >> > for adopting this KIP.
> > > > >> > I think there are basically two options:
> > > > >> > 1. By default, the behavior proposed in this KIP is turned off,
> > and
> > > > >> > operators can turn it
> > > > >> > on by adding the "controller.listener.name" config and entries
> in
> > > the
> > > > >> > "listeners" and "advertised.listeners" list.
> > > > >> > If no "controller.listener.name" is added, it'll be the *same
> as*
> > > > the "
> > > > >> > inter.broker.listener.name",
> > > > >> > and the proposed feature is effectively turned off.
> > > > >> > This has been the assumption in the KIP writeup before.
> > > > >> >
> > > > >> > 2. By default, the behavior proposed in this KIP is turned on,
> and
> > > > >> > operators are forced to
> > > > >> > recognize the proposed change if their "listeners" config is set
> > > (this
> > > > >> is
> > > > >> > most likely in production environments),
> > > > >> > by allocating a new port for controller connections, and adding
> a
> > > new
> > > > >> > endpoint to the "listeners" config.
> > > > >> > For cases where "listeners" is not set explicitly,
> > > > >> > there needs to be a default value for it that includes the
> > > controller
> > > > >> > listener name,
> > > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> > > > >> >
> > > > >> > I chose to go with option 2 since as author of this KIP,
> > > > >> > I naturally think in the long run, it's worth the effort to
> adopt
> > > this
> > > > >> > feature,
> > > > >> > in order to prevent issues under circumstances listed in the
> > > > motivation
> > > > >> > section.
> > > > >> >
> > > > >> > 100, following the argument above, I want to enforce the
> > separation
> > > > >> > between controller
> > > > >> > and data plane requests. Hence the "controller.listener.name"
> > > should
> > > > >> > never be the same
> > > > >> > as the "inter.broker.listener.name", which is intended for data
> > > plane
> > > > >> > requests.
> > > > >> >
> > > > >> > 101, the default value for "listeners" will be
> > > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > > > >> > making values of "host", and "port" not being used under any
> > config
> > > > >> > settings.
> > > > >> > And the default value for "advertised.listeners" will be derived
> > > from
> > > > >> > "listeners",
> > > > >> > making the values of "advertised.host", and "advertised.port"
> not
> > > > being
> > > > >> > used any more.
> > > > >> >
> > > > >> > 102, for upgrading, a single broker that has "listeners" and/or
> > > > >> > "advertised.listeners" set,
> > > > >> > must add a new endpoint for the CONTROLLER listener name, or end
> > up
> > > > >> using
> > > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> > > > >> > During rolling upgrade, in cases of <old controller> + <new
> > broker>
> > > or
> > > > >> > <new controller>  + <old broker>
> > > > >> > we still need to fall back to the current behavior. However
> after
> > > the
> > > > >> > rolling upgrade is done,
> > > > >> > it is guaranteed that the controller plane and data plane are
> > > > separated,
> > > > >> > given
> > > > >> > the "controller.listener.name" must be different from "
> > > > >> > inter.broker.listener.name".
> > > > >> >
> > > > >> > @Ismael Juma <is...@juma.me.uk>
> > > > >> > Thanks for pointing that out. I did not know that.
> > > > >> > However my question is if the argument above makes sense, and my
> > > code
> > > > >> > change
> > > > >> > causes the configs "host", "port", "advertised.host",
> > > > "advertised.port"
> > > > >> to
> > > > >> > be not used under any circumstance,
> > > > >> > then it's no different from removing them.
> > > > >> > Anyway if there is still a concern about removing them, is
> there a
> > > new
> > > > >> > major new version
> > > > >> > now or in the future where I can remove them?
> > > > >> >
> > > > >> > Thanks!
> > > > >> > Lucas
> > > > >> >
> > > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk>
> > > > wrote:
> > > > >> >
> > > > >> >> To be clear, we can only remove configs in major new versions.
> > > > >> Otherwise,
> > > > >> >> we can only deprecate them.
> > > > >> >>
> > > > >> >> Ismael
> > > > >> >>
> > > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > >> >>
> > > > >> >> > Hi, Lucas,
> > > > >> >> >
> > > > >> >> > For the network idlePct, your understanding is correct.
> > > Currently,
> > > > >> >> > networkIdlePct metric is calculated as the average of (1 -
> > > > io-ratio)
> > > > >> in
> > > > >> >> the
> > > > >> >> > selector of all network threads.
> > > > >> >> >
> > > > >> >> > The metrics part looks good to me in the updated KIP.
> > > > >> >> >
> > > > >> >> > I am not still not quite sure about the configs.
> > > > >> >> >
> > > > >> >> > 100. "Whenever the "controller.listener.name" is set, upon
> > > broker
> > > > >> >> startup,
> > > > >> >> > we will validate its value and make sure it's different from
> > the
> > > > >> >> > *inter-broker-listener-name *value." Does that mean that
> > > > >> >> > controller.listener.name has to be different from
> > > > >> >> > inter.broker.listener.name?
> > > > >> >> > That seems limiting.
> > > > >> >> >
> > > > >> >> > 101. The KIP says that advertised.listeners and listeners
> will
> > > now
> > > > >> have
> > > > >> >> a
> > > > >> >> > different default value including controller. Could you
> > document
> > > > what
> > > > >> >> the
> > > > >> >> > default value looks like?
> > > > >> >> >
> > > > >> >> > 102. About removing the the following configs. How does that
> > > affect
> > > > >> the
> > > > >> >> > upgrade path? Do we now expect a user to add a new config
> when
> > > > >> upgrading
> > > > >> >> > from an old version to a new one?
> > > > >> >> > host
> > > > >> >> > port
> > > > >> >> > advertised.host
> > > > >> >> > advertised.port
> > > > >> >> >
> > > > >> >> > Thanks,
> > > > >> >> >
> > > > >> >> > Jun
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > > >> >> wrote:
> > > > >> >> >
> > > > >> >> > > @Jun Rao <ju...@confluent.io>
> > > > >> >> > >
> > > > >> >> > > One clarification, currently on the selector level, we have
> > the
> > > > >> >> > > io-wait-ratio metric.
> > > > >> >> > > For the new controller *network* thread, we can use it
> > directly
> > > > for
> > > > >> >> > > IdlePct, instead of using 1- io-ratio,
> > > > >> >> > > so that the logic is similar to the current average IdlePct
> > for
> > > > >> >> network
> > > > >> >> > > threads. Is that correct?
> > > > >> >> > >
> > > > >> >> > > I've revised the KIP by adding two new metrics for
> measuring
> > > the
> > > > >> >> IdlePct
> > > > >> >> > > for the two additional threads.
> > > > >> >> > > Please take a look again. Thanks!
> > > > >> >> > >
> > > > >> >> > > Lucas
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > >
> > > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > >> >> > >
> > > > >> >> > > > Hi, Lucas,
> > > > >> >> > > >
> > > > >> >> > > > Thanks for the updated KIP.
> > > > >> >> > > >
> > > > >> >> > > > For monitoring the network thread utilization for the
> > control
> > > > >> >> plane, we
> > > > >> >> > > > already have the metric io-ratio at the selector level
> > > (idlePct
> > > > >> is
> > > > >> >> 1 -
> > > > >> >> > > > io-ratio). So, we just need to give that selector a
> > > meaningful
> > > > >> name.
> > > > >> >> > > >
> > > > >> >> > > > For monitoring the io thread utilization for the control
> > > plane,
> > > > >> it's
> > > > >> >> > > > probably useful to have a separate metric for that. The
> > > > >> controller
> > > > >> >> > > request
> > > > >> >> > > > queue size may not reflect the history in a window.
> > > > >> >> > > >
> > > > >> >> > > > Jun
> > > > >> >> > > >
> > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> > > > >> lucasatucla@gmail.com>
> > > > >> >> > > wrote:
> > > > >> >> > > >
> > > > >> >> > > > > Thanks Jun for your quick response. It looks like I
> > forgot
> > > to
> > > > >> >> click
> > > > >> >> > the
> > > > >> >> > > > > "Update" button, :)
> > > > >> >> > > > > It's updated now.
> > > > >> >> > > > >
> > > > >> >> > > > > Regarding the idle ratio metrics for the additional
> > > threads,
> > > > I
> > > > >> >> > > discussed
> > > > >> >> > > > > with Joel,
> > > > >> >> > > > > and think they are not as useful, and I added our
> > reasoning
> > > > in
> > > > >> the
> > > > >> >> > last
> > > > >> >> > > > > paragraph of the
> > > > >> >> > > > > "How are controller requests handled over the dedicated
> > > > >> >> connections?"
> > > > >> >> > > > > section.
> > > > >> >> > > > > On the other hand, we don't strongly oppose adding them
> > if
> > > > you
> > > > >> >> think
> > > > >> >> > > they
> > > > >> >> > > > > are necessary.
> > > > >> >> > > > >
> > > > >> >> > > > > Thanks,
> > > > >> >> > > > > Lucas
> > > > >> >> > > > >
> > > > >> >> > > > >
> > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <
> jun@confluent.io
> > >
> > > > >> wrote:
> > > > >> >> > > > >
> > > > >> >> > > > > > Hi, Lucas,
> > > > >> >> > > > > >
> > > > >> >> > > > > > Thanks for the reply. Have you actually updated the
> > KIP?
> > > > The
> > > > >> >> wiki
> > > > >> >> > > says
> > > > >> >> > > > > that
> > > > >> >> > > > > > it's last updated on Aug. 22. and some of the changes
> > > that
> > > > >> you
> > > > >> >> > > > mentioned
> > > > >> >> > > > > > (#1 and #3) are not there.
> > > > >> >> > > > > >
> > > > >> >> > > > > > Also, regarding Joel's comment on network/request
> idle
> > > > ratio
> > > > >> >> > metrics,
> > > > >> >> > > > > could
> > > > >> >> > > > > > you comment on whether they include the new
> controller
> > > > >> >> listener? If
> > > > >> >> > > > not,
> > > > >> >> > > > > do
> > > > >> >> > > > > > we need additional metrics to measure the utilization
> > of
> > > > the
> > > > >> io
> > > > >> >> > > thread
> > > > >> >> > > > > for
> > > > >> >> > > > > > the control plane?
> > > > >> >> > > > > >
> > > > >> >> > > > > > Jun
> > > > >> >> > > > > >
> > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> > > > >> >> lucasatucla@gmail.com
> > > > >> >> > >
> > > > >> >> > > > > wrote:
> > > > >> >> > > > > >
> > > > >> >> > > > > > > Thanks for the comments, Jun.
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > 1. I think the answer should be no, since the "
> > > > >> >> > > > > > inter.broker.listener.name"
> > > > >> >> > > > > > > are also used
> > > > >> >> > > > > > > for replication traffic, and merging these two
> types
> > of
> > > > >> >> request
> > > > >> >> > to
> > > > >> >> > > > the
> > > > >> >> > > > > > > single threaded tunnel
> > > > >> >> > > > > > > would defeat the purpose of this KIP and also hurt
> > > > >> replication
> > > > >> >> > > > > > throughput.
> > > > >> >> > > > > > > So I think that means
> > > > >> >> > > > > > > we should validate to make sure when the new config
> > is
> > > > set,
> > > > >> >> it's
> > > > >> >> > > > > > different
> > > > >> >> > > > > > > from "inter.broker.listener.name"
> > > > >> >> > > > > > > or "security.inter.broker.protocol", whichever is
> > set.
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > 2. Normally all broker configs in a given cluster
> are
> > > > >> changed
> > > > >> >> at
> > > > >> >> > > the
> > > > >> >> > > > > same
> > > > >> >> > > > > > > time. If there is a typo in the
> > > > >> >> > > > > > > controller.listener.name and it's not available in
> > the
> > > > >> >> endpoints
> > > > >> >> > > > list,
> > > > >> >> > > > > > we
> > > > >> >> > > > > > > could catch it, give an error
> > > > >> >> > > > > > > and block restart of the first broker in the
> cluster.
> > > > With
> > > > >> >> that,
> > > > >> >> > we
> > > > >> >> > > > > could
> > > > >> >> > > > > > > keep the current behavior
> > > > >> >> > > > > > > in the KIP write up that falls back to
> > > > >> >> > "inter.broker.listener.nam"
> > > > >> >> > > > when
> > > > >> >> > > > > > the
> > > > >> >> > > > > > > "controller.listener.name"
> > > > >> >> > > > > > > is not found during the migration phase of this
> KIP.
> > > > >> Thoughts?
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > 3. That makes sense, and I've changed it.
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > Thanks,
> > > > >> >> > > > > > > Lucas
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> > > > jun@confluent.io>
> > > > >> >> > wrote:
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > > Hi, Lucas,
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > Sorry for the delay. The new proposal looks good
> to
> > > me
> > > > >> >> > overall. A
> > > > >> >> > > > few
> > > > >> >> > > > > > > minor
> > > > >> >> > > > > > > > comments below.
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > 1. It's possible that
> listener.name.for.controller
> > is
> > > > >> set,
> > > > >> >> but
> > > > >> >> > > set
> > > > >> >> > > > to
> > > > >> >> > > > > > the
> > > > >> >> > > > > > > > same value as inter.broker.listener.name. In
> that
> > > > case,
> > > > >> >> should
> > > > >> >> > > we
> > > > >> >> > > > > > have a
> > > > >> >> > > > > > > > single network thread and the request handling
> > thread
> > > > for
> > > > >> >> that
> > > > >> >> > > > > > listener?
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > 2. Currently, the controller always picks the
> > > listener
> > > > >> >> > specified
> > > > >> >> > > by
> > > > >> >> > > > > > > > inter.broker.listener.name even if the listener
> > name
> > > > is
> > > > >> not
> > > > >> >> > > > present
> > > > >> >> > > > > in
> > > > >> >> > > > > > > the
> > > > >> >> > > > > > > > receiving broker. This KIP proposes a slightly
> > > > different
> > > > >> >> > approach
> > > > >> >> > > > for
> > > > >> >> > > > > > > > picking listener.name.for.controller only when
> the
> > > > >> receiving
> > > > >> >> > end
> > > > >> >> > > > has
> > > > >> >> > > > > > the
> > > > >> >> > > > > > > > listener and switches
> listener.name.for.controller
> > > > >> >> otherwise.
> > > > >> >> > > There
> > > > >> >> > > > > are
> > > > >> >> > > > > > > > some tradeoffs between the two approaches. To
> > change
> > > > the
> > > > >> >> inter
> > > > >> >> > > > broker
> > > > >> >> > > > > > > > listener, the former requires 2 steps: (1) adding
> > the
> > > > new
> > > > >> >> > > listener
> > > > >> >> > > > to
> > > > >> >> > > > > > > > listener list in every broker and (2) changing
> > > > >> >> > > > > > > > listener.name.for.controller.
> > > > >> >> > > > > > > > The latter can do both changes in 1 step. On the
> > > hand,
> > > > if
> > > > >> >> > > > > > > > listener.name.for.controller
> > > > >> >> > > > > > > > is mis-configured, the former will report an
> error
> > > and
> > > > >> the
> > > > >> >> > latter
> > > > >> >> > > > > will
> > > > >> >> > > > > > > hide
> > > > >> >> > > > > > > > it (so the user may not know the
> misconfiguration).
> > > It
> > > > >> seems
> > > > >> >> > that
> > > > >> >> > > > we
> > > > >> >> > > > > > > should
> > > > >> >> > > > > > > > pick one approach to handle both
> > > > >> >> listener.name.for.controller
> > > > >> >> > and
> > > > >> >> > > > > > > > inter.broker.listener.name consistently. To me,
> > the
> > > > >> former
> > > > >> >> > seems
> > > > >> >> > > > > > > slightly
> > > > >> >> > > > > > > > better.
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > 3. To be consistent with the existing naming,
> > should
> > > > >> >> > > > > > > > listener.name.for.controller
> > > > >> >> > > > > > > > be controller.listener.name?
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > Thanks,
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > Jun
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > > > >> >> > > lucasatucla@gmail.com>
> > > > >> >> > > > > > > wrote:
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > > > > Hi Jun and Joel,
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > The KIP writeup has changed by quite a bit
> since
> > > your
> > > > >> +1.
> > > > >> >> > > > > > > > > Can you please take another review? Thanks a
> lot
> > > for
> > > > >> your
> > > > >> >> > time!
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > Lucas
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> > > > >> >> > > > jjkoshy.w@gmail.com>
> > > > >> >> > > > > > > > wrote:
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > > +1 on the KIP.
> > > > >> >> > > > > > > > > >
> > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to
> > introduce
> > > > the
> > > > >> >> > > condition
> > > > >> >> > > > > > > > variables
> > > > >> >> > > > > > > > > > for the concern that Jun raised, but it's an
> > > > >> >> implementation
> > > > >> >> > > > > detail
> > > > >> >> > > > > > > that
> > > > >> >> > > > > > > > > we
> > > > >> >> > > > > > > > > > can defer to a discussion in the PR.)
> > > > >> >> > > > > > > > > >
> > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang
> <
> > > > >> >> > > > > > lucasatucla@gmail.com>
> > > > >> >> > > > > > > > > > wrote:
> > > > >> >> > > > > > > > > >
> > > > >> >> > > > > > > > > > > Hi Jun,
> > > > >> >> > > > > > > > > > >
> > > > >> >> > > > > > > > > > > I agree by using the conditional variables,
> > > there
> > > > >> is
> > > > >> >> no
> > > > >> >> > > need
> > > > >> >> > > > to
> > > > >> >> > > > > > add
> > > > >> >> > > > > > > > > such
> > > > >> >> > > > > > > > > > a
> > > > >> >> > > > > > > > > > > new config.
> > > > >> >> > > > > > > > > > > Also thanks for approving this KIP.
> > > > >> >> > > > > > > > > > >
> > > > >> >> > > > > > > > > > > Lucas
> > > > >> >> > > > > > > > > > >
> > > > >> >> > > > > > > > > >
> > > > >> >> > > > > > > > >
> > > > >> >> > > > > > > >
> > > > >> >> > > > > > >
> > > > >> >> > > > > >
> > > > >> >> > > > >
> > > > >> >> > > >
> > > > >> >> > >
> > > > >> >> >
> > > > >> >>
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hi, Lucas,

200. That's a valid concern. So, we can probably just keep the current name.

201. I am thinking that you would upgrade in the same way as changing
inter.broker.listener.name. This requires 2 rounds of rolling restart. In
the first round, we add the controller endpoint to the listeners w/o
setting controller.listener.name. In the second round, every broker sets
controller.listener.name. At that point, the controller listener is ready
in every broker.

Thanks,

Jun

On Tue, Oct 2, 2018 at 10:38 AM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks for the further comments, Jun.
>
> 200. Currently in the code base, we have the term of "ControlBatch" related
> to
> idempotent/transactional producing. Do you think it's a concern for reusing
> the term "control"?
>
> 201. It's not clear to me how it would work by following the same strategy
> for "controller.listener.name".
> Say the new controller has its "controller.listener.name" set to the value
> "CONTROLLER", and broker 1
> has picked up this KIP by announcing
> "endpoints": [
>         "CONTROLLER://broker1.example.com:9091",
>         "INTERNAL://broker1.example.com:9092",
>         "EXTERNAL://host1.example.com:9093"
>     ],
>
> while broker2 has not picked up the change, and is announcing
> "endpoints": [
>         "INTERNAL://broker2.example.com:9092",
>         "EXTERNAL://host2.example.com:9093"
>     ],
> to support both broker 1 for the new behavior and broker 2 for the old
> behavior, it seems the controller must
> check their published endpoints. Am I missing something?
>
> Thanks!
> Lucas
>
> On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Lucas,
> >
> > Sorry for the delay. The updated wiki looks good to me overall. Just a
> > couple more minor comments.
> >
> > 200. kafka.network:name=ControllerRequestQueueSize,type=RequestChannel:
> The
> > name ControllerRequestQueueSize gives the impression that it's only for
> the
> > controller broker. Perhaps we can just rename all metrics and configs
> from
> > controller to control. This indicates that the threads and the queues are
> > for the control requests (as oppose to data requests).
> >
> > 201. "<new controller, old broker>: In this scenario, the controller will
> > have the "controller.listener.name" config set to a value like
> > "CONTROLLER", however the broker's exposed endpoints do not have an entry
> > corresponding to the new listener name. Hence the controller should
> > preserve the existing behavior by determining the endpoint using
> > *inter-broker-listener-name *value. The end result should be the same
> > behavior as today." Currently, the controller makes connections based on
> > its local inter.broker.listener.name config without checking the target
> > broker's ZK registration. For consistency, perhaps we can just follow the
> > same strategy for controller.listener.name. This existing behavior seems
> > simpler to understand and has the benefit of catching inconsistent
> configs
> > across brokers.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Hi Jun,
> > >
> > > Sorry to bother you again. Can you please take a look at the wiki again
> > > when you have time?
> > >
> > > Thanks a lot!
> > > Lucas
> > >
> > > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks a lot for the detailed explanation.
> > > > I've restored the wiki to a previous version that does not require
> > config
> > > > changes,
> > > > and keeps the current behavior with the proposed changes turned off
> by
> > > > default.
> > > > I'd appreciate it if you can review it again.
> > > >
> > > > Thanks!
> > > > Lucas
> > > >
> > > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > >> Hi, Lucas,
> > > >>
> > > >> When upgrading to a minor release, I think the expectation is that a
> > > user
> > > >> wouldn't need to make any config changes, other than the usual
> > > >> inter.broker.protocol. If we require other config changes during an
> > > >> upgrade, then it's probably better to do that in a major release.
> > > >>
> > > >> Regarding your proposal, I think removing host/advertised_host in
> > favor
> > > of
> > > >> listeners:advertised_listeners seems useful regardless of this KIP.
> > > >> However, that can probably wait until a major release.
> > > >>
> > > >> As for the controller listener, I am not sure if one has to set it.
> To
> > > >> make
> > > >> a cluster healthy, one sort of have to make sure that the request
> > queue
> > > is
> > > >> never full and no request will be sitting in the request queue for
> > long.
> > > >> If
> > > >> one does that, setting the controller listener may not be necessary.
> > On
> > > >> the
> > > >> flip side, even if one sets the controller listener, but the request
> > > queue
> > > >> and the request time for the data part are still high, the cluster
> may
> > > >> still not be healthy. Given that we have already started the 2.1
> > release
> > > >> planning, perhaps we can start with not requiring the controller
> > > listener.
> > > >> If this is indeed something that everyone wants to set, we can make
> > it a
> > > >> required config in a major release.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <lu...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > @Jun Rao <ju...@confluent.io>
> > > >> >
> > > >> > I made the recent config changes after thinking about the default
> > > >> behavior
> > > >> > for adopting this KIP.
> > > >> > I think there are basically two options:
> > > >> > 1. By default, the behavior proposed in this KIP is turned off,
> and
> > > >> > operators can turn it
> > > >> > on by adding the "controller.listener.name" config and entries in
> > the
> > > >> > "listeners" and "advertised.listeners" list.
> > > >> > If no "controller.listener.name" is added, it'll be the *same as*
> > > the "
> > > >> > inter.broker.listener.name",
> > > >> > and the proposed feature is effectively turned off.
> > > >> > This has been the assumption in the KIP writeup before.
> > > >> >
> > > >> > 2. By default, the behavior proposed in this KIP is turned on, and
> > > >> > operators are forced to
> > > >> > recognize the proposed change if their "listeners" config is set
> > (this
> > > >> is
> > > >> > most likely in production environments),
> > > >> > by allocating a new port for controller connections, and adding a
> > new
> > > >> > endpoint to the "listeners" config.
> > > >> > For cases where "listeners" is not set explicitly,
> > > >> > there needs to be a default value for it that includes the
> > controller
> > > >> > listener name,
> > > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> > > >> >
> > > >> > I chose to go with option 2 since as author of this KIP,
> > > >> > I naturally think in the long run, it's worth the effort to adopt
> > this
> > > >> > feature,
> > > >> > in order to prevent issues under circumstances listed in the
> > > motivation
> > > >> > section.
> > > >> >
> > > >> > 100, following the argument above, I want to enforce the
> separation
> > > >> > between controller
> > > >> > and data plane requests. Hence the "controller.listener.name"
> > should
> > > >> > never be the same
> > > >> > as the "inter.broker.listener.name", which is intended for data
> > plane
> > > >> > requests.
> > > >> >
> > > >> > 101, the default value for "listeners" will be
> > > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > > >> > making values of "host", and "port" not being used under any
> config
> > > >> > settings.
> > > >> > And the default value for "advertised.listeners" will be derived
> > from
> > > >> > "listeners",
> > > >> > making the values of "advertised.host", and "advertised.port" not
> > > being
> > > >> > used any more.
> > > >> >
> > > >> > 102, for upgrading, a single broker that has "listeners" and/or
> > > >> > "advertised.listeners" set,
> > > >> > must add a new endpoint for the CONTROLLER listener name, or end
> up
> > > >> using
> > > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> > > >> > During rolling upgrade, in cases of <old controller> + <new
> broker>
> > or
> > > >> > <new controller>  + <old broker>
> > > >> > we still need to fall back to the current behavior. However after
> > the
> > > >> > rolling upgrade is done,
> > > >> > it is guaranteed that the controller plane and data plane are
> > > separated,
> > > >> > given
> > > >> > the "controller.listener.name" must be different from "
> > > >> > inter.broker.listener.name".
> > > >> >
> > > >> > @Ismael Juma <is...@juma.me.uk>
> > > >> > Thanks for pointing that out. I did not know that.
> > > >> > However my question is if the argument above makes sense, and my
> > code
> > > >> > change
> > > >> > causes the configs "host", "port", "advertised.host",
> > > "advertised.port"
> > > >> to
> > > >> > be not used under any circumstance,
> > > >> > then it's no different from removing them.
> > > >> > Anyway if there is still a concern about removing them, is there a
> > new
> > > >> > major new version
> > > >> > now or in the future where I can remove them?
> > > >> >
> > > >> > Thanks!
> > > >> > Lucas
> > > >> >
> > > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > >> >
> > > >> >> To be clear, we can only remove configs in major new versions.
> > > >> Otherwise,
> > > >> >> we can only deprecate them.
> > > >> >>
> > > >> >> Ismael
> > > >> >>
> > > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io>
> wrote:
> > > >> >>
> > > >> >> > Hi, Lucas,
> > > >> >> >
> > > >> >> > For the network idlePct, your understanding is correct.
> > Currently,
> > > >> >> > networkIdlePct metric is calculated as the average of (1 -
> > > io-ratio)
> > > >> in
> > > >> >> the
> > > >> >> > selector of all network threads.
> > > >> >> >
> > > >> >> > The metrics part looks good to me in the updated KIP.
> > > >> >> >
> > > >> >> > I am not still not quite sure about the configs.
> > > >> >> >
> > > >> >> > 100. "Whenever the "controller.listener.name" is set, upon
> > broker
> > > >> >> startup,
> > > >> >> > we will validate its value and make sure it's different from
> the
> > > >> >> > *inter-broker-listener-name *value." Does that mean that
> > > >> >> > controller.listener.name has to be different from
> > > >> >> > inter.broker.listener.name?
> > > >> >> > That seems limiting.
> > > >> >> >
> > > >> >> > 101. The KIP says that advertised.listeners and listeners will
> > now
> > > >> have
> > > >> >> a
> > > >> >> > different default value including controller. Could you
> document
> > > what
> > > >> >> the
> > > >> >> > default value looks like?
> > > >> >> >
> > > >> >> > 102. About removing the the following configs. How does that
> > affect
> > > >> the
> > > >> >> > upgrade path? Do we now expect a user to add a new config when
> > > >> upgrading
> > > >> >> > from an old version to a new one?
> > > >> >> > host
> > > >> >> > port
> > > >> >> > advertised.host
> > > >> >> > advertised.port
> > > >> >> >
> > > >> >> > Thanks,
> > > >> >> >
> > > >> >> > Jun
> > > >> >> >
> > > >> >> >
> > > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
> > lucasatucla@gmail.com>
> > > >> >> wrote:
> > > >> >> >
> > > >> >> > > @Jun Rao <ju...@confluent.io>
> > > >> >> > >
> > > >> >> > > One clarification, currently on the selector level, we have
> the
> > > >> >> > > io-wait-ratio metric.
> > > >> >> > > For the new controller *network* thread, we can use it
> directly
> > > for
> > > >> >> > > IdlePct, instead of using 1- io-ratio,
> > > >> >> > > so that the logic is similar to the current average IdlePct
> for
> > > >> >> network
> > > >> >> > > threads. Is that correct?
> > > >> >> > >
> > > >> >> > > I've revised the KIP by adding two new metrics for measuring
> > the
> > > >> >> IdlePct
> > > >> >> > > for the two additional threads.
> > > >> >> > > Please take a look again. Thanks!
> > > >> >> > >
> > > >> >> > > Lucas
> > > >> >> > >
> > > >> >> > >
> > > >> >> > >
> > > >> >> > >
> > > >> >> > >
> > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io>
> > wrote:
> > > >> >> > >
> > > >> >> > > > Hi, Lucas,
> > > >> >> > > >
> > > >> >> > > > Thanks for the updated KIP.
> > > >> >> > > >
> > > >> >> > > > For monitoring the network thread utilization for the
> control
> > > >> >> plane, we
> > > >> >> > > > already have the metric io-ratio at the selector level
> > (idlePct
> > > >> is
> > > >> >> 1 -
> > > >> >> > > > io-ratio). So, we just need to give that selector a
> > meaningful
> > > >> name.
> > > >> >> > > >
> > > >> >> > > > For monitoring the io thread utilization for the control
> > plane,
> > > >> it's
> > > >> >> > > > probably useful to have a separate metric for that. The
> > > >> controller
> > > >> >> > > request
> > > >> >> > > > queue size may not reflect the history in a window.
> > > >> >> > > >
> > > >> >> > > > Jun
> > > >> >> > > >
> > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> > > >> lucasatucla@gmail.com>
> > > >> >> > > wrote:
> > > >> >> > > >
> > > >> >> > > > > Thanks Jun for your quick response. It looks like I
> forgot
> > to
> > > >> >> click
> > > >> >> > the
> > > >> >> > > > > "Update" button, :)
> > > >> >> > > > > It's updated now.
> > > >> >> > > > >
> > > >> >> > > > > Regarding the idle ratio metrics for the additional
> > threads,
> > > I
> > > >> >> > > discussed
> > > >> >> > > > > with Joel,
> > > >> >> > > > > and think they are not as useful, and I added our
> reasoning
> > > in
> > > >> the
> > > >> >> > last
> > > >> >> > > > > paragraph of the
> > > >> >> > > > > "How are controller requests handled over the dedicated
> > > >> >> connections?"
> > > >> >> > > > > section.
> > > >> >> > > > > On the other hand, we don't strongly oppose adding them
> if
> > > you
> > > >> >> think
> > > >> >> > > they
> > > >> >> > > > > are necessary.
> > > >> >> > > > >
> > > >> >> > > > > Thanks,
> > > >> >> > > > > Lucas
> > > >> >> > > > >
> > > >> >> > > > >
> > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <jun@confluent.io
> >
> > > >> wrote:
> > > >> >> > > > >
> > > >> >> > > > > > Hi, Lucas,
> > > >> >> > > > > >
> > > >> >> > > > > > Thanks for the reply. Have you actually updated the
> KIP?
> > > The
> > > >> >> wiki
> > > >> >> > > says
> > > >> >> > > > > that
> > > >> >> > > > > > it's last updated on Aug. 22. and some of the changes
> > that
> > > >> you
> > > >> >> > > > mentioned
> > > >> >> > > > > > (#1 and #3) are not there.
> > > >> >> > > > > >
> > > >> >> > > > > > Also, regarding Joel's comment on network/request idle
> > > ratio
> > > >> >> > metrics,
> > > >> >> > > > > could
> > > >> >> > > > > > you comment on whether they include the new controller
> > > >> >> listener? If
> > > >> >> > > > not,
> > > >> >> > > > > do
> > > >> >> > > > > > we need additional metrics to measure the utilization
> of
> > > the
> > > >> io
> > > >> >> > > thread
> > > >> >> > > > > for
> > > >> >> > > > > > the control plane?
> > > >> >> > > > > >
> > > >> >> > > > > > Jun
> > > >> >> > > > > >
> > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> > > >> >> lucasatucla@gmail.com
> > > >> >> > >
> > > >> >> > > > > wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > > Thanks for the comments, Jun.
> > > >> >> > > > > > >
> > > >> >> > > > > > > 1. I think the answer should be no, since the "
> > > >> >> > > > > > inter.broker.listener.name"
> > > >> >> > > > > > > are also used
> > > >> >> > > > > > > for replication traffic, and merging these two types
> of
> > > >> >> request
> > > >> >> > to
> > > >> >> > > > the
> > > >> >> > > > > > > single threaded tunnel
> > > >> >> > > > > > > would defeat the purpose of this KIP and also hurt
> > > >> replication
> > > >> >> > > > > > throughput.
> > > >> >> > > > > > > So I think that means
> > > >> >> > > > > > > we should validate to make sure when the new config
> is
> > > set,
> > > >> >> it's
> > > >> >> > > > > > different
> > > >> >> > > > > > > from "inter.broker.listener.name"
> > > >> >> > > > > > > or "security.inter.broker.protocol", whichever is
> set.
> > > >> >> > > > > > >
> > > >> >> > > > > > > 2. Normally all broker configs in a given cluster are
> > > >> changed
> > > >> >> at
> > > >> >> > > the
> > > >> >> > > > > same
> > > >> >> > > > > > > time. If there is a typo in the
> > > >> >> > > > > > > controller.listener.name and it's not available in
> the
> > > >> >> endpoints
> > > >> >> > > > list,
> > > >> >> > > > > > we
> > > >> >> > > > > > > could catch it, give an error
> > > >> >> > > > > > > and block restart of the first broker in the cluster.
> > > With
> > > >> >> that,
> > > >> >> > we
> > > >> >> > > > > could
> > > >> >> > > > > > > keep the current behavior
> > > >> >> > > > > > > in the KIP write up that falls back to
> > > >> >> > "inter.broker.listener.nam"
> > > >> >> > > > when
> > > >> >> > > > > > the
> > > >> >> > > > > > > "controller.listener.name"
> > > >> >> > > > > > > is not found during the migration phase of this KIP.
> > > >> Thoughts?
> > > >> >> > > > > > >
> > > >> >> > > > > > > 3. That makes sense, and I've changed it.
> > > >> >> > > > > > >
> > > >> >> > > > > > > Thanks,
> > > >> >> > > > > > > Lucas
> > > >> >> > > > > > >
> > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> > > jun@confluent.io>
> > > >> >> > wrote:
> > > >> >> > > > > > >
> > > >> >> > > > > > > > Hi, Lucas,
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Sorry for the delay. The new proposal looks good to
> > me
> > > >> >> > overall. A
> > > >> >> > > > few
> > > >> >> > > > > > > minor
> > > >> >> > > > > > > > comments below.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > 1. It's possible that listener.name.for.controller
> is
> > > >> set,
> > > >> >> but
> > > >> >> > > set
> > > >> >> > > > to
> > > >> >> > > > > > the
> > > >> >> > > > > > > > same value as inter.broker.listener.name. In that
> > > case,
> > > >> >> should
> > > >> >> > > we
> > > >> >> > > > > > have a
> > > >> >> > > > > > > > single network thread and the request handling
> thread
> > > for
> > > >> >> that
> > > >> >> > > > > > listener?
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > 2. Currently, the controller always picks the
> > listener
> > > >> >> > specified
> > > >> >> > > by
> > > >> >> > > > > > > > inter.broker.listener.name even if the listener
> name
> > > is
> > > >> not
> > > >> >> > > > present
> > > >> >> > > > > in
> > > >> >> > > > > > > the
> > > >> >> > > > > > > > receiving broker. This KIP proposes a slightly
> > > different
> > > >> >> > approach
> > > >> >> > > > for
> > > >> >> > > > > > > > picking listener.name.for.controller only when the
> > > >> receiving
> > > >> >> > end
> > > >> >> > > > has
> > > >> >> > > > > > the
> > > >> >> > > > > > > > listener and switches listener.name.for.controller
> > > >> >> otherwise.
> > > >> >> > > There
> > > >> >> > > > > are
> > > >> >> > > > > > > > some tradeoffs between the two approaches. To
> change
> > > the
> > > >> >> inter
> > > >> >> > > > broker
> > > >> >> > > > > > > > listener, the former requires 2 steps: (1) adding
> the
> > > new
> > > >> >> > > listener
> > > >> >> > > > to
> > > >> >> > > > > > > > listener list in every broker and (2) changing
> > > >> >> > > > > > > > listener.name.for.controller.
> > > >> >> > > > > > > > The latter can do both changes in 1 step. On the
> > hand,
> > > if
> > > >> >> > > > > > > > listener.name.for.controller
> > > >> >> > > > > > > > is mis-configured, the former will report an error
> > and
> > > >> the
> > > >> >> > latter
> > > >> >> > > > > will
> > > >> >> > > > > > > hide
> > > >> >> > > > > > > > it (so the user may not know the misconfiguration).
> > It
> > > >> seems
> > > >> >> > that
> > > >> >> > > > we
> > > >> >> > > > > > > should
> > > >> >> > > > > > > > pick one approach to handle both
> > > >> >> listener.name.for.controller
> > > >> >> > and
> > > >> >> > > > > > > > inter.broker.listener.name consistently. To me,
> the
> > > >> former
> > > >> >> > seems
> > > >> >> > > > > > > slightly
> > > >> >> > > > > > > > better.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > 3. To be consistent with the existing naming,
> should
> > > >> >> > > > > > > > listener.name.for.controller
> > > >> >> > > > > > > > be controller.listener.name?
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Thanks,
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Jun
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > > >> >> > > lucasatucla@gmail.com>
> > > >> >> > > > > > > wrote:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > > Hi Jun and Joel,
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > The KIP writeup has changed by quite a bit since
> > your
> > > >> +1.
> > > >> >> > > > > > > > > Can you please take another review? Thanks a lot
> > for
> > > >> your
> > > >> >> > time!
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Lucas
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> > > >> >> > > > jjkoshy.w@gmail.com>
> > > >> >> > > > > > > > wrote:
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > > +1 on the KIP.
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > (I'm not sure we actually necessary to
> introduce
> > > the
> > > >> >> > > condition
> > > >> >> > > > > > > > variables
> > > >> >> > > > > > > > > > for the concern that Jun raised, but it's an
> > > >> >> implementation
> > > >> >> > > > > detail
> > > >> >> > > > > > > that
> > > >> >> > > > > > > > > we
> > > >> >> > > > > > > > > > can defer to a discussion in the PR.)
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> > > >> >> > > > > > lucasatucla@gmail.com>
> > > >> >> > > > > > > > > > wrote:
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > > Hi Jun,
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > > > I agree by using the conditional variables,
> > there
> > > >> is
> > > >> >> no
> > > >> >> > > need
> > > >> >> > > > to
> > > >> >> > > > > > add
> > > >> >> > > > > > > > > such
> > > >> >> > > > > > > > > > a
> > > >> >> > > > > > > > > > > new config.
> > > >> >> > > > > > > > > > > Also thanks for approving this KIP.
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > > > Lucas
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > >
> > > >> >> > > > > >
> > > >> >> > > > >
> > > >> >> > > >
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks for the further comments, Jun.

200. Currently in the code base, we have the term of "ControlBatch" related
to
idempotent/transactional producing. Do you think it's a concern for reusing
the term "control"?

201. It's not clear to me how it would work by following the same strategy
for "controller.listener.name".
Say the new controller has its "controller.listener.name" set to the value
"CONTROLLER", and broker 1
has picked up this KIP by announcing
"endpoints": [
        "CONTROLLER://broker1.example.com:9091",
        "INTERNAL://broker1.example.com:9092",
        "EXTERNAL://host1.example.com:9093"
    ],

while broker2 has not picked up the change, and is announcing
"endpoints": [
        "INTERNAL://broker2.example.com:9092",
        "EXTERNAL://host2.example.com:9093"
    ],
to support both broker 1 for the new behavior and broker 2 for the old
behavior, it seems the controller must
check their published endpoints. Am I missing something?

Thanks!
Lucas

On Mon, Oct 1, 2018 at 6:29 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas,
>
> Sorry for the delay. The updated wiki looks good to me overall. Just a
> couple more minor comments.
>
> 200. kafka.network:name=ControllerRequestQueueSize,type=RequestChannel: The
> name ControllerRequestQueueSize gives the impression that it's only for the
> controller broker. Perhaps we can just rename all metrics and configs from
> controller to control. This indicates that the threads and the queues are
> for the control requests (as oppose to data requests).
>
> 201. "<new controller, old broker>: In this scenario, the controller will
> have the "controller.listener.name" config set to a value like
> "CONTROLLER", however the broker's exposed endpoints do not have an entry
> corresponding to the new listener name. Hence the controller should
> preserve the existing behavior by determining the endpoint using
> *inter-broker-listener-name *value. The end result should be the same
> behavior as today." Currently, the controller makes connections based on
> its local inter.broker.listener.name config without checking the target
> broker's ZK registration. For consistency, perhaps we can just follow the
> same strategy for controller.listener.name. This existing behavior seems
> simpler to understand and has the benefit of catching inconsistent configs
> across brokers.
>
> Thanks,
>
> Jun
>
> On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > Sorry to bother you again. Can you please take a look at the wiki again
> > when you have time?
> >
> > Thanks a lot!
> > Lucas
> >
> > On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks a lot for the detailed explanation.
> > > I've restored the wiki to a previous version that does not require
> config
> > > changes,
> > > and keeps the current behavior with the proposed changes turned off by
> > > default.
> > > I'd appreciate it if you can review it again.
> > >
> > > Thanks!
> > > Lucas
> > >
> > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > >> Hi, Lucas,
> > >>
> > >> When upgrading to a minor release, I think the expectation is that a
> > user
> > >> wouldn't need to make any config changes, other than the usual
> > >> inter.broker.protocol. If we require other config changes during an
> > >> upgrade, then it's probably better to do that in a major release.
> > >>
> > >> Regarding your proposal, I think removing host/advertised_host in
> favor
> > of
> > >> listeners:advertised_listeners seems useful regardless of this KIP.
> > >> However, that can probably wait until a major release.
> > >>
> > >> As for the controller listener, I am not sure if one has to set it. To
> > >> make
> > >> a cluster healthy, one sort of have to make sure that the request
> queue
> > is
> > >> never full and no request will be sitting in the request queue for
> long.
> > >> If
> > >> one does that, setting the controller listener may not be necessary.
> On
> > >> the
> > >> flip side, even if one sets the controller listener, but the request
> > queue
> > >> and the request time for the data part are still high, the cluster may
> > >> still not be healthy. Given that we have already started the 2.1
> release
> > >> planning, perhaps we can start with not requiring the controller
> > listener.
> > >> If this is indeed something that everyone wants to set, we can make
> it a
> > >> required config in a major release.
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <lu...@gmail.com>
> > >> wrote:
> > >>
> > >> > @Jun Rao <ju...@confluent.io>
> > >> >
> > >> > I made the recent config changes after thinking about the default
> > >> behavior
> > >> > for adopting this KIP.
> > >> > I think there are basically two options:
> > >> > 1. By default, the behavior proposed in this KIP is turned off, and
> > >> > operators can turn it
> > >> > on by adding the "controller.listener.name" config and entries in
> the
> > >> > "listeners" and "advertised.listeners" list.
> > >> > If no "controller.listener.name" is added, it'll be the *same as*
> > the "
> > >> > inter.broker.listener.name",
> > >> > and the proposed feature is effectively turned off.
> > >> > This has been the assumption in the KIP writeup before.
> > >> >
> > >> > 2. By default, the behavior proposed in this KIP is turned on, and
> > >> > operators are forced to
> > >> > recognize the proposed change if their "listeners" config is set
> (this
> > >> is
> > >> > most likely in production environments),
> > >> > by allocating a new port for controller connections, and adding a
> new
> > >> > endpoint to the "listeners" config.
> > >> > For cases where "listeners" is not set explicitly,
> > >> > there needs to be a default value for it that includes the
> controller
> > >> > listener name,
> > >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> > >> >
> > >> > I chose to go with option 2 since as author of this KIP,
> > >> > I naturally think in the long run, it's worth the effort to adopt
> this
> > >> > feature,
> > >> > in order to prevent issues under circumstances listed in the
> > motivation
> > >> > section.
> > >> >
> > >> > 100, following the argument above, I want to enforce the separation
> > >> > between controller
> > >> > and data plane requests. Hence the "controller.listener.name"
> should
> > >> > never be the same
> > >> > as the "inter.broker.listener.name", which is intended for data
> plane
> > >> > requests.
> > >> >
> > >> > 101, the default value for "listeners" will be
> > >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > >> > making values of "host", and "port" not being used under any config
> > >> > settings.
> > >> > And the default value for "advertised.listeners" will be derived
> from
> > >> > "listeners",
> > >> > making the values of "advertised.host", and "advertised.port" not
> > being
> > >> > used any more.
> > >> >
> > >> > 102, for upgrading, a single broker that has "listeners" and/or
> > >> > "advertised.listeners" set,
> > >> > must add a new endpoint for the CONTROLLER listener name, or end up
> > >> using
> > >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> > >> > During rolling upgrade, in cases of <old controller> + <new broker>
> or
> > >> > <new controller>  + <old broker>
> > >> > we still need to fall back to the current behavior. However after
> the
> > >> > rolling upgrade is done,
> > >> > it is guaranteed that the controller plane and data plane are
> > separated,
> > >> > given
> > >> > the "controller.listener.name" must be different from "
> > >> > inter.broker.listener.name".
> > >> >
> > >> > @Ismael Juma <is...@juma.me.uk>
> > >> > Thanks for pointing that out. I did not know that.
> > >> > However my question is if the argument above makes sense, and my
> code
> > >> > change
> > >> > causes the configs "host", "port", "advertised.host",
> > "advertised.port"
> > >> to
> > >> > be not used under any circumstance,
> > >> > then it's no different from removing them.
> > >> > Anyway if there is still a concern about removing them, is there a
> new
> > >> > major new version
> > >> > now or in the future where I can remove them?
> > >> >
> > >> > Thanks!
> > >> > Lucas
> > >> >
> > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk>
> > wrote:
> > >> >
> > >> >> To be clear, we can only remove configs in major new versions.
> > >> Otherwise,
> > >> >> we can only deprecate them.
> > >> >>
> > >> >> Ismael
> > >> >>
> > >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io> wrote:
> > >> >>
> > >> >> > Hi, Lucas,
> > >> >> >
> > >> >> > For the network idlePct, your understanding is correct.
> Currently,
> > >> >> > networkIdlePct metric is calculated as the average of (1 -
> > io-ratio)
> > >> in
> > >> >> the
> > >> >> > selector of all network threads.
> > >> >> >
> > >> >> > The metrics part looks good to me in the updated KIP.
> > >> >> >
> > >> >> > I am not still not quite sure about the configs.
> > >> >> >
> > >> >> > 100. "Whenever the "controller.listener.name" is set, upon
> broker
> > >> >> startup,
> > >> >> > we will validate its value and make sure it's different from the
> > >> >> > *inter-broker-listener-name *value." Does that mean that
> > >> >> > controller.listener.name has to be different from
> > >> >> > inter.broker.listener.name?
> > >> >> > That seems limiting.
> > >> >> >
> > >> >> > 101. The KIP says that advertised.listeners and listeners will
> now
> > >> have
> > >> >> a
> > >> >> > different default value including controller. Could you document
> > what
> > >> >> the
> > >> >> > default value looks like?
> > >> >> >
> > >> >> > 102. About removing the the following configs. How does that
> affect
> > >> the
> > >> >> > upgrade path? Do we now expect a user to add a new config when
> > >> upgrading
> > >> >> > from an old version to a new one?
> > >> >> > host
> > >> >> > port
> > >> >> > advertised.host
> > >> >> > advertised.port
> > >> >> >
> > >> >> > Thanks,
> > >> >> >
> > >> >> > Jun
> > >> >> >
> > >> >> >
> > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > >> >> wrote:
> > >> >> >
> > >> >> > > @Jun Rao <ju...@confluent.io>
> > >> >> > >
> > >> >> > > One clarification, currently on the selector level, we have the
> > >> >> > > io-wait-ratio metric.
> > >> >> > > For the new controller *network* thread, we can use it directly
> > for
> > >> >> > > IdlePct, instead of using 1- io-ratio,
> > >> >> > > so that the logic is similar to the current average IdlePct for
> > >> >> network
> > >> >> > > threads. Is that correct?
> > >> >> > >
> > >> >> > > I've revised the KIP by adding two new metrics for measuring
> the
> > >> >> IdlePct
> > >> >> > > for the two additional threads.
> > >> >> > > Please take a look again. Thanks!
> > >> >> > >
> > >> >> > > Lucas
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io>
> wrote:
> > >> >> > >
> > >> >> > > > Hi, Lucas,
> > >> >> > > >
> > >> >> > > > Thanks for the updated KIP.
> > >> >> > > >
> > >> >> > > > For monitoring the network thread utilization for the control
> > >> >> plane, we
> > >> >> > > > already have the metric io-ratio at the selector level
> (idlePct
> > >> is
> > >> >> 1 -
> > >> >> > > > io-ratio). So, we just need to give that selector a
> meaningful
> > >> name.
> > >> >> > > >
> > >> >> > > > For monitoring the io thread utilization for the control
> plane,
> > >> it's
> > >> >> > > > probably useful to have a separate metric for that. The
> > >> controller
> > >> >> > > request
> > >> >> > > > queue size may not reflect the history in a window.
> > >> >> > > >
> > >> >> > > > Jun
> > >> >> > > >
> > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> > >> lucasatucla@gmail.com>
> > >> >> > > wrote:
> > >> >> > > >
> > >> >> > > > > Thanks Jun for your quick response. It looks like I forgot
> to
> > >> >> click
> > >> >> > the
> > >> >> > > > > "Update" button, :)
> > >> >> > > > > It's updated now.
> > >> >> > > > >
> > >> >> > > > > Regarding the idle ratio metrics for the additional
> threads,
> > I
> > >> >> > > discussed
> > >> >> > > > > with Joel,
> > >> >> > > > > and think they are not as useful, and I added our reasoning
> > in
> > >> the
> > >> >> > last
> > >> >> > > > > paragraph of the
> > >> >> > > > > "How are controller requests handled over the dedicated
> > >> >> connections?"
> > >> >> > > > > section.
> > >> >> > > > > On the other hand, we don't strongly oppose adding them if
> > you
> > >> >> think
> > >> >> > > they
> > >> >> > > > > are necessary.
> > >> >> > > > >
> > >> >> > > > > Thanks,
> > >> >> > > > > Lucas
> > >> >> > > > >
> > >> >> > > > >
> > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io>
> > >> wrote:
> > >> >> > > > >
> > >> >> > > > > > Hi, Lucas,
> > >> >> > > > > >
> > >> >> > > > > > Thanks for the reply. Have you actually updated the KIP?
> > The
> > >> >> wiki
> > >> >> > > says
> > >> >> > > > > that
> > >> >> > > > > > it's last updated on Aug. 22. and some of the changes
> that
> > >> you
> > >> >> > > > mentioned
> > >> >> > > > > > (#1 and #3) are not there.
> > >> >> > > > > >
> > >> >> > > > > > Also, regarding Joel's comment on network/request idle
> > ratio
> > >> >> > metrics,
> > >> >> > > > > could
> > >> >> > > > > > you comment on whether they include the new controller
> > >> >> listener? If
> > >> >> > > > not,
> > >> >> > > > > do
> > >> >> > > > > > we need additional metrics to measure the utilization of
> > the
> > >> io
> > >> >> > > thread
> > >> >> > > > > for
> > >> >> > > > > > the control plane?
> > >> >> > > > > >
> > >> >> > > > > > Jun
> > >> >> > > > > >
> > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> > >> >> lucasatucla@gmail.com
> > >> >> > >
> > >> >> > > > > wrote:
> > >> >> > > > > >
> > >> >> > > > > > > Thanks for the comments, Jun.
> > >> >> > > > > > >
> > >> >> > > > > > > 1. I think the answer should be no, since the "
> > >> >> > > > > > inter.broker.listener.name"
> > >> >> > > > > > > are also used
> > >> >> > > > > > > for replication traffic, and merging these two types of
> > >> >> request
> > >> >> > to
> > >> >> > > > the
> > >> >> > > > > > > single threaded tunnel
> > >> >> > > > > > > would defeat the purpose of this KIP and also hurt
> > >> replication
> > >> >> > > > > > throughput.
> > >> >> > > > > > > So I think that means
> > >> >> > > > > > > we should validate to make sure when the new config is
> > set,
> > >> >> it's
> > >> >> > > > > > different
> > >> >> > > > > > > from "inter.broker.listener.name"
> > >> >> > > > > > > or "security.inter.broker.protocol", whichever is set.
> > >> >> > > > > > >
> > >> >> > > > > > > 2. Normally all broker configs in a given cluster are
> > >> changed
> > >> >> at
> > >> >> > > the
> > >> >> > > > > same
> > >> >> > > > > > > time. If there is a typo in the
> > >> >> > > > > > > controller.listener.name and it's not available in the
> > >> >> endpoints
> > >> >> > > > list,
> > >> >> > > > > > we
> > >> >> > > > > > > could catch it, give an error
> > >> >> > > > > > > and block restart of the first broker in the cluster.
> > With
> > >> >> that,
> > >> >> > we
> > >> >> > > > > could
> > >> >> > > > > > > keep the current behavior
> > >> >> > > > > > > in the KIP write up that falls back to
> > >> >> > "inter.broker.listener.nam"
> > >> >> > > > when
> > >> >> > > > > > the
> > >> >> > > > > > > "controller.listener.name"
> > >> >> > > > > > > is not found during the migration phase of this KIP.
> > >> Thoughts?
> > >> >> > > > > > >
> > >> >> > > > > > > 3. That makes sense, and I've changed it.
> > >> >> > > > > > >
> > >> >> > > > > > > Thanks,
> > >> >> > > > > > > Lucas
> > >> >> > > > > > >
> > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> > jun@confluent.io>
> > >> >> > wrote:
> > >> >> > > > > > >
> > >> >> > > > > > > > Hi, Lucas,
> > >> >> > > > > > > >
> > >> >> > > > > > > > Sorry for the delay. The new proposal looks good to
> me
> > >> >> > overall. A
> > >> >> > > > few
> > >> >> > > > > > > minor
> > >> >> > > > > > > > comments below.
> > >> >> > > > > > > >
> > >> >> > > > > > > > 1. It's possible that listener.name.for.controller is
> > >> set,
> > >> >> but
> > >> >> > > set
> > >> >> > > > to
> > >> >> > > > > > the
> > >> >> > > > > > > > same value as inter.broker.listener.name. In that
> > case,
> > >> >> should
> > >> >> > > we
> > >> >> > > > > > have a
> > >> >> > > > > > > > single network thread and the request handling thread
> > for
> > >> >> that
> > >> >> > > > > > listener?
> > >> >> > > > > > > >
> > >> >> > > > > > > > 2. Currently, the controller always picks the
> listener
> > >> >> > specified
> > >> >> > > by
> > >> >> > > > > > > > inter.broker.listener.name even if the listener name
> > is
> > >> not
> > >> >> > > > present
> > >> >> > > > > in
> > >> >> > > > > > > the
> > >> >> > > > > > > > receiving broker. This KIP proposes a slightly
> > different
> > >> >> > approach
> > >> >> > > > for
> > >> >> > > > > > > > picking listener.name.for.controller only when the
> > >> receiving
> > >> >> > end
> > >> >> > > > has
> > >> >> > > > > > the
> > >> >> > > > > > > > listener and switches listener.name.for.controller
> > >> >> otherwise.
> > >> >> > > There
> > >> >> > > > > are
> > >> >> > > > > > > > some tradeoffs between the two approaches. To change
> > the
> > >> >> inter
> > >> >> > > > broker
> > >> >> > > > > > > > listener, the former requires 2 steps: (1) adding the
> > new
> > >> >> > > listener
> > >> >> > > > to
> > >> >> > > > > > > > listener list in every broker and (2) changing
> > >> >> > > > > > > > listener.name.for.controller.
> > >> >> > > > > > > > The latter can do both changes in 1 step. On the
> hand,
> > if
> > >> >> > > > > > > > listener.name.for.controller
> > >> >> > > > > > > > is mis-configured, the former will report an error
> and
> > >> the
> > >> >> > latter
> > >> >> > > > > will
> > >> >> > > > > > > hide
> > >> >> > > > > > > > it (so the user may not know the misconfiguration).
> It
> > >> seems
> > >> >> > that
> > >> >> > > > we
> > >> >> > > > > > > should
> > >> >> > > > > > > > pick one approach to handle both
> > >> >> listener.name.for.controller
> > >> >> > and
> > >> >> > > > > > > > inter.broker.listener.name consistently. To me, the
> > >> former
> > >> >> > seems
> > >> >> > > > > > > slightly
> > >> >> > > > > > > > better.
> > >> >> > > > > > > >
> > >> >> > > > > > > > 3. To be consistent with the existing naming, should
> > >> >> > > > > > > > listener.name.for.controller
> > >> >> > > > > > > > be controller.listener.name?
> > >> >> > > > > > > >
> > >> >> > > > > > > > Thanks,
> > >> >> > > > > > > >
> > >> >> > > > > > > > Jun
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > >> >> > > lucasatucla@gmail.com>
> > >> >> > > > > > > wrote:
> > >> >> > > > > > > >
> > >> >> > > > > > > > > Hi Jun and Joel,
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > The KIP writeup has changed by quite a bit since
> your
> > >> +1.
> > >> >> > > > > > > > > Can you please take another review? Thanks a lot
> for
> > >> your
> > >> >> > time!
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > Lucas
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> > >> >> > > > jjkoshy.w@gmail.com>
> > >> >> > > > > > > > wrote:
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > > +1 on the KIP.
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > > > (I'm not sure we actually necessary to introduce
> > the
> > >> >> > > condition
> > >> >> > > > > > > > variables
> > >> >> > > > > > > > > > for the concern that Jun raised, but it's an
> > >> >> implementation
> > >> >> > > > > detail
> > >> >> > > > > > > that
> > >> >> > > > > > > > > we
> > >> >> > > > > > > > > > can defer to a discussion in the PR.)
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> > >> >> > > > > > lucasatucla@gmail.com>
> > >> >> > > > > > > > > > wrote:
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > > > > Hi Jun,
> > >> >> > > > > > > > > > >
> > >> >> > > > > > > > > > > I agree by using the conditional variables,
> there
> > >> is
> > >> >> no
> > >> >> > > need
> > >> >> > > > to
> > >> >> > > > > > add
> > >> >> > > > > > > > > such
> > >> >> > > > > > > > > > a
> > >> >> > > > > > > > > > > new config.
> > >> >> > > > > > > > > > > Also thanks for approving this KIP.
> > >> >> > > > > > > > > > >
> > >> >> > > > > > > > > > > Lucas
> > >> >> > > > > > > > > > >
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > >
> > >> >> > > > >
> > >> >> > > >
> > >> >> > >
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hi, Lucas,

Sorry for the delay. The updated wiki looks good to me overall. Just a
couple more minor comments.

200. kafka.network:name=ControllerRequestQueueSize,type=RequestChannel: The
name ControllerRequestQueueSize gives the impression that it's only for the
controller broker. Perhaps we can just rename all metrics and configs from
controller to control. This indicates that the threads and the queues are
for the control requests (as oppose to data requests).

201. "<new controller, old broker>: In this scenario, the controller will
have the "controller.listener.name" config set to a value like
"CONTROLLER", however the broker's exposed endpoints do not have an entry
corresponding to the new listener name. Hence the controller should
preserve the existing behavior by determining the endpoint using
*inter-broker-listener-name *value. The end result should be the same
behavior as today." Currently, the controller makes connections based on
its local inter.broker.listener.name config without checking the target
broker's ZK registration. For consistency, perhaps we can just follow the
same strategy for controller.listener.name. This existing behavior seems
simpler to understand and has the benefit of catching inconsistent configs
across brokers.

Thanks,

Jun

On Mon, Oct 1, 2018 at 8:43 AM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Jun,
>
> Sorry to bother you again. Can you please take a look at the wiki again
> when you have time?
>
> Thanks a lot!
> Lucas
>
> On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lu...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > Thanks a lot for the detailed explanation.
> > I've restored the wiki to a previous version that does not require config
> > changes,
> > and keeps the current behavior with the proposed changes turned off by
> > default.
> > I'd appreciate it if you can review it again.
> >
> > Thanks!
> > Lucas
> >
> > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io> wrote:
> >
> >> Hi, Lucas,
> >>
> >> When upgrading to a minor release, I think the expectation is that a
> user
> >> wouldn't need to make any config changes, other than the usual
> >> inter.broker.protocol. If we require other config changes during an
> >> upgrade, then it's probably better to do that in a major release.
> >>
> >> Regarding your proposal, I think removing host/advertised_host in favor
> of
> >> listeners:advertised_listeners seems useful regardless of this KIP.
> >> However, that can probably wait until a major release.
> >>
> >> As for the controller listener, I am not sure if one has to set it. To
> >> make
> >> a cluster healthy, one sort of have to make sure that the request queue
> is
> >> never full and no request will be sitting in the request queue for long.
> >> If
> >> one does that, setting the controller listener may not be necessary. On
> >> the
> >> flip side, even if one sets the controller listener, but the request
> queue
> >> and the request time for the data part are still high, the cluster may
> >> still not be healthy. Given that we have already started the 2.1 release
> >> planning, perhaps we can start with not requiring the controller
> listener.
> >> If this is indeed something that everyone wants to set, we can make it a
> >> required config in a major release.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <lu...@gmail.com>
> >> wrote:
> >>
> >> > @Jun Rao <ju...@confluent.io>
> >> >
> >> > I made the recent config changes after thinking about the default
> >> behavior
> >> > for adopting this KIP.
> >> > I think there are basically two options:
> >> > 1. By default, the behavior proposed in this KIP is turned off, and
> >> > operators can turn it
> >> > on by adding the "controller.listener.name" config and entries in the
> >> > "listeners" and "advertised.listeners" list.
> >> > If no "controller.listener.name" is added, it'll be the *same as*
> the "
> >> > inter.broker.listener.name",
> >> > and the proposed feature is effectively turned off.
> >> > This has been the assumption in the KIP writeup before.
> >> >
> >> > 2. By default, the behavior proposed in this KIP is turned on, and
> >> > operators are forced to
> >> > recognize the proposed change if their "listeners" config is set (this
> >> is
> >> > most likely in production environments),
> >> > by allocating a new port for controller connections, and adding a new
> >> > endpoint to the "listeners" config.
> >> > For cases where "listeners" is not set explicitly,
> >> > there needs to be a default value for it that includes the controller
> >> > listener name,
> >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> >> >
> >> > I chose to go with option 2 since as author of this KIP,
> >> > I naturally think in the long run, it's worth the effort to adopt this
> >> > feature,
> >> > in order to prevent issues under circumstances listed in the
> motivation
> >> > section.
> >> >
> >> > 100, following the argument above, I want to enforce the separation
> >> > between controller
> >> > and data plane requests. Hence the "controller.listener.name" should
> >> > never be the same
> >> > as the "inter.broker.listener.name", which is intended for data plane
> >> > requests.
> >> >
> >> > 101, the default value for "listeners" will be
> >> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> >> > making values of "host", and "port" not being used under any config
> >> > settings.
> >> > And the default value for "advertised.listeners" will be derived from
> >> > "listeners",
> >> > making the values of "advertised.host", and "advertised.port" not
> being
> >> > used any more.
> >> >
> >> > 102, for upgrading, a single broker that has "listeners" and/or
> >> > "advertised.listeners" set,
> >> > must add a new endpoint for the CONTROLLER listener name, or end up
> >> using
> >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> >> > During rolling upgrade, in cases of <old controller> + <new broker> or
> >> > <new controller>  + <old broker>
> >> > we still need to fall back to the current behavior. However after the
> >> > rolling upgrade is done,
> >> > it is guaranteed that the controller plane and data plane are
> separated,
> >> > given
> >> > the "controller.listener.name" must be different from "
> >> > inter.broker.listener.name".
> >> >
> >> > @Ismael Juma <is...@juma.me.uk>
> >> > Thanks for pointing that out. I did not know that.
> >> > However my question is if the argument above makes sense, and my code
> >> > change
> >> > causes the configs "host", "port", "advertised.host",
> "advertised.port"
> >> to
> >> > be not used under any circumstance,
> >> > then it's no different from removing them.
> >> > Anyway if there is still a concern about removing them, is there a new
> >> > major new version
> >> > now or in the future where I can remove them?
> >> >
> >> > Thanks!
> >> > Lucas
> >> >
> >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk>
> wrote:
> >> >
> >> >> To be clear, we can only remove configs in major new versions.
> >> Otherwise,
> >> >> we can only deprecate them.
> >> >>
> >> >> Ismael
> >> >>
> >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io> wrote:
> >> >>
> >> >> > Hi, Lucas,
> >> >> >
> >> >> > For the network idlePct, your understanding is correct. Currently,
> >> >> > networkIdlePct metric is calculated as the average of (1 -
> io-ratio)
> >> in
> >> >> the
> >> >> > selector of all network threads.
> >> >> >
> >> >> > The metrics part looks good to me in the updated KIP.
> >> >> >
> >> >> > I am not still not quite sure about the configs.
> >> >> >
> >> >> > 100. "Whenever the "controller.listener.name" is set, upon broker
> >> >> startup,
> >> >> > we will validate its value and make sure it's different from the
> >> >> > *inter-broker-listener-name *value." Does that mean that
> >> >> > controller.listener.name has to be different from
> >> >> > inter.broker.listener.name?
> >> >> > That seems limiting.
> >> >> >
> >> >> > 101. The KIP says that advertised.listeners and listeners will now
> >> have
> >> >> a
> >> >> > different default value including controller. Could you document
> what
> >> >> the
> >> >> > default value looks like?
> >> >> >
> >> >> > 102. About removing the the following configs. How does that affect
> >> the
> >> >> > upgrade path? Do we now expect a user to add a new config when
> >> upgrading
> >> >> > from an old version to a new one?
> >> >> > host
> >> >> > port
> >> >> > advertised.host
> >> >> > advertised.port
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Jun
> >> >> >
> >> >> >
> >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <lu...@gmail.com>
> >> >> wrote:
> >> >> >
> >> >> > > @Jun Rao <ju...@confluent.io>
> >> >> > >
> >> >> > > One clarification, currently on the selector level, we have the
> >> >> > > io-wait-ratio metric.
> >> >> > > For the new controller *network* thread, we can use it directly
> for
> >> >> > > IdlePct, instead of using 1- io-ratio,
> >> >> > > so that the logic is similar to the current average IdlePct for
> >> >> network
> >> >> > > threads. Is that correct?
> >> >> > >
> >> >> > > I've revised the KIP by adding two new metrics for measuring the
> >> >> IdlePct
> >> >> > > for the two additional threads.
> >> >> > > Please take a look again. Thanks!
> >> >> > >
> >> >> > > Lucas
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io> wrote:
> >> >> > >
> >> >> > > > Hi, Lucas,
> >> >> > > >
> >> >> > > > Thanks for the updated KIP.
> >> >> > > >
> >> >> > > > For monitoring the network thread utilization for the control
> >> >> plane, we
> >> >> > > > already have the metric io-ratio at the selector level (idlePct
> >> is
> >> >> 1 -
> >> >> > > > io-ratio). So, we just need to give that selector a meaningful
> >> name.
> >> >> > > >
> >> >> > > > For monitoring the io thread utilization for the control plane,
> >> it's
> >> >> > > > probably useful to have a separate metric for that. The
> >> controller
> >> >> > > request
> >> >> > > > queue size may not reflect the history in a window.
> >> >> > > >
> >> >> > > > Jun
> >> >> > > >
> >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
> >> lucasatucla@gmail.com>
> >> >> > > wrote:
> >> >> > > >
> >> >> > > > > Thanks Jun for your quick response. It looks like I forgot to
> >> >> click
> >> >> > the
> >> >> > > > > "Update" button, :)
> >> >> > > > > It's updated now.
> >> >> > > > >
> >> >> > > > > Regarding the idle ratio metrics for the additional threads,
> I
> >> >> > > discussed
> >> >> > > > > with Joel,
> >> >> > > > > and think they are not as useful, and I added our reasoning
> in
> >> the
> >> >> > last
> >> >> > > > > paragraph of the
> >> >> > > > > "How are controller requests handled over the dedicated
> >> >> connections?"
> >> >> > > > > section.
> >> >> > > > > On the other hand, we don't strongly oppose adding them if
> you
> >> >> think
> >> >> > > they
> >> >> > > > > are necessary.
> >> >> > > > >
> >> >> > > > > Thanks,
> >> >> > > > > Lucas
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io>
> >> wrote:
> >> >> > > > >
> >> >> > > > > > Hi, Lucas,
> >> >> > > > > >
> >> >> > > > > > Thanks for the reply. Have you actually updated the KIP?
> The
> >> >> wiki
> >> >> > > says
> >> >> > > > > that
> >> >> > > > > > it's last updated on Aug. 22. and some of the changes that
> >> you
> >> >> > > > mentioned
> >> >> > > > > > (#1 and #3) are not there.
> >> >> > > > > >
> >> >> > > > > > Also, regarding Joel's comment on network/request idle
> ratio
> >> >> > metrics,
> >> >> > > > > could
> >> >> > > > > > you comment on whether they include the new controller
> >> >> listener? If
> >> >> > > > not,
> >> >> > > > > do
> >> >> > > > > > we need additional metrics to measure the utilization of
> the
> >> io
> >> >> > > thread
> >> >> > > > > for
> >> >> > > > > > the control plane?
> >> >> > > > > >
> >> >> > > > > > Jun
> >> >> > > > > >
> >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> >> >> lucasatucla@gmail.com
> >> >> > >
> >> >> > > > > wrote:
> >> >> > > > > >
> >> >> > > > > > > Thanks for the comments, Jun.
> >> >> > > > > > >
> >> >> > > > > > > 1. I think the answer should be no, since the "
> >> >> > > > > > inter.broker.listener.name"
> >> >> > > > > > > are also used
> >> >> > > > > > > for replication traffic, and merging these two types of
> >> >> request
> >> >> > to
> >> >> > > > the
> >> >> > > > > > > single threaded tunnel
> >> >> > > > > > > would defeat the purpose of this KIP and also hurt
> >> replication
> >> >> > > > > > throughput.
> >> >> > > > > > > So I think that means
> >> >> > > > > > > we should validate to make sure when the new config is
> set,
> >> >> it's
> >> >> > > > > > different
> >> >> > > > > > > from "inter.broker.listener.name"
> >> >> > > > > > > or "security.inter.broker.protocol", whichever is set.
> >> >> > > > > > >
> >> >> > > > > > > 2. Normally all broker configs in a given cluster are
> >> changed
> >> >> at
> >> >> > > the
> >> >> > > > > same
> >> >> > > > > > > time. If there is a typo in the
> >> >> > > > > > > controller.listener.name and it's not available in the
> >> >> endpoints
> >> >> > > > list,
> >> >> > > > > > we
> >> >> > > > > > > could catch it, give an error
> >> >> > > > > > > and block restart of the first broker in the cluster.
> With
> >> >> that,
> >> >> > we
> >> >> > > > > could
> >> >> > > > > > > keep the current behavior
> >> >> > > > > > > in the KIP write up that falls back to
> >> >> > "inter.broker.listener.nam"
> >> >> > > > when
> >> >> > > > > > the
> >> >> > > > > > > "controller.listener.name"
> >> >> > > > > > > is not found during the migration phase of this KIP.
> >> Thoughts?
> >> >> > > > > > >
> >> >> > > > > > > 3. That makes sense, and I've changed it.
> >> >> > > > > > >
> >> >> > > > > > > Thanks,
> >> >> > > > > > > Lucas
> >> >> > > > > > >
> >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <
> jun@confluent.io>
> >> >> > wrote:
> >> >> > > > > > >
> >> >> > > > > > > > Hi, Lucas,
> >> >> > > > > > > >
> >> >> > > > > > > > Sorry for the delay. The new proposal looks good to me
> >> >> > overall. A
> >> >> > > > few
> >> >> > > > > > > minor
> >> >> > > > > > > > comments below.
> >> >> > > > > > > >
> >> >> > > > > > > > 1. It's possible that listener.name.for.controller is
> >> set,
> >> >> but
> >> >> > > set
> >> >> > > > to
> >> >> > > > > > the
> >> >> > > > > > > > same value as inter.broker.listener.name. In that
> case,
> >> >> should
> >> >> > > we
> >> >> > > > > > have a
> >> >> > > > > > > > single network thread and the request handling thread
> for
> >> >> that
> >> >> > > > > > listener?
> >> >> > > > > > > >
> >> >> > > > > > > > 2. Currently, the controller always picks the listener
> >> >> > specified
> >> >> > > by
> >> >> > > > > > > > inter.broker.listener.name even if the listener name
> is
> >> not
> >> >> > > > present
> >> >> > > > > in
> >> >> > > > > > > the
> >> >> > > > > > > > receiving broker. This KIP proposes a slightly
> different
> >> >> > approach
> >> >> > > > for
> >> >> > > > > > > > picking listener.name.for.controller only when the
> >> receiving
> >> >> > end
> >> >> > > > has
> >> >> > > > > > the
> >> >> > > > > > > > listener and switches listener.name.for.controller
> >> >> otherwise.
> >> >> > > There
> >> >> > > > > are
> >> >> > > > > > > > some tradeoffs between the two approaches. To change
> the
> >> >> inter
> >> >> > > > broker
> >> >> > > > > > > > listener, the former requires 2 steps: (1) adding the
> new
> >> >> > > listener
> >> >> > > > to
> >> >> > > > > > > > listener list in every broker and (2) changing
> >> >> > > > > > > > listener.name.for.controller.
> >> >> > > > > > > > The latter can do both changes in 1 step. On the hand,
> if
> >> >> > > > > > > > listener.name.for.controller
> >> >> > > > > > > > is mis-configured, the former will report an error and
> >> the
> >> >> > latter
> >> >> > > > > will
> >> >> > > > > > > hide
> >> >> > > > > > > > it (so the user may not know the misconfiguration). It
> >> seems
> >> >> > that
> >> >> > > > we
> >> >> > > > > > > should
> >> >> > > > > > > > pick one approach to handle both
> >> >> listener.name.for.controller
> >> >> > and
> >> >> > > > > > > > inter.broker.listener.name consistently. To me, the
> >> former
> >> >> > seems
> >> >> > > > > > > slightly
> >> >> > > > > > > > better.
> >> >> > > > > > > >
> >> >> > > > > > > > 3. To be consistent with the existing naming, should
> >> >> > > > > > > > listener.name.for.controller
> >> >> > > > > > > > be controller.listener.name?
> >> >> > > > > > > >
> >> >> > > > > > > > Thanks,
> >> >> > > > > > > >
> >> >> > > > > > > > Jun
> >> >> > > > > > > >
> >> >> > > > > > > >
> >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> >> >> > > lucasatucla@gmail.com>
> >> >> > > > > > > wrote:
> >> >> > > > > > > >
> >> >> > > > > > > > > Hi Jun and Joel,
> >> >> > > > > > > > >
> >> >> > > > > > > > > The KIP writeup has changed by quite a bit since your
> >> +1.
> >> >> > > > > > > > > Can you please take another review? Thanks a lot for
> >> your
> >> >> > time!
> >> >> > > > > > > > >
> >> >> > > > > > > > > Lucas
> >> >> > > > > > > > >
> >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> >> >> > > > jjkoshy.w@gmail.com>
> >> >> > > > > > > > wrote:
> >> >> > > > > > > > >
> >> >> > > > > > > > > > +1 on the KIP.
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > (I'm not sure we actually necessary to introduce
> the
> >> >> > > condition
> >> >> > > > > > > > variables
> >> >> > > > > > > > > > for the concern that Jun raised, but it's an
> >> >> implementation
> >> >> > > > > detail
> >> >> > > > > > > that
> >> >> > > > > > > > > we
> >> >> > > > > > > > > > can defer to a discussion in the PR.)
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> >> >> > > > > > lucasatucla@gmail.com>
> >> >> > > > > > > > > > wrote:
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > > Hi Jun,
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > I agree by using the conditional variables, there
> >> is
> >> >> no
> >> >> > > need
> >> >> > > > to
> >> >> > > > > > add
> >> >> > > > > > > > > such
> >> >> > > > > > > > > > a
> >> >> > > > > > > > > > > new config.
> >> >> > > > > > > > > > > Also thanks for approving this KIP.
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > Lucas
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > >
> >> >> > > > > > > > >
> >> >> > > > > > > >
> >> >> > > > > > >
> >> >> > > > > >
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Jun,

Sorry to bother you again. Can you please take a look at the wiki again
when you have time?

Thanks a lot!
Lucas

On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lu...@gmail.com> wrote:

> Hi Jun,
>
> Thanks a lot for the detailed explanation.
> I've restored the wiki to a previous version that does not require config
> changes,
> and keeps the current behavior with the proposed changes turned off by
> default.
> I'd appreciate it if you can review it again.
>
> Thanks!
> Lucas
>
> On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Lucas,
>>
>> When upgrading to a minor release, I think the expectation is that a user
>> wouldn't need to make any config changes, other than the usual
>> inter.broker.protocol. If we require other config changes during an
>> upgrade, then it's probably better to do that in a major release.
>>
>> Regarding your proposal, I think removing host/advertised_host in favor of
>> listeners:advertised_listeners seems useful regardless of this KIP.
>> However, that can probably wait until a major release.
>>
>> As for the controller listener, I am not sure if one has to set it. To
>> make
>> a cluster healthy, one sort of have to make sure that the request queue is
>> never full and no request will be sitting in the request queue for long.
>> If
>> one does that, setting the controller listener may not be necessary. On
>> the
>> flip side, even if one sets the controller listener, but the request queue
>> and the request time for the data part are still high, the cluster may
>> still not be healthy. Given that we have already started the 2.1 release
>> planning, perhaps we can start with not requiring the controller listener.
>> If this is indeed something that everyone wants to set, we can make it a
>> required config in a major release.
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <lu...@gmail.com>
>> wrote:
>>
>> > @Jun Rao <ju...@confluent.io>
>> >
>> > I made the recent config changes after thinking about the default
>> behavior
>> > for adopting this KIP.
>> > I think there are basically two options:
>> > 1. By default, the behavior proposed in this KIP is turned off, and
>> > operators can turn it
>> > on by adding the "controller.listener.name" config and entries in the
>> > "listeners" and "advertised.listeners" list.
>> > If no "controller.listener.name" is added, it'll be the *same as* the "
>> > inter.broker.listener.name",
>> > and the proposed feature is effectively turned off.
>> > This has been the assumption in the KIP writeup before.
>> >
>> > 2. By default, the behavior proposed in this KIP is turned on, and
>> > operators are forced to
>> > recognize the proposed change if their "listeners" config is set (this
>> is
>> > most likely in production environments),
>> > by allocating a new port for controller connections, and adding a new
>> > endpoint to the "listeners" config.
>> > For cases where "listeners" is not set explicitly,
>> > there needs to be a default value for it that includes the controller
>> > listener name,
>> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
>> >
>> > I chose to go with option 2 since as author of this KIP,
>> > I naturally think in the long run, it's worth the effort to adopt this
>> > feature,
>> > in order to prevent issues under circumstances listed in the motivation
>> > section.
>> >
>> > 100, following the argument above, I want to enforce the separation
>> > between controller
>> > and data plane requests. Hence the "controller.listener.name" should
>> > never be the same
>> > as the "inter.broker.listener.name", which is intended for data plane
>> > requests.
>> >
>> > 101, the default value for "listeners" will be
>> > "PLAINTEXT://:9092,CONTROLLER://:9091",
>> > making values of "host", and "port" not being used under any config
>> > settings.
>> > And the default value for "advertised.listeners" will be derived from
>> > "listeners",
>> > making the values of "advertised.host", and "advertised.port" not being
>> > used any more.
>> >
>> > 102, for upgrading, a single broker that has "listeners" and/or
>> > "advertised.listeners" set,
>> > must add a new endpoint for the CONTROLLER listener name, or end up
>> using
>> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
>> > During rolling upgrade, in cases of <old controller> + <new broker> or
>> > <new controller>  + <old broker>
>> > we still need to fall back to the current behavior. However after the
>> > rolling upgrade is done,
>> > it is guaranteed that the controller plane and data plane are separated,
>> > given
>> > the "controller.listener.name" must be different from "
>> > inter.broker.listener.name".
>> >
>> > @Ismael Juma <is...@juma.me.uk>
>> > Thanks for pointing that out. I did not know that.
>> > However my question is if the argument above makes sense, and my code
>> > change
>> > causes the configs "host", "port", "advertised.host", "advertised.port"
>> to
>> > be not used under any circumstance,
>> > then it's no different from removing them.
>> > Anyway if there is still a concern about removing them, is there a new
>> > major new version
>> > now or in the future where I can remove them?
>> >
>> > Thanks!
>> > Lucas
>> >
>> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk> wrote:
>> >
>> >> To be clear, we can only remove configs in major new versions.
>> Otherwise,
>> >> we can only deprecate them.
>> >>
>> >> Ismael
>> >>
>> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io> wrote:
>> >>
>> >> > Hi, Lucas,
>> >> >
>> >> > For the network idlePct, your understanding is correct. Currently,
>> >> > networkIdlePct metric is calculated as the average of (1 - io-ratio)
>> in
>> >> the
>> >> > selector of all network threads.
>> >> >
>> >> > The metrics part looks good to me in the updated KIP.
>> >> >
>> >> > I am not still not quite sure about the configs.
>> >> >
>> >> > 100. "Whenever the "controller.listener.name" is set, upon broker
>> >> startup,
>> >> > we will validate its value and make sure it's different from the
>> >> > *inter-broker-listener-name *value." Does that mean that
>> >> > controller.listener.name has to be different from
>> >> > inter.broker.listener.name?
>> >> > That seems limiting.
>> >> >
>> >> > 101. The KIP says that advertised.listeners and listeners will now
>> have
>> >> a
>> >> > different default value including controller. Could you document what
>> >> the
>> >> > default value looks like?
>> >> >
>> >> > 102. About removing the the following configs. How does that affect
>> the
>> >> > upgrade path? Do we now expect a user to add a new config when
>> upgrading
>> >> > from an old version to a new one?
>> >> > host
>> >> > port
>> >> > advertised.host
>> >> > advertised.port
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
>> >> >
>> >> >
>> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <lu...@gmail.com>
>> >> wrote:
>> >> >
>> >> > > @Jun Rao <ju...@confluent.io>
>> >> > >
>> >> > > One clarification, currently on the selector level, we have the
>> >> > > io-wait-ratio metric.
>> >> > > For the new controller *network* thread, we can use it directly for
>> >> > > IdlePct, instead of using 1- io-ratio,
>> >> > > so that the logic is similar to the current average IdlePct for
>> >> network
>> >> > > threads. Is that correct?
>> >> > >
>> >> > > I've revised the KIP by adding two new metrics for measuring the
>> >> IdlePct
>> >> > > for the two additional threads.
>> >> > > Please take a look again. Thanks!
>> >> > >
>> >> > > Lucas
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io> wrote:
>> >> > >
>> >> > > > Hi, Lucas,
>> >> > > >
>> >> > > > Thanks for the updated KIP.
>> >> > > >
>> >> > > > For monitoring the network thread utilization for the control
>> >> plane, we
>> >> > > > already have the metric io-ratio at the selector level (idlePct
>> is
>> >> 1 -
>> >> > > > io-ratio). So, we just need to give that selector a meaningful
>> name.
>> >> > > >
>> >> > > > For monitoring the io thread utilization for the control plane,
>> it's
>> >> > > > probably useful to have a separate metric for that. The
>> controller
>> >> > > request
>> >> > > > queue size may not reflect the history in a window.
>> >> > > >
>> >> > > > Jun
>> >> > > >
>> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <
>> lucasatucla@gmail.com>
>> >> > > wrote:
>> >> > > >
>> >> > > > > Thanks Jun for your quick response. It looks like I forgot to
>> >> click
>> >> > the
>> >> > > > > "Update" button, :)
>> >> > > > > It's updated now.
>> >> > > > >
>> >> > > > > Regarding the idle ratio metrics for the additional threads, I
>> >> > > discussed
>> >> > > > > with Joel,
>> >> > > > > and think they are not as useful, and I added our reasoning in
>> the
>> >> > last
>> >> > > > > paragraph of the
>> >> > > > > "How are controller requests handled over the dedicated
>> >> connections?"
>> >> > > > > section.
>> >> > > > > On the other hand, we don't strongly oppose adding them if you
>> >> think
>> >> > > they
>> >> > > > > are necessary.
>> >> > > > >
>> >> > > > > Thanks,
>> >> > > > > Lucas
>> >> > > > >
>> >> > > > >
>> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io>
>> wrote:
>> >> > > > >
>> >> > > > > > Hi, Lucas,
>> >> > > > > >
>> >> > > > > > Thanks for the reply. Have you actually updated the KIP? The
>> >> wiki
>> >> > > says
>> >> > > > > that
>> >> > > > > > it's last updated on Aug. 22. and some of the changes that
>> you
>> >> > > > mentioned
>> >> > > > > > (#1 and #3) are not there.
>> >> > > > > >
>> >> > > > > > Also, regarding Joel's comment on network/request idle ratio
>> >> > metrics,
>> >> > > > > could
>> >> > > > > > you comment on whether they include the new controller
>> >> listener? If
>> >> > > > not,
>> >> > > > > do
>> >> > > > > > we need additional metrics to measure the utilization of the
>> io
>> >> > > thread
>> >> > > > > for
>> >> > > > > > the control plane?
>> >> > > > > >
>> >> > > > > > Jun
>> >> > > > > >
>> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
>> >> lucasatucla@gmail.com
>> >> > >
>> >> > > > > wrote:
>> >> > > > > >
>> >> > > > > > > Thanks for the comments, Jun.
>> >> > > > > > >
>> >> > > > > > > 1. I think the answer should be no, since the "
>> >> > > > > > inter.broker.listener.name"
>> >> > > > > > > are also used
>> >> > > > > > > for replication traffic, and merging these two types of
>> >> request
>> >> > to
>> >> > > > the
>> >> > > > > > > single threaded tunnel
>> >> > > > > > > would defeat the purpose of this KIP and also hurt
>> replication
>> >> > > > > > throughput.
>> >> > > > > > > So I think that means
>> >> > > > > > > we should validate to make sure when the new config is set,
>> >> it's
>> >> > > > > > different
>> >> > > > > > > from "inter.broker.listener.name"
>> >> > > > > > > or "security.inter.broker.protocol", whichever is set.
>> >> > > > > > >
>> >> > > > > > > 2. Normally all broker configs in a given cluster are
>> changed
>> >> at
>> >> > > the
>> >> > > > > same
>> >> > > > > > > time. If there is a typo in the
>> >> > > > > > > controller.listener.name and it's not available in the
>> >> endpoints
>> >> > > > list,
>> >> > > > > > we
>> >> > > > > > > could catch it, give an error
>> >> > > > > > > and block restart of the first broker in the cluster. With
>> >> that,
>> >> > we
>> >> > > > > could
>> >> > > > > > > keep the current behavior
>> >> > > > > > > in the KIP write up that falls back to
>> >> > "inter.broker.listener.nam"
>> >> > > > when
>> >> > > > > > the
>> >> > > > > > > "controller.listener.name"
>> >> > > > > > > is not found during the migration phase of this KIP.
>> Thoughts?
>> >> > > > > > >
>> >> > > > > > > 3. That makes sense, and I've changed it.
>> >> > > > > > >
>> >> > > > > > > Thanks,
>> >> > > > > > > Lucas
>> >> > > > > > >
>> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io>
>> >> > wrote:
>> >> > > > > > >
>> >> > > > > > > > Hi, Lucas,
>> >> > > > > > > >
>> >> > > > > > > > Sorry for the delay. The new proposal looks good to me
>> >> > overall. A
>> >> > > > few
>> >> > > > > > > minor
>> >> > > > > > > > comments below.
>> >> > > > > > > >
>> >> > > > > > > > 1. It's possible that listener.name.for.controller is
>> set,
>> >> but
>> >> > > set
>> >> > > > to
>> >> > > > > > the
>> >> > > > > > > > same value as inter.broker.listener.name. In that case,
>> >> should
>> >> > > we
>> >> > > > > > have a
>> >> > > > > > > > single network thread and the request handling thread for
>> >> that
>> >> > > > > > listener?
>> >> > > > > > > >
>> >> > > > > > > > 2. Currently, the controller always picks the listener
>> >> > specified
>> >> > > by
>> >> > > > > > > > inter.broker.listener.name even if the listener name is
>> not
>> >> > > > present
>> >> > > > > in
>> >> > > > > > > the
>> >> > > > > > > > receiving broker. This KIP proposes a slightly different
>> >> > approach
>> >> > > > for
>> >> > > > > > > > picking listener.name.for.controller only when the
>> receiving
>> >> > end
>> >> > > > has
>> >> > > > > > the
>> >> > > > > > > > listener and switches listener.name.for.controller
>> >> otherwise.
>> >> > > There
>> >> > > > > are
>> >> > > > > > > > some tradeoffs between the two approaches. To change the
>> >> inter
>> >> > > > broker
>> >> > > > > > > > listener, the former requires 2 steps: (1) adding the new
>> >> > > listener
>> >> > > > to
>> >> > > > > > > > listener list in every broker and (2) changing
>> >> > > > > > > > listener.name.for.controller.
>> >> > > > > > > > The latter can do both changes in 1 step. On the hand, if
>> >> > > > > > > > listener.name.for.controller
>> >> > > > > > > > is mis-configured, the former will report an error and
>> the
>> >> > latter
>> >> > > > > will
>> >> > > > > > > hide
>> >> > > > > > > > it (so the user may not know the misconfiguration). It
>> seems
>> >> > that
>> >> > > > we
>> >> > > > > > > should
>> >> > > > > > > > pick one approach to handle both
>> >> listener.name.for.controller
>> >> > and
>> >> > > > > > > > inter.broker.listener.name consistently. To me, the
>> former
>> >> > seems
>> >> > > > > > > slightly
>> >> > > > > > > > better.
>> >> > > > > > > >
>> >> > > > > > > > 3. To be consistent with the existing naming, should
>> >> > > > > > > > listener.name.for.controller
>> >> > > > > > > > be controller.listener.name?
>> >> > > > > > > >
>> >> > > > > > > > Thanks,
>> >> > > > > > > >
>> >> > > > > > > > Jun
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
>> >> > > lucasatucla@gmail.com>
>> >> > > > > > > wrote:
>> >> > > > > > > >
>> >> > > > > > > > > Hi Jun and Joel,
>> >> > > > > > > > >
>> >> > > > > > > > > The KIP writeup has changed by quite a bit since your
>> +1.
>> >> > > > > > > > > Can you please take another review? Thanks a lot for
>> your
>> >> > time!
>> >> > > > > > > > >
>> >> > > > > > > > > Lucas
>> >> > > > > > > > >
>> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
>> >> > > > jjkoshy.w@gmail.com>
>> >> > > > > > > > wrote:
>> >> > > > > > > > >
>> >> > > > > > > > > > +1 on the KIP.
>> >> > > > > > > > > >
>> >> > > > > > > > > > (I'm not sure we actually necessary to introduce the
>> >> > > condition
>> >> > > > > > > > variables
>> >> > > > > > > > > > for the concern that Jun raised, but it's an
>> >> implementation
>> >> > > > > detail
>> >> > > > > > > that
>> >> > > > > > > > > we
>> >> > > > > > > > > > can defer to a discussion in the PR.)
>> >> > > > > > > > > >
>> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
>> >> > > > > > lucasatucla@gmail.com>
>> >> > > > > > > > > > wrote:
>> >> > > > > > > > > >
>> >> > > > > > > > > > > Hi Jun,
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > I agree by using the conditional variables, there
>> is
>> >> no
>> >> > > need
>> >> > > > to
>> >> > > > > > add
>> >> > > > > > > > > such
>> >> > > > > > > > > > a
>> >> > > > > > > > > > > new config.
>> >> > > > > > > > > > > Also thanks for approving this KIP.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > Lucas
>> >> > > > > > > > > > >
>> >> > > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>>
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Jun,

Thanks a lot for the detailed explanation.
I've restored the wiki to a previous version that does not require config
changes,
and keeps the current behavior with the proposed changes turned off by
default.
I'd appreciate it if you can review it again.

Thanks!
Lucas

On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas,
>
> When upgrading to a minor release, I think the expectation is that a user
> wouldn't need to make any config changes, other than the usual
> inter.broker.protocol. If we require other config changes during an
> upgrade, then it's probably better to do that in a major release.
>
> Regarding your proposal, I think removing host/advertised_host in favor of
> listeners:advertised_listeners seems useful regardless of this KIP.
> However, that can probably wait until a major release.
>
> As for the controller listener, I am not sure if one has to set it. To make
> a cluster healthy, one sort of have to make sure that the request queue is
> never full and no request will be sitting in the request queue for long. If
> one does that, setting the controller listener may not be necessary. On the
> flip side, even if one sets the controller listener, but the request queue
> and the request time for the data part are still high, the cluster may
> still not be healthy. Given that we have already started the 2.1 release
> planning, perhaps we can start with not requiring the controller listener.
> If this is indeed something that everyone wants to set, we can make it a
> required config in a major release.
>
> Thanks,
>
> Jun
>
> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > @Jun Rao <ju...@confluent.io>
> >
> > I made the recent config changes after thinking about the default
> behavior
> > for adopting this KIP.
> > I think there are basically two options:
> > 1. By default, the behavior proposed in this KIP is turned off, and
> > operators can turn it
> > on by adding the "controller.listener.name" config and entries in the
> > "listeners" and "advertised.listeners" list.
> > If no "controller.listener.name" is added, it'll be the *same as* the "
> > inter.broker.listener.name",
> > and the proposed feature is effectively turned off.
> > This has been the assumption in the KIP writeup before.
> >
> > 2. By default, the behavior proposed in this KIP is turned on, and
> > operators are forced to
> > recognize the proposed change if their "listeners" config is set (this is
> > most likely in production environments),
> > by allocating a new port for controller connections, and adding a new
> > endpoint to the "listeners" config.
> > For cases where "listeners" is not set explicitly,
> > there needs to be a default value for it that includes the controller
> > listener name,
> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
> >
> > I chose to go with option 2 since as author of this KIP,
> > I naturally think in the long run, it's worth the effort to adopt this
> > feature,
> > in order to prevent issues under circumstances listed in the motivation
> > section.
> >
> > 100, following the argument above, I want to enforce the separation
> > between controller
> > and data plane requests. Hence the "controller.listener.name" should
> > never be the same
> > as the "inter.broker.listener.name", which is intended for data plane
> > requests.
> >
> > 101, the default value for "listeners" will be
> > "PLAINTEXT://:9092,CONTROLLER://:9091",
> > making values of "host", and "port" not being used under any config
> > settings.
> > And the default value for "advertised.listeners" will be derived from
> > "listeners",
> > making the values of "advertised.host", and "advertised.port" not being
> > used any more.
> >
> > 102, for upgrading, a single broker that has "listeners" and/or
> > "advertised.listeners" set,
> > must add a new endpoint for the CONTROLLER listener name, or end up using
> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> > During rolling upgrade, in cases of <old controller> + <new broker> or
> > <new controller>  + <old broker>
> > we still need to fall back to the current behavior. However after the
> > rolling upgrade is done,
> > it is guaranteed that the controller plane and data plane are separated,
> > given
> > the "controller.listener.name" must be different from "
> > inter.broker.listener.name".
> >
> > @Ismael Juma <is...@juma.me.uk>
> > Thanks for pointing that out. I did not know that.
> > However my question is if the argument above makes sense, and my code
> > change
> > causes the configs "host", "port", "advertised.host", "advertised.port"
> to
> > be not used under any circumstance,
> > then it's no different from removing them.
> > Anyway if there is still a concern about removing them, is there a new
> > major new version
> > now or in the future where I can remove them?
> >
> > Thanks!
> > Lucas
> >
> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk> wrote:
> >
> >> To be clear, we can only remove configs in major new versions.
> Otherwise,
> >> we can only deprecate them.
> >>
> >> Ismael
> >>
> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io> wrote:
> >>
> >> > Hi, Lucas,
> >> >
> >> > For the network idlePct, your understanding is correct. Currently,
> >> > networkIdlePct metric is calculated as the average of (1 - io-ratio)
> in
> >> the
> >> > selector of all network threads.
> >> >
> >> > The metrics part looks good to me in the updated KIP.
> >> >
> >> > I am not still not quite sure about the configs.
> >> >
> >> > 100. "Whenever the "controller.listener.name" is set, upon broker
> >> startup,
> >> > we will validate its value and make sure it's different from the
> >> > *inter-broker-listener-name *value." Does that mean that
> >> > controller.listener.name has to be different from
> >> > inter.broker.listener.name?
> >> > That seems limiting.
> >> >
> >> > 101. The KIP says that advertised.listeners and listeners will now
> have
> >> a
> >> > different default value including controller. Could you document what
> >> the
> >> > default value looks like?
> >> >
> >> > 102. About removing the the following configs. How does that affect
> the
> >> > upgrade path? Do we now expect a user to add a new config when
> upgrading
> >> > from an old version to a new one?
> >> > host
> >> > port
> >> > advertised.host
> >> > advertised.port
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> >
> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <lu...@gmail.com>
> >> wrote:
> >> >
> >> > > @Jun Rao <ju...@confluent.io>
> >> > >
> >> > > One clarification, currently on the selector level, we have the
> >> > > io-wait-ratio metric.
> >> > > For the new controller *network* thread, we can use it directly for
> >> > > IdlePct, instead of using 1- io-ratio,
> >> > > so that the logic is similar to the current average IdlePct for
> >> network
> >> > > threads. Is that correct?
> >> > >
> >> > > I've revised the KIP by adding two new metrics for measuring the
> >> IdlePct
> >> > > for the two additional threads.
> >> > > Please take a look again. Thanks!
> >> > >
> >> > > Lucas
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io> wrote:
> >> > >
> >> > > > Hi, Lucas,
> >> > > >
> >> > > > Thanks for the updated KIP.
> >> > > >
> >> > > > For monitoring the network thread utilization for the control
> >> plane, we
> >> > > > already have the metric io-ratio at the selector level (idlePct is
> >> 1 -
> >> > > > io-ratio). So, we just need to give that selector a meaningful
> name.
> >> > > >
> >> > > > For monitoring the io thread utilization for the control plane,
> it's
> >> > > > probably useful to have a separate metric for that. The controller
> >> > > request
> >> > > > queue size may not reflect the history in a window.
> >> > > >
> >> > > > Jun
> >> > > >
> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <lucasatucla@gmail.com
> >
> >> > > wrote:
> >> > > >
> >> > > > > Thanks Jun for your quick response. It looks like I forgot to
> >> click
> >> > the
> >> > > > > "Update" button, :)
> >> > > > > It's updated now.
> >> > > > >
> >> > > > > Regarding the idle ratio metrics for the additional threads, I
> >> > > discussed
> >> > > > > with Joel,
> >> > > > > and think they are not as useful, and I added our reasoning in
> the
> >> > last
> >> > > > > paragraph of the
> >> > > > > "How are controller requests handled over the dedicated
> >> connections?"
> >> > > > > section.
> >> > > > > On the other hand, we don't strongly oppose adding them if you
> >> think
> >> > > they
> >> > > > > are necessary.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Lucas
> >> > > > >
> >> > > > >
> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io>
> wrote:
> >> > > > >
> >> > > > > > Hi, Lucas,
> >> > > > > >
> >> > > > > > Thanks for the reply. Have you actually updated the KIP? The
> >> wiki
> >> > > says
> >> > > > > that
> >> > > > > > it's last updated on Aug. 22. and some of the changes that you
> >> > > > mentioned
> >> > > > > > (#1 and #3) are not there.
> >> > > > > >
> >> > > > > > Also, regarding Joel's comment on network/request idle ratio
> >> > metrics,
> >> > > > > could
> >> > > > > > you comment on whether they include the new controller
> >> listener? If
> >> > > > not,
> >> > > > > do
> >> > > > > > we need additional metrics to measure the utilization of the
> io
> >> > > thread
> >> > > > > for
> >> > > > > > the control plane?
> >> > > > > >
> >> > > > > > Jun
> >> > > > > >
> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> >> lucasatucla@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > Thanks for the comments, Jun.
> >> > > > > > >
> >> > > > > > > 1. I think the answer should be no, since the "
> >> > > > > > inter.broker.listener.name"
> >> > > > > > > are also used
> >> > > > > > > for replication traffic, and merging these two types of
> >> request
> >> > to
> >> > > > the
> >> > > > > > > single threaded tunnel
> >> > > > > > > would defeat the purpose of this KIP and also hurt
> replication
> >> > > > > > throughput.
> >> > > > > > > So I think that means
> >> > > > > > > we should validate to make sure when the new config is set,
> >> it's
> >> > > > > > different
> >> > > > > > > from "inter.broker.listener.name"
> >> > > > > > > or "security.inter.broker.protocol", whichever is set.
> >> > > > > > >
> >> > > > > > > 2. Normally all broker configs in a given cluster are
> changed
> >> at
> >> > > the
> >> > > > > same
> >> > > > > > > time. If there is a typo in the
> >> > > > > > > controller.listener.name and it's not available in the
> >> endpoints
> >> > > > list,
> >> > > > > > we
> >> > > > > > > could catch it, give an error
> >> > > > > > > and block restart of the first broker in the cluster. With
> >> that,
> >> > we
> >> > > > > could
> >> > > > > > > keep the current behavior
> >> > > > > > > in the KIP write up that falls back to
> >> > "inter.broker.listener.nam"
> >> > > > when
> >> > > > > > the
> >> > > > > > > "controller.listener.name"
> >> > > > > > > is not found during the migration phase of this KIP.
> Thoughts?
> >> > > > > > >
> >> > > > > > > 3. That makes sense, and I've changed it.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Lucas
> >> > > > > > >
> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io>
> >> > wrote:
> >> > > > > > >
> >> > > > > > > > Hi, Lucas,
> >> > > > > > > >
> >> > > > > > > > Sorry for the delay. The new proposal looks good to me
> >> > overall. A
> >> > > > few
> >> > > > > > > minor
> >> > > > > > > > comments below.
> >> > > > > > > >
> >> > > > > > > > 1. It's possible that listener.name.for.controller is set,
> >> but
> >> > > set
> >> > > > to
> >> > > > > > the
> >> > > > > > > > same value as inter.broker.listener.name. In that case,
> >> should
> >> > > we
> >> > > > > > have a
> >> > > > > > > > single network thread and the request handling thread for
> >> that
> >> > > > > > listener?
> >> > > > > > > >
> >> > > > > > > > 2. Currently, the controller always picks the listener
> >> > specified
> >> > > by
> >> > > > > > > > inter.broker.listener.name even if the listener name is
> not
> >> > > > present
> >> > > > > in
> >> > > > > > > the
> >> > > > > > > > receiving broker. This KIP proposes a slightly different
> >> > approach
> >> > > > for
> >> > > > > > > > picking listener.name.for.controller only when the
> receiving
> >> > end
> >> > > > has
> >> > > > > > the
> >> > > > > > > > listener and switches listener.name.for.controller
> >> otherwise.
> >> > > There
> >> > > > > are
> >> > > > > > > > some tradeoffs between the two approaches. To change the
> >> inter
> >> > > > broker
> >> > > > > > > > listener, the former requires 2 steps: (1) adding the new
> >> > > listener
> >> > > > to
> >> > > > > > > > listener list in every broker and (2) changing
> >> > > > > > > > listener.name.for.controller.
> >> > > > > > > > The latter can do both changes in 1 step. On the hand, if
> >> > > > > > > > listener.name.for.controller
> >> > > > > > > > is mis-configured, the former will report an error and the
> >> > latter
> >> > > > > will
> >> > > > > > > hide
> >> > > > > > > > it (so the user may not know the misconfiguration). It
> seems
> >> > that
> >> > > > we
> >> > > > > > > should
> >> > > > > > > > pick one approach to handle both
> >> listener.name.for.controller
> >> > and
> >> > > > > > > > inter.broker.listener.name consistently. To me, the
> former
> >> > seems
> >> > > > > > > slightly
> >> > > > > > > > better.
> >> > > > > > > >
> >> > > > > > > > 3. To be consistent with the existing naming, should
> >> > > > > > > > listener.name.for.controller
> >> > > > > > > > be controller.listener.name?
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > >
> >> > > > > > > > Jun
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> >> > > lucasatucla@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi Jun and Joel,
> >> > > > > > > > >
> >> > > > > > > > > The KIP writeup has changed by quite a bit since your
> +1.
> >> > > > > > > > > Can you please take another review? Thanks a lot for
> your
> >> > time!
> >> > > > > > > > >
> >> > > > > > > > > Lucas
> >> > > > > > > > >
> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> >> > > > jjkoshy.w@gmail.com>
> >> > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > +1 on the KIP.
> >> > > > > > > > > >
> >> > > > > > > > > > (I'm not sure we actually necessary to introduce the
> >> > > condition
> >> > > > > > > > variables
> >> > > > > > > > > > for the concern that Jun raised, but it's an
> >> implementation
> >> > > > > detail
> >> > > > > > > that
> >> > > > > > > > > we
> >> > > > > > > > > > can defer to a discussion in the PR.)
> >> > > > > > > > > >
> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> >> > > > > > lucasatucla@gmail.com>
> >> > > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hi Jun,
> >> > > > > > > > > > >
> >> > > > > > > > > > > I agree by using the conditional variables, there is
> >> no
> >> > > need
> >> > > > to
> >> > > > > > add
> >> > > > > > > > > such
> >> > > > > > > > > > a
> >> > > > > > > > > > > new config.
> >> > > > > > > > > > > Also thanks for approving this KIP.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Lucas
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hi, Lucas,

When upgrading to a minor release, I think the expectation is that a user
wouldn't need to make any config changes, other than the usual
inter.broker.protocol. If we require other config changes during an
upgrade, then it's probably better to do that in a major release.

Regarding your proposal, I think removing host/advertised_host in favor of
listeners:advertised_listeners seems useful regardless of this KIP.
However, that can probably wait until a major release.

As for the controller listener, I am not sure if one has to set it. To make
a cluster healthy, one sort of have to make sure that the request queue is
never full and no request will be sitting in the request queue for long. If
one does that, setting the controller listener may not be necessary. On the
flip side, even if one sets the controller listener, but the request queue
and the request time for the data part are still high, the cluster may
still not be healthy. Given that we have already started the 2.1 release
planning, perhaps we can start with not requiring the controller listener.
If this is indeed something that everyone wants to set, we can make it a
required config in a major release.

Thanks,

Jun

On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <lu...@gmail.com> wrote:

> @Jun Rao <ju...@confluent.io>
>
> I made the recent config changes after thinking about the default behavior
> for adopting this KIP.
> I think there are basically two options:
> 1. By default, the behavior proposed in this KIP is turned off, and
> operators can turn it
> on by adding the "controller.listener.name" config and entries in the
> "listeners" and "advertised.listeners" list.
> If no "controller.listener.name" is added, it'll be the *same as* the "
> inter.broker.listener.name",
> and the proposed feature is effectively turned off.
> This has been the assumption in the KIP writeup before.
>
> 2. By default, the behavior proposed in this KIP is turned on, and
> operators are forced to
> recognize the proposed change if their "listeners" config is set (this is
> most likely in production environments),
> by allocating a new port for controller connections, and adding a new
> endpoint to the "listeners" config.
> For cases where "listeners" is not set explicitly,
> there needs to be a default value for it that includes the controller
> listener name,
> e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"
>
> I chose to go with option 2 since as author of this KIP,
> I naturally think in the long run, it's worth the effort to adopt this
> feature,
> in order to prevent issues under circumstances listed in the motivation
> section.
>
> 100, following the argument above, I want to enforce the separation
> between controller
> and data plane requests. Hence the "controller.listener.name" should
> never be the same
> as the "inter.broker.listener.name", which is intended for data plane
> requests.
>
> 101, the default value for "listeners" will be
> "PLAINTEXT://:9092,CONTROLLER://:9091",
> making values of "host", and "port" not being used under any config
> settings.
> And the default value for "advertised.listeners" will be derived from
> "listeners",
> making the values of "advertised.host", and "advertised.port" not being
> used any more.
>
> 102, for upgrading, a single broker that has "listeners" and/or
> "advertised.listeners" set,
> must add a new endpoint for the CONTROLLER listener name, or end up using
> the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
> During rolling upgrade, in cases of <old controller> + <new broker> or
> <new controller>  + <old broker>
> we still need to fall back to the current behavior. However after the
> rolling upgrade is done,
> it is guaranteed that the controller plane and data plane are separated,
> given
> the "controller.listener.name" must be different from "
> inter.broker.listener.name".
>
> @Ismael Juma <is...@juma.me.uk>
> Thanks for pointing that out. I did not know that.
> However my question is if the argument above makes sense, and my code
> change
> causes the configs "host", "port", "advertised.host", "advertised.port" to
> be not used under any circumstance,
> then it's no different from removing them.
> Anyway if there is still a concern about removing them, is there a new
> major new version
> now or in the future where I can remove them?
>
> Thanks!
> Lucas
>
> On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk> wrote:
>
>> To be clear, we can only remove configs in major new versions. Otherwise,
>> we can only deprecate them.
>>
>> Ismael
>>
>> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io> wrote:
>>
>> > Hi, Lucas,
>> >
>> > For the network idlePct, your understanding is correct. Currently,
>> > networkIdlePct metric is calculated as the average of (1 - io-ratio) in
>> the
>> > selector of all network threads.
>> >
>> > The metrics part looks good to me in the updated KIP.
>> >
>> > I am not still not quite sure about the configs.
>> >
>> > 100. "Whenever the "controller.listener.name" is set, upon broker
>> startup,
>> > we will validate its value and make sure it's different from the
>> > *inter-broker-listener-name *value." Does that mean that
>> > controller.listener.name has to be different from
>> > inter.broker.listener.name?
>> > That seems limiting.
>> >
>> > 101. The KIP says that advertised.listeners and listeners will now have
>> a
>> > different default value including controller. Could you document what
>> the
>> > default value looks like?
>> >
>> > 102. About removing the the following configs. How does that affect the
>> > upgrade path? Do we now expect a user to add a new config when upgrading
>> > from an old version to a new one?
>> > host
>> > port
>> > advertised.host
>> > advertised.port
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <lu...@gmail.com>
>> wrote:
>> >
>> > > @Jun Rao <ju...@confluent.io>
>> > >
>> > > One clarification, currently on the selector level, we have the
>> > > io-wait-ratio metric.
>> > > For the new controller *network* thread, we can use it directly for
>> > > IdlePct, instead of using 1- io-ratio,
>> > > so that the logic is similar to the current average IdlePct for
>> network
>> > > threads. Is that correct?
>> > >
>> > > I've revised the KIP by adding two new metrics for measuring the
>> IdlePct
>> > > for the two additional threads.
>> > > Please take a look again. Thanks!
>> > >
>> > > Lucas
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io> wrote:
>> > >
>> > > > Hi, Lucas,
>> > > >
>> > > > Thanks for the updated KIP.
>> > > >
>> > > > For monitoring the network thread utilization for the control
>> plane, we
>> > > > already have the metric io-ratio at the selector level (idlePct is
>> 1 -
>> > > > io-ratio). So, we just need to give that selector a meaningful name.
>> > > >
>> > > > For monitoring the io thread utilization for the control plane, it's
>> > > > probably useful to have a separate metric for that. The controller
>> > > request
>> > > > queue size may not reflect the history in a window.
>> > > >
>> > > > Jun
>> > > >
>> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <lu...@gmail.com>
>> > > wrote:
>> > > >
>> > > > > Thanks Jun for your quick response. It looks like I forgot to
>> click
>> > the
>> > > > > "Update" button, :)
>> > > > > It's updated now.
>> > > > >
>> > > > > Regarding the idle ratio metrics for the additional threads, I
>> > > discussed
>> > > > > with Joel,
>> > > > > and think they are not as useful, and I added our reasoning in the
>> > last
>> > > > > paragraph of the
>> > > > > "How are controller requests handled over the dedicated
>> connections?"
>> > > > > section.
>> > > > > On the other hand, we don't strongly oppose adding them if you
>> think
>> > > they
>> > > > > are necessary.
>> > > > >
>> > > > > Thanks,
>> > > > > Lucas
>> > > > >
>> > > > >
>> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io> wrote:
>> > > > >
>> > > > > > Hi, Lucas,
>> > > > > >
>> > > > > > Thanks for the reply. Have you actually updated the KIP? The
>> wiki
>> > > says
>> > > > > that
>> > > > > > it's last updated on Aug. 22. and some of the changes that you
>> > > > mentioned
>> > > > > > (#1 and #3) are not there.
>> > > > > >
>> > > > > > Also, regarding Joel's comment on network/request idle ratio
>> > metrics,
>> > > > > could
>> > > > > > you comment on whether they include the new controller
>> listener? If
>> > > > not,
>> > > > > do
>> > > > > > we need additional metrics to measure the utilization of the io
>> > > thread
>> > > > > for
>> > > > > > the control plane?
>> > > > > >
>> > > > > > Jun
>> > > > > >
>> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
>> lucasatucla@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > > > Thanks for the comments, Jun.
>> > > > > > >
>> > > > > > > 1. I think the answer should be no, since the "
>> > > > > > inter.broker.listener.name"
>> > > > > > > are also used
>> > > > > > > for replication traffic, and merging these two types of
>> request
>> > to
>> > > > the
>> > > > > > > single threaded tunnel
>> > > > > > > would defeat the purpose of this KIP and also hurt replication
>> > > > > > throughput.
>> > > > > > > So I think that means
>> > > > > > > we should validate to make sure when the new config is set,
>> it's
>> > > > > > different
>> > > > > > > from "inter.broker.listener.name"
>> > > > > > > or "security.inter.broker.protocol", whichever is set.
>> > > > > > >
>> > > > > > > 2. Normally all broker configs in a given cluster are changed
>> at
>> > > the
>> > > > > same
>> > > > > > > time. If there is a typo in the
>> > > > > > > controller.listener.name and it's not available in the
>> endpoints
>> > > > list,
>> > > > > > we
>> > > > > > > could catch it, give an error
>> > > > > > > and block restart of the first broker in the cluster. With
>> that,
>> > we
>> > > > > could
>> > > > > > > keep the current behavior
>> > > > > > > in the KIP write up that falls back to
>> > "inter.broker.listener.nam"
>> > > > when
>> > > > > > the
>> > > > > > > "controller.listener.name"
>> > > > > > > is not found during the migration phase of this KIP. Thoughts?
>> > > > > > >
>> > > > > > > 3. That makes sense, and I've changed it.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Lucas
>> > > > > > >
>> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io>
>> > wrote:
>> > > > > > >
>> > > > > > > > Hi, Lucas,
>> > > > > > > >
>> > > > > > > > Sorry for the delay. The new proposal looks good to me
>> > overall. A
>> > > > few
>> > > > > > > minor
>> > > > > > > > comments below.
>> > > > > > > >
>> > > > > > > > 1. It's possible that listener.name.for.controller is set,
>> but
>> > > set
>> > > > to
>> > > > > > the
>> > > > > > > > same value as inter.broker.listener.name. In that case,
>> should
>> > > we
>> > > > > > have a
>> > > > > > > > single network thread and the request handling thread for
>> that
>> > > > > > listener?
>> > > > > > > >
>> > > > > > > > 2. Currently, the controller always picks the listener
>> > specified
>> > > by
>> > > > > > > > inter.broker.listener.name even if the listener name is not
>> > > > present
>> > > > > in
>> > > > > > > the
>> > > > > > > > receiving broker. This KIP proposes a slightly different
>> > approach
>> > > > for
>> > > > > > > > picking listener.name.for.controller only when the receiving
>> > end
>> > > > has
>> > > > > > the
>> > > > > > > > listener and switches listener.name.for.controller
>> otherwise.
>> > > There
>> > > > > are
>> > > > > > > > some tradeoffs between the two approaches. To change the
>> inter
>> > > > broker
>> > > > > > > > listener, the former requires 2 steps: (1) adding the new
>> > > listener
>> > > > to
>> > > > > > > > listener list in every broker and (2) changing
>> > > > > > > > listener.name.for.controller.
>> > > > > > > > The latter can do both changes in 1 step. On the hand, if
>> > > > > > > > listener.name.for.controller
>> > > > > > > > is mis-configured, the former will report an error and the
>> > latter
>> > > > > will
>> > > > > > > hide
>> > > > > > > > it (so the user may not know the misconfiguration). It seems
>> > that
>> > > > we
>> > > > > > > should
>> > > > > > > > pick one approach to handle both
>> listener.name.for.controller
>> > and
>> > > > > > > > inter.broker.listener.name consistently. To me, the former
>> > seems
>> > > > > > > slightly
>> > > > > > > > better.
>> > > > > > > >
>> > > > > > > > 3. To be consistent with the existing naming, should
>> > > > > > > > listener.name.for.controller
>> > > > > > > > be controller.listener.name?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > > Jun
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
>> > > lucasatucla@gmail.com>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Jun and Joel,
>> > > > > > > > >
>> > > > > > > > > The KIP writeup has changed by quite a bit since your +1.
>> > > > > > > > > Can you please take another review? Thanks a lot for your
>> > time!
>> > > > > > > > >
>> > > > > > > > > Lucas
>> > > > > > > > >
>> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
>> > > > jjkoshy.w@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > +1 on the KIP.
>> > > > > > > > > >
>> > > > > > > > > > (I'm not sure we actually necessary to introduce the
>> > > condition
>> > > > > > > > variables
>> > > > > > > > > > for the concern that Jun raised, but it's an
>> implementation
>> > > > > detail
>> > > > > > > that
>> > > > > > > > > we
>> > > > > > > > > > can defer to a discussion in the PR.)
>> > > > > > > > > >
>> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
>> > > > > > lucasatucla@gmail.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi Jun,
>> > > > > > > > > > >
>> > > > > > > > > > > I agree by using the conditional variables, there is
>> no
>> > > need
>> > > > to
>> > > > > > add
>> > > > > > > > > such
>> > > > > > > > > > a
>> > > > > > > > > > > new config.
>> > > > > > > > > > > Also thanks for approving this KIP.
>> > > > > > > > > > >
>> > > > > > > > > > > Lucas
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
@Jun Rao <ju...@confluent.io>

I made the recent config changes after thinking about the default behavior
for adopting this KIP.
I think there are basically two options:
1. By default, the behavior proposed in this KIP is turned off, and
operators can turn it
on by adding the "controller.listener.name" config and entries in the
"listeners" and "advertised.listeners" list.
If no "controller.listener.name" is added, it'll be the *same as* the "
inter.broker.listener.name",
and the proposed feature is effectively turned off.
This has been the assumption in the KIP writeup before.

2. By default, the behavior proposed in this KIP is turned on, and
operators are forced to
recognize the proposed change if their "listeners" config is set (this is
most likely in production environments),
by allocating a new port for controller connections, and adding a new
endpoint to the "listeners" config.
For cases where "listeners" is not set explicitly,
there needs to be a default value for it that includes the controller
listener name,
e.g. "PLAINTEXT://:9092,CONTROLLER://:9091"

I chose to go with option 2 since as author of this KIP,
I naturally think in the long run, it's worth the effort to adopt this
feature,
in order to prevent issues under circumstances listed in the motivation
section.

100, following the argument above, I want to enforce the separation between
controller
and data plane requests. Hence the "controller.listener.name" should never
be the same
as the "inter.broker.listener.name", which is intended for data plane
requests.

101, the default value for "listeners" will be
"PLAINTEXT://:9092,CONTROLLER://:9091",
making values of "host", and "port" not being used under any config
settings.
And the default value for "advertised.listeners" will be derived from
"listeners",
making the values of "advertised.host", and "advertised.port" not being
used any more.

102, for upgrading, a single broker that has "listeners" and/or
"advertised.listeners" set,
must add a new endpoint for the CONTROLLER listener name, or end up using
the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091".
During rolling upgrade, in cases of <old controller> + <new broker> or <new
controller>  + <old broker>
we still need to fall back to the current behavior. However after the
rolling upgrade is done,
it is guaranteed that the controller plane and data plane are separated,
given
the "controller.listener.name" must be different from "
inter.broker.listener.name".

@Ismael Juma <is...@juma.me.uk>
Thanks for pointing that out. I did not know that.
However my question is if the argument above makes sense, and my code change
causes the configs "host", "port", "advertised.host", "advertised.port" to
be not used under any circumstance,
then it's no different from removing them.
Anyway if there is still a concern about removing them, is there a new
major new version
now or in the future where I can remove them?

Thanks!
Lucas

On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <is...@juma.me.uk> wrote:

> To be clear, we can only remove configs in major new versions. Otherwise,
> we can only deprecate them.
>
> Ismael
>
> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Lucas,
> >
> > For the network idlePct, your understanding is correct. Currently,
> > networkIdlePct metric is calculated as the average of (1 - io-ratio) in
> the
> > selector of all network threads.
> >
> > The metrics part looks good to me in the updated KIP.
> >
> > I am not still not quite sure about the configs.
> >
> > 100. "Whenever the "controller.listener.name" is set, upon broker
> startup,
> > we will validate its value and make sure it's different from the
> > *inter-broker-listener-name *value." Does that mean that
> > controller.listener.name has to be different from
> > inter.broker.listener.name?
> > That seems limiting.
> >
> > 101. The KIP says that advertised.listeners and listeners will now have a
> > different default value including controller. Could you document what the
> > default value looks like?
> >
> > 102. About removing the the following configs. How does that affect the
> > upgrade path? Do we now expect a user to add a new config when upgrading
> > from an old version to a new one?
> > host
> > port
> > advertised.host
> > advertised.port
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > @Jun Rao <ju...@confluent.io>
> > >
> > > One clarification, currently on the selector level, we have the
> > > io-wait-ratio metric.
> > > For the new controller *network* thread, we can use it directly for
> > > IdlePct, instead of using 1- io-ratio,
> > > so that the logic is similar to the current average IdlePct for network
> > > threads. Is that correct?
> > >
> > > I've revised the KIP by adding two new metrics for measuring the
> IdlePct
> > > for the two additional threads.
> > > Please take a look again. Thanks!
> > >
> > > Lucas
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Lucas,
> > > >
> > > > Thanks for the updated KIP.
> > > >
> > > > For monitoring the network thread utilization for the control plane,
> we
> > > > already have the metric io-ratio at the selector level (idlePct is 1
> -
> > > > io-ratio). So, we just need to give that selector a meaningful name.
> > > >
> > > > For monitoring the io thread utilization for the control plane, it's
> > > > probably useful to have a separate metric for that. The controller
> > > request
> > > > queue size may not reflect the history in a window.
> > > >
> > > > Jun
> > > >
> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks Jun for your quick response. It looks like I forgot to click
> > the
> > > > > "Update" button, :)
> > > > > It's updated now.
> > > > >
> > > > > Regarding the idle ratio metrics for the additional threads, I
> > > discussed
> > > > > with Joel,
> > > > > and think they are not as useful, and I added our reasoning in the
> > last
> > > > > paragraph of the
> > > > > "How are controller requests handled over the dedicated
> connections?"
> > > > > section.
> > > > > On the other hand, we don't strongly oppose adding them if you
> think
> > > they
> > > > > are necessary.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > >
> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Lucas,
> > > > > >
> > > > > > Thanks for the reply. Have you actually updated the KIP? The wiki
> > > says
> > > > > that
> > > > > > it's last updated on Aug. 22. and some of the changes that you
> > > > mentioned
> > > > > > (#1 and #3) are not there.
> > > > > >
> > > > > > Also, regarding Joel's comment on network/request idle ratio
> > metrics,
> > > > > could
> > > > > > you comment on whether they include the new controller listener?
> If
> > > > not,
> > > > > do
> > > > > > we need additional metrics to measure the utilization of the io
> > > thread
> > > > > for
> > > > > > the control plane?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <
> lucasatucla@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Thanks for the comments, Jun.
> > > > > > >
> > > > > > > 1. I think the answer should be no, since the "
> > > > > > inter.broker.listener.name"
> > > > > > > are also used
> > > > > > > for replication traffic, and merging these two types of request
> > to
> > > > the
> > > > > > > single threaded tunnel
> > > > > > > would defeat the purpose of this KIP and also hurt replication
> > > > > > throughput.
> > > > > > > So I think that means
> > > > > > > we should validate to make sure when the new config is set,
> it's
> > > > > > different
> > > > > > > from "inter.broker.listener.name"
> > > > > > > or "security.inter.broker.protocol", whichever is set.
> > > > > > >
> > > > > > > 2. Normally all broker configs in a given cluster are changed
> at
> > > the
> > > > > same
> > > > > > > time. If there is a typo in the
> > > > > > > controller.listener.name and it's not available in the
> endpoints
> > > > list,
> > > > > > we
> > > > > > > could catch it, give an error
> > > > > > > and block restart of the first broker in the cluster. With
> that,
> > we
> > > > > could
> > > > > > > keep the current behavior
> > > > > > > in the KIP write up that falls back to
> > "inter.broker.listener.nam"
> > > > when
> > > > > > the
> > > > > > > "controller.listener.name"
> > > > > > > is not found during the migration phase of this KIP. Thoughts?
> > > > > > >
> > > > > > > 3. That makes sense, and I've changed it.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io>
> > wrote:
> > > > > > >
> > > > > > > > Hi, Lucas,
> > > > > > > >
> > > > > > > > Sorry for the delay. The new proposal looks good to me
> > overall. A
> > > > few
> > > > > > > minor
> > > > > > > > comments below.
> > > > > > > >
> > > > > > > > 1. It's possible that listener.name.for.controller is set,
> but
> > > set
> > > > to
> > > > > > the
> > > > > > > > same value as inter.broker.listener.name. In that case,
> should
> > > we
> > > > > > have a
> > > > > > > > single network thread and the request handling thread for
> that
> > > > > > listener?
> > > > > > > >
> > > > > > > > 2. Currently, the controller always picks the listener
> > specified
> > > by
> > > > > > > > inter.broker.listener.name even if the listener name is not
> > > > present
> > > > > in
> > > > > > > the
> > > > > > > > receiving broker. This KIP proposes a slightly different
> > approach
> > > > for
> > > > > > > > picking listener.name.for.controller only when the receiving
> > end
> > > > has
> > > > > > the
> > > > > > > > listener and switches listener.name.for.controller otherwise.
> > > There
> > > > > are
> > > > > > > > some tradeoffs between the two approaches. To change the
> inter
> > > > broker
> > > > > > > > listener, the former requires 2 steps: (1) adding the new
> > > listener
> > > > to
> > > > > > > > listener list in every broker and (2) changing
> > > > > > > > listener.name.for.controller.
> > > > > > > > The latter can do both changes in 1 step. On the hand, if
> > > > > > > > listener.name.for.controller
> > > > > > > > is mis-configured, the former will report an error and the
> > latter
> > > > > will
> > > > > > > hide
> > > > > > > > it (so the user may not know the misconfiguration). It seems
> > that
> > > > we
> > > > > > > should
> > > > > > > > pick one approach to handle both listener.name.for.controller
> > and
> > > > > > > > inter.broker.listener.name consistently. To me, the former
> > seems
> > > > > > > slightly
> > > > > > > > better.
> > > > > > > >
> > > > > > > > 3. To be consistent with the existing naming, should
> > > > > > > > listener.name.for.controller
> > > > > > > > be controller.listener.name?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jun and Joel,
> > > > > > > > >
> > > > > > > > > The KIP writeup has changed by quite a bit since your +1.
> > > > > > > > > Can you please take another review? Thanks a lot for your
> > time!
> > > > > > > > >
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> > > > jjkoshy.w@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 on the KIP.
> > > > > > > > > >
> > > > > > > > > > (I'm not sure we actually necessary to introduce the
> > > condition
> > > > > > > > variables
> > > > > > > > > > for the concern that Jun raised, but it's an
> implementation
> > > > > detail
> > > > > > > that
> > > > > > > > > we
> > > > > > > > > > can defer to a discussion in the PR.)
> > > > > > > > > >
> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Jun,
> > > > > > > > > > >
> > > > > > > > > > > I agree by using the conditional variables, there is no
> > > need
> > > > to
> > > > > > add
> > > > > > > > > such
> > > > > > > > > > a
> > > > > > > > > > > new config.
> > > > > > > > > > > Also thanks for approving this KIP.
> > > > > > > > > > >
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Ismael Juma <is...@juma.me.uk>.
To be clear, we can only remove configs in major new versions. Otherwise,
we can only deprecate them.

Ismael

On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas,
>
> For the network idlePct, your understanding is correct. Currently,
> networkIdlePct metric is calculated as the average of (1 - io-ratio) in the
> selector of all network threads.
>
> The metrics part looks good to me in the updated KIP.
>
> I am not still not quite sure about the configs.
>
> 100. "Whenever the "controller.listener.name" is set, upon broker startup,
> we will validate its value and make sure it's different from the
> *inter-broker-listener-name *value." Does that mean that
> controller.listener.name has to be different from
> inter.broker.listener.name?
> That seems limiting.
>
> 101. The KIP says that advertised.listeners and listeners will now have a
> different default value including controller. Could you document what the
> default value looks like?
>
> 102. About removing the the following configs. How does that affect the
> upgrade path? Do we now expect a user to add a new config when upgrading
> from an old version to a new one?
> host
> port
> advertised.host
> advertised.port
>
> Thanks,
>
> Jun
>
>
> On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > @Jun Rao <ju...@confluent.io>
> >
> > One clarification, currently on the selector level, we have the
> > io-wait-ratio metric.
> > For the new controller *network* thread, we can use it directly for
> > IdlePct, instead of using 1- io-ratio,
> > so that the logic is similar to the current average IdlePct for network
> > threads. Is that correct?
> >
> > I've revised the KIP by adding two new metrics for measuring the IdlePct
> > for the two additional threads.
> > Please take a look again. Thanks!
> >
> > Lucas
> >
> >
> >
> >
> >
> > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Lucas,
> > >
> > > Thanks for the updated KIP.
> > >
> > > For monitoring the network thread utilization for the control plane, we
> > > already have the metric io-ratio at the selector level (idlePct is 1 -
> > > io-ratio). So, we just need to give that selector a meaningful name.
> > >
> > > For monitoring the io thread utilization for the control plane, it's
> > > probably useful to have a separate metric for that. The controller
> > request
> > > queue size may not reflect the history in a window.
> > >
> > > Jun
> > >
> > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Thanks Jun for your quick response. It looks like I forgot to click
> the
> > > > "Update" button, :)
> > > > It's updated now.
> > > >
> > > > Regarding the idle ratio metrics for the additional threads, I
> > discussed
> > > > with Joel,
> > > > and think they are not as useful, and I added our reasoning in the
> last
> > > > paragraph of the
> > > > "How are controller requests handled over the dedicated connections?"
> > > > section.
> > > > On the other hand, we don't strongly oppose adding them if you think
> > they
> > > > are necessary.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > >
> > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Lucas,
> > > > >
> > > > > Thanks for the reply. Have you actually updated the KIP? The wiki
> > says
> > > > that
> > > > > it's last updated on Aug. 22. and some of the changes that you
> > > mentioned
> > > > > (#1 and #3) are not there.
> > > > >
> > > > > Also, regarding Joel's comment on network/request idle ratio
> metrics,
> > > > could
> > > > > you comment on whether they include the new controller listener? If
> > > not,
> > > > do
> > > > > we need additional metrics to measure the utilization of the io
> > thread
> > > > for
> > > > > the control plane?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <lucasatucla@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Thanks for the comments, Jun.
> > > > > >
> > > > > > 1. I think the answer should be no, since the "
> > > > > inter.broker.listener.name"
> > > > > > are also used
> > > > > > for replication traffic, and merging these two types of request
> to
> > > the
> > > > > > single threaded tunnel
> > > > > > would defeat the purpose of this KIP and also hurt replication
> > > > > throughput.
> > > > > > So I think that means
> > > > > > we should validate to make sure when the new config is set, it's
> > > > > different
> > > > > > from "inter.broker.listener.name"
> > > > > > or "security.inter.broker.protocol", whichever is set.
> > > > > >
> > > > > > 2. Normally all broker configs in a given cluster are changed at
> > the
> > > > same
> > > > > > time. If there is a typo in the
> > > > > > controller.listener.name and it's not available in the endpoints
> > > list,
> > > > > we
> > > > > > could catch it, give an error
> > > > > > and block restart of the first broker in the cluster. With that,
> we
> > > > could
> > > > > > keep the current behavior
> > > > > > in the KIP write up that falls back to
> "inter.broker.listener.nam"
> > > when
> > > > > the
> > > > > > "controller.listener.name"
> > > > > > is not found during the migration phase of this KIP. Thoughts?
> > > > > >
> > > > > > 3. That makes sense, and I've changed it.
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io>
> wrote:
> > > > > >
> > > > > > > Hi, Lucas,
> > > > > > >
> > > > > > > Sorry for the delay. The new proposal looks good to me
> overall. A
> > > few
> > > > > > minor
> > > > > > > comments below.
> > > > > > >
> > > > > > > 1. It's possible that listener.name.for.controller is set, but
> > set
> > > to
> > > > > the
> > > > > > > same value as inter.broker.listener.name. In that case, should
> > we
> > > > > have a
> > > > > > > single network thread and the request handling thread for that
> > > > > listener?
> > > > > > >
> > > > > > > 2. Currently, the controller always picks the listener
> specified
> > by
> > > > > > > inter.broker.listener.name even if the listener name is not
> > > present
> > > > in
> > > > > > the
> > > > > > > receiving broker. This KIP proposes a slightly different
> approach
> > > for
> > > > > > > picking listener.name.for.controller only when the receiving
> end
> > > has
> > > > > the
> > > > > > > listener and switches listener.name.for.controller otherwise.
> > There
> > > > are
> > > > > > > some tradeoffs between the two approaches. To change the inter
> > > broker
> > > > > > > listener, the former requires 2 steps: (1) adding the new
> > listener
> > > to
> > > > > > > listener list in every broker and (2) changing
> > > > > > > listener.name.for.controller.
> > > > > > > The latter can do both changes in 1 step. On the hand, if
> > > > > > > listener.name.for.controller
> > > > > > > is mis-configured, the former will report an error and the
> latter
> > > > will
> > > > > > hide
> > > > > > > it (so the user may not know the misconfiguration). It seems
> that
> > > we
> > > > > > should
> > > > > > > pick one approach to handle both listener.name.for.controller
> and
> > > > > > > inter.broker.listener.name consistently. To me, the former
> seems
> > > > > > slightly
> > > > > > > better.
> > > > > > >
> > > > > > > 3. To be consistent with the existing naming, should
> > > > > > > listener.name.for.controller
> > > > > > > be controller.listener.name?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> > lucasatucla@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun and Joel,
> > > > > > > >
> > > > > > > > The KIP writeup has changed by quite a bit since your +1.
> > > > > > > > Can you please take another review? Thanks a lot for your
> time!
> > > > > > > >
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> > > jjkoshy.w@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 on the KIP.
> > > > > > > > >
> > > > > > > > > (I'm not sure we actually necessary to introduce the
> > condition
> > > > > > > variables
> > > > > > > > > for the concern that Jun raised, but it's an implementation
> > > > detail
> > > > > > that
> > > > > > > > we
> > > > > > > > > can defer to a discussion in the PR.)
> > > > > > > > >
> > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Jun,
> > > > > > > > > >
> > > > > > > > > > I agree by using the conditional variables, there is no
> > need
> > > to
> > > > > add
> > > > > > > > such
> > > > > > > > > a
> > > > > > > > > > new config.
> > > > > > > > > > Also thanks for approving this KIP.
> > > > > > > > > >
> > > > > > > > > > Lucas
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hi, Lucas,

For the network idlePct, your understanding is correct. Currently,
networkIdlePct metric is calculated as the average of (1 - io-ratio) in the
selector of all network threads.

The metrics part looks good to me in the updated KIP.

I am not still not quite sure about the configs.

100. "Whenever the "controller.listener.name" is set, upon broker startup,
we will validate its value and make sure it's different from the
*inter-broker-listener-name *value." Does that mean that
controller.listener.name has to be different from inter.broker.listener.name?
That seems limiting.

101. The KIP says that advertised.listeners and listeners will now have a
different default value including controller. Could you document what the
default value looks like?

102. About removing the the following configs. How does that affect the
upgrade path? Do we now expect a user to add a new config when upgrading
from an old version to a new one?
host
port
advertised.host
advertised.port

Thanks,

Jun


On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <lu...@gmail.com> wrote:

> @Jun Rao <ju...@confluent.io>
>
> One clarification, currently on the selector level, we have the
> io-wait-ratio metric.
> For the new controller *network* thread, we can use it directly for
> IdlePct, instead of using 1- io-ratio,
> so that the logic is similar to the current average IdlePct for network
> threads. Is that correct?
>
> I've revised the KIP by adding two new metrics for measuring the IdlePct
> for the two additional threads.
> Please take a look again. Thanks!
>
> Lucas
>
>
>
>
>
> On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Lucas,
> >
> > Thanks for the updated KIP.
> >
> > For monitoring the network thread utilization for the control plane, we
> > already have the metric io-ratio at the selector level (idlePct is 1 -
> > io-ratio). So, we just need to give that selector a meaningful name.
> >
> > For monitoring the io thread utilization for the control plane, it's
> > probably useful to have a separate metric for that. The controller
> request
> > queue size may not reflect the history in a window.
> >
> > Jun
> >
> > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Thanks Jun for your quick response. It looks like I forgot to click the
> > > "Update" button, :)
> > > It's updated now.
> > >
> > > Regarding the idle ratio metrics for the additional threads, I
> discussed
> > > with Joel,
> > > and think they are not as useful, and I added our reasoning in the last
> > > paragraph of the
> > > "How are controller requests handled over the dedicated connections?"
> > > section.
> > > On the other hand, we don't strongly oppose adding them if you think
> they
> > > are necessary.
> > >
> > > Thanks,
> > > Lucas
> > >
> > >
> > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Lucas,
> > > >
> > > > Thanks for the reply. Have you actually updated the KIP? The wiki
> says
> > > that
> > > > it's last updated on Aug. 22. and some of the changes that you
> > mentioned
> > > > (#1 and #3) are not there.
> > > >
> > > > Also, regarding Joel's comment on network/request idle ratio metrics,
> > > could
> > > > you comment on whether they include the new controller listener? If
> > not,
> > > do
> > > > we need additional metrics to measure the utilization of the io
> thread
> > > for
> > > > the control plane?
> > > >
> > > > Jun
> > > >
> > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks for the comments, Jun.
> > > > >
> > > > > 1. I think the answer should be no, since the "
> > > > inter.broker.listener.name"
> > > > > are also used
> > > > > for replication traffic, and merging these two types of request to
> > the
> > > > > single threaded tunnel
> > > > > would defeat the purpose of this KIP and also hurt replication
> > > > throughput.
> > > > > So I think that means
> > > > > we should validate to make sure when the new config is set, it's
> > > > different
> > > > > from "inter.broker.listener.name"
> > > > > or "security.inter.broker.protocol", whichever is set.
> > > > >
> > > > > 2. Normally all broker configs in a given cluster are changed at
> the
> > > same
> > > > > time. If there is a typo in the
> > > > > controller.listener.name and it's not available in the endpoints
> > list,
> > > > we
> > > > > could catch it, give an error
> > > > > and block restart of the first broker in the cluster. With that, we
> > > could
> > > > > keep the current behavior
> > > > > in the KIP write up that falls back to "inter.broker.listener.nam"
> > when
> > > > the
> > > > > "controller.listener.name"
> > > > > is not found during the migration phase of this KIP. Thoughts?
> > > > >
> > > > > 3. That makes sense, and I've changed it.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Lucas,
> > > > > >
> > > > > > Sorry for the delay. The new proposal looks good to me overall. A
> > few
> > > > > minor
> > > > > > comments below.
> > > > > >
> > > > > > 1. It's possible that listener.name.for.controller is set, but
> set
> > to
> > > > the
> > > > > > same value as inter.broker.listener.name. In that case, should
> we
> > > > have a
> > > > > > single network thread and the request handling thread for that
> > > > listener?
> > > > > >
> > > > > > 2. Currently, the controller always picks the listener specified
> by
> > > > > > inter.broker.listener.name even if the listener name is not
> > present
> > > in
> > > > > the
> > > > > > receiving broker. This KIP proposes a slightly different approach
> > for
> > > > > > picking listener.name.for.controller only when the receiving end
> > has
> > > > the
> > > > > > listener and switches listener.name.for.controller otherwise.
> There
> > > are
> > > > > > some tradeoffs between the two approaches. To change the inter
> > broker
> > > > > > listener, the former requires 2 steps: (1) adding the new
> listener
> > to
> > > > > > listener list in every broker and (2) changing
> > > > > > listener.name.for.controller.
> > > > > > The latter can do both changes in 1 step. On the hand, if
> > > > > > listener.name.for.controller
> > > > > > is mis-configured, the former will report an error and the latter
> > > will
> > > > > hide
> > > > > > it (so the user may not know the misconfiguration). It seems that
> > we
> > > > > should
> > > > > > pick one approach to handle both listener.name.for.controller and
> > > > > > inter.broker.listener.name consistently. To me, the former seems
> > > > > slightly
> > > > > > better.
> > > > > >
> > > > > > 3. To be consistent with the existing naming, should
> > > > > > listener.name.for.controller
> > > > > > be controller.listener.name?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Jun and Joel,
> > > > > > >
> > > > > > > The KIP writeup has changed by quite a bit since your +1.
> > > > > > > Can you please take another review? Thanks a lot for your time!
> > > > > > >
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> > jjkoshy.w@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1 on the KIP.
> > > > > > > >
> > > > > > > > (I'm not sure we actually necessary to introduce the
> condition
> > > > > > variables
> > > > > > > > for the concern that Jun raised, but it's an implementation
> > > detail
> > > > > that
> > > > > > > we
> > > > > > > > can defer to a discussion in the PR.)
> > > > > > > >
> > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> > > > lucasatucla@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jun,
> > > > > > > > >
> > > > > > > > > I agree by using the conditional variables, there is no
> need
> > to
> > > > add
> > > > > > > such
> > > > > > > > a
> > > > > > > > > new config.
> > > > > > > > > Also thanks for approving this KIP.
> > > > > > > > >
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
@Jun Rao <ju...@confluent.io>

One clarification, currently on the selector level, we have the
io-wait-ratio metric.
For the new controller *network* thread, we can use it directly for
IdlePct, instead of using 1- io-ratio,
so that the logic is similar to the current average IdlePct for network
threads. Is that correct?

I've revised the KIP by adding two new metrics for measuring the IdlePct
for the two additional threads.
Please take a look again. Thanks!

Lucas





On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas,
>
> Thanks for the updated KIP.
>
> For monitoring the network thread utilization for the control plane, we
> already have the metric io-ratio at the selector level (idlePct is 1 -
> io-ratio). So, we just need to give that selector a meaningful name.
>
> For monitoring the io thread utilization for the control plane, it's
> probably useful to have a separate metric for that. The controller request
> queue size may not reflect the history in a window.
>
> Jun
>
> On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks Jun for your quick response. It looks like I forgot to click the
> > "Update" button, :)
> > It's updated now.
> >
> > Regarding the idle ratio metrics for the additional threads, I discussed
> > with Joel,
> > and think they are not as useful, and I added our reasoning in the last
> > paragraph of the
> > "How are controller requests handled over the dedicated connections?"
> > section.
> > On the other hand, we don't strongly oppose adding them if you think they
> > are necessary.
> >
> > Thanks,
> > Lucas
> >
> >
> > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Lucas,
> > >
> > > Thanks for the reply. Have you actually updated the KIP? The wiki says
> > that
> > > it's last updated on Aug. 22. and some of the changes that you
> mentioned
> > > (#1 and #3) are not there.
> > >
> > > Also, regarding Joel's comment on network/request idle ratio metrics,
> > could
> > > you comment on whether they include the new controller listener? If
> not,
> > do
> > > we need additional metrics to measure the utilization of the io thread
> > for
> > > the control plane?
> > >
> > > Jun
> > >
> > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Thanks for the comments, Jun.
> > > >
> > > > 1. I think the answer should be no, since the "
> > > inter.broker.listener.name"
> > > > are also used
> > > > for replication traffic, and merging these two types of request to
> the
> > > > single threaded tunnel
> > > > would defeat the purpose of this KIP and also hurt replication
> > > throughput.
> > > > So I think that means
> > > > we should validate to make sure when the new config is set, it's
> > > different
> > > > from "inter.broker.listener.name"
> > > > or "security.inter.broker.protocol", whichever is set.
> > > >
> > > > 2. Normally all broker configs in a given cluster are changed at the
> > same
> > > > time. If there is a typo in the
> > > > controller.listener.name and it's not available in the endpoints
> list,
> > > we
> > > > could catch it, give an error
> > > > and block restart of the first broker in the cluster. With that, we
> > could
> > > > keep the current behavior
> > > > in the KIP write up that falls back to "inter.broker.listener.nam"
> when
> > > the
> > > > "controller.listener.name"
> > > > is not found during the migration phase of this KIP. Thoughts?
> > > >
> > > > 3. That makes sense, and I've changed it.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Lucas,
> > > > >
> > > > > Sorry for the delay. The new proposal looks good to me overall. A
> few
> > > > minor
> > > > > comments below.
> > > > >
> > > > > 1. It's possible that listener.name.for.controller is set, but set
> to
> > > the
> > > > > same value as inter.broker.listener.name. In that case, should we
> > > have a
> > > > > single network thread and the request handling thread for that
> > > listener?
> > > > >
> > > > > 2. Currently, the controller always picks the listener specified by
> > > > > inter.broker.listener.name even if the listener name is not
> present
> > in
> > > > the
> > > > > receiving broker. This KIP proposes a slightly different approach
> for
> > > > > picking listener.name.for.controller only when the receiving end
> has
> > > the
> > > > > listener and switches listener.name.for.controller otherwise. There
> > are
> > > > > some tradeoffs between the two approaches. To change the inter
> broker
> > > > > listener, the former requires 2 steps: (1) adding the new listener
> to
> > > > > listener list in every broker and (2) changing
> > > > > listener.name.for.controller.
> > > > > The latter can do both changes in 1 step. On the hand, if
> > > > > listener.name.for.controller
> > > > > is mis-configured, the former will report an error and the latter
> > will
> > > > hide
> > > > > it (so the user may not know the misconfiguration). It seems that
> we
> > > > should
> > > > > pick one approach to handle both listener.name.for.controller and
> > > > > inter.broker.listener.name consistently. To me, the former seems
> > > > slightly
> > > > > better.
> > > > >
> > > > > 3. To be consistent with the existing naming, should
> > > > > listener.name.for.controller
> > > > > be controller.listener.name?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Jun and Joel,
> > > > > >
> > > > > > The KIP writeup has changed by quite a bit since your +1.
> > > > > > Can you please take another review? Thanks a lot for your time!
> > > > > >
> > > > > > Lucas
> > > > > >
> > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <
> jjkoshy.w@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > +1 on the KIP.
> > > > > > >
> > > > > > > (I'm not sure we actually necessary to introduce the condition
> > > > > variables
> > > > > > > for the concern that Jun raised, but it's an implementation
> > detail
> > > > that
> > > > > > we
> > > > > > > can defer to a discussion in the PR.)
> > > > > > >
> > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > I agree by using the conditional variables, there is no need
> to
> > > add
> > > > > > such
> > > > > > > a
> > > > > > > > new config.
> > > > > > > > Also thanks for approving this KIP.
> > > > > > > >
> > > > > > > > Lucas
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hi, Lucas,

Thanks for the updated KIP.

For monitoring the network thread utilization for the control plane, we
already have the metric io-ratio at the selector level (idlePct is 1 -
io-ratio). So, we just need to give that selector a meaningful name.

For monitoring the io thread utilization for the control plane, it's
probably useful to have a separate metric for that. The controller request
queue size may not reflect the history in a window.

Jun

On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks Jun for your quick response. It looks like I forgot to click the
> "Update" button, :)
> It's updated now.
>
> Regarding the idle ratio metrics for the additional threads, I discussed
> with Joel,
> and think they are not as useful, and I added our reasoning in the last
> paragraph of the
> "How are controller requests handled over the dedicated connections?"
> section.
> On the other hand, we don't strongly oppose adding them if you think they
> are necessary.
>
> Thanks,
> Lucas
>
>
> On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Lucas,
> >
> > Thanks for the reply. Have you actually updated the KIP? The wiki says
> that
> > it's last updated on Aug. 22. and some of the changes that you mentioned
> > (#1 and #3) are not there.
> >
> > Also, regarding Joel's comment on network/request idle ratio metrics,
> could
> > you comment on whether they include the new controller listener? If not,
> do
> > we need additional metrics to measure the utilization of the io thread
> for
> > the control plane?
> >
> > Jun
> >
> > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Thanks for the comments, Jun.
> > >
> > > 1. I think the answer should be no, since the "
> > inter.broker.listener.name"
> > > are also used
> > > for replication traffic, and merging these two types of request to the
> > > single threaded tunnel
> > > would defeat the purpose of this KIP and also hurt replication
> > throughput.
> > > So I think that means
> > > we should validate to make sure when the new config is set, it's
> > different
> > > from "inter.broker.listener.name"
> > > or "security.inter.broker.protocol", whichever is set.
> > >
> > > 2. Normally all broker configs in a given cluster are changed at the
> same
> > > time. If there is a typo in the
> > > controller.listener.name and it's not available in the endpoints list,
> > we
> > > could catch it, give an error
> > > and block restart of the first broker in the cluster. With that, we
> could
> > > keep the current behavior
> > > in the KIP write up that falls back to "inter.broker.listener.nam" when
> > the
> > > "controller.listener.name"
> > > is not found during the migration phase of this KIP. Thoughts?
> > >
> > > 3. That makes sense, and I've changed it.
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Lucas,
> > > >
> > > > Sorry for the delay. The new proposal looks good to me overall. A few
> > > minor
> > > > comments below.
> > > >
> > > > 1. It's possible that listener.name.for.controller is set, but set to
> > the
> > > > same value as inter.broker.listener.name. In that case, should we
> > have a
> > > > single network thread and the request handling thread for that
> > listener?
> > > >
> > > > 2. Currently, the controller always picks the listener specified by
> > > > inter.broker.listener.name even if the listener name is not present
> in
> > > the
> > > > receiving broker. This KIP proposes a slightly different approach for
> > > > picking listener.name.for.controller only when the receiving end has
> > the
> > > > listener and switches listener.name.for.controller otherwise. There
> are
> > > > some tradeoffs between the two approaches. To change the inter broker
> > > > listener, the former requires 2 steps: (1) adding the new listener to
> > > > listener list in every broker and (2) changing
> > > > listener.name.for.controller.
> > > > The latter can do both changes in 1 step. On the hand, if
> > > > listener.name.for.controller
> > > > is mis-configured, the former will report an error and the latter
> will
> > > hide
> > > > it (so the user may not know the misconfiguration). It seems that we
> > > should
> > > > pick one approach to handle both listener.name.for.controller and
> > > > inter.broker.listener.name consistently. To me, the former seems
> > > slightly
> > > > better.
> > > >
> > > > 3. To be consistent with the existing naming, should
> > > > listener.name.for.controller
> > > > be controller.listener.name?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Jun and Joel,
> > > > >
> > > > > The KIP writeup has changed by quite a bit since your +1.
> > > > > Can you please take another review? Thanks a lot for your time!
> > > > >
> > > > > Lucas
> > > > >
> > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <jj...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > +1 on the KIP.
> > > > > >
> > > > > > (I'm not sure we actually necessary to introduce the condition
> > > > variables
> > > > > > for the concern that Jun raised, but it's an implementation
> detail
> > > that
> > > > > we
> > > > > > can defer to a discussion in the PR.)
> > > > > >
> > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> > lucasatucla@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > I agree by using the conditional variables, there is no need to
> > add
> > > > > such
> > > > > > a
> > > > > > > new config.
> > > > > > > Also thanks for approving this KIP.
> > > > > > >
> > > > > > > Lucas
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks Jun for your quick response. It looks like I forgot to click the
"Update" button, :)
It's updated now.

Regarding the idle ratio metrics for the additional threads, I discussed
with Joel,
and think they are not as useful, and I added our reasoning in the last
paragraph of the
"How are controller requests handled over the dedicated connections?"
section.
On the other hand, we don't strongly oppose adding them if you think they
are necessary.

Thanks,
Lucas


On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas,
>
> Thanks for the reply. Have you actually updated the KIP? The wiki says that
> it's last updated on Aug. 22. and some of the changes that you mentioned
> (#1 and #3) are not there.
>
> Also, regarding Joel's comment on network/request idle ratio metrics, could
> you comment on whether they include the new controller listener? If not, do
> we need additional metrics to measure the utilization of the io thread for
> the control plane?
>
> Jun
>
> On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for the comments, Jun.
> >
> > 1. I think the answer should be no, since the "
> inter.broker.listener.name"
> > are also used
> > for replication traffic, and merging these two types of request to the
> > single threaded tunnel
> > would defeat the purpose of this KIP and also hurt replication
> throughput.
> > So I think that means
> > we should validate to make sure when the new config is set, it's
> different
> > from "inter.broker.listener.name"
> > or "security.inter.broker.protocol", whichever is set.
> >
> > 2. Normally all broker configs in a given cluster are changed at the same
> > time. If there is a typo in the
> > controller.listener.name and it's not available in the endpoints list,
> we
> > could catch it, give an error
> > and block restart of the first broker in the cluster. With that, we could
> > keep the current behavior
> > in the KIP write up that falls back to "inter.broker.listener.nam" when
> the
> > "controller.listener.name"
> > is not found during the migration phase of this KIP. Thoughts?
> >
> > 3. That makes sense, and I've changed it.
> >
> > Thanks,
> > Lucas
> >
> > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Lucas,
> > >
> > > Sorry for the delay. The new proposal looks good to me overall. A few
> > minor
> > > comments below.
> > >
> > > 1. It's possible that listener.name.for.controller is set, but set to
> the
> > > same value as inter.broker.listener.name. In that case, should we
> have a
> > > single network thread and the request handling thread for that
> listener?
> > >
> > > 2. Currently, the controller always picks the listener specified by
> > > inter.broker.listener.name even if the listener name is not present in
> > the
> > > receiving broker. This KIP proposes a slightly different approach for
> > > picking listener.name.for.controller only when the receiving end has
> the
> > > listener and switches listener.name.for.controller otherwise. There are
> > > some tradeoffs between the two approaches. To change the inter broker
> > > listener, the former requires 2 steps: (1) adding the new listener to
> > > listener list in every broker and (2) changing
> > > listener.name.for.controller.
> > > The latter can do both changes in 1 step. On the hand, if
> > > listener.name.for.controller
> > > is mis-configured, the former will report an error and the latter will
> > hide
> > > it (so the user may not know the misconfiguration). It seems that we
> > should
> > > pick one approach to handle both listener.name.for.controller and
> > > inter.broker.listener.name consistently. To me, the former seems
> > slightly
> > > better.
> > >
> > > 3. To be consistent with the existing naming, should
> > > listener.name.for.controller
> > > be controller.listener.name?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Hi Jun and Joel,
> > > >
> > > > The KIP writeup has changed by quite a bit since your +1.
> > > > Can you please take another review? Thanks a lot for your time!
> > > >
> > > > Lucas
> > > >
> > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy <jj...@gmail.com>
> > > wrote:
> > > >
> > > > > +1 on the KIP.
> > > > >
> > > > > (I'm not sure we actually necessary to introduce the condition
> > > variables
> > > > > for the concern that Jun raised, but it's an implementation detail
> > that
> > > > we
> > > > > can defer to a discussion in the PR.)
> > > > >
> > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > I agree by using the conditional variables, there is no need to
> add
> > > > such
> > > > > a
> > > > > > new config.
> > > > > > Also thanks for approving this KIP.
> > > > > >
> > > > > > Lucas
> > > > > >
> > > > >
> > > >
> > >
> >
>