Posted to dev@kafka.apache.org by Lucas Wang <lu...@gmail.com> on 2018/06/13 19:45:07 UTC

[DISCUSS] KIP-291: Have separate queues for control requests and data requests

Hi Kafka experts,

I created KIP-291 to add a separate queue for controller requests:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests

Can you please take a look and let me know your feedback?

Thanks a lot for your time!
Regards,
Lucas

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks for the comments, Dong.

I've updated the KIP and addressed your 3 comments.
Please take another look when you get a chance.

Lucas

On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Lucas,
>
> Thanks for the KIP. Looks good overall. Some comments below:
>
> - We usually specify the full mbean for the new metrics in the KIP. Can you
> specify it in the Public Interface section, similar to KIP-237
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>?
>
> - Maybe we could follow the same pattern as KIP-153
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> where we keep the existing sensor name "BytesInPerSec" and add a new sensor
> "ReplicationBytesInPerSec", rather than replacing the sensor name
> "BytesInPerSec" with e.g. "ClientBytesInPerSec".
>
> - It seems that the KIP changes the semantics of the broker config
> "queued.max.requests", because the total number of requests queued in the
> broker will no longer be bounded by "queued.max.requests". This probably
> needs to be specified in the Public Interfaces section for discussion.
>
>
> Thanks,
> Dong
>
>
> On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lu...@gmail.com>
> wrote:
>
> > Hi Kafka experts,
> >
> > I created KIP-291 to add a separate queue for controller requests:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> >
> > Can you please take a look and let me know your feedback?
> >
> > Thanks a lot for your time!
> > Regards,
> > Lucas
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Revision to the description below: instead of a binary flag blockedOnHead,
there should be a variable that records the number of threads blocked on the
headNotFullCondition.
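For concreteness, the revised design (three condition variables plus a counter of head-blocked threads, as described in the quoted message below) could be sketched roughly as follows. This is a hypothetical illustration only; the class and method names are made up for this sketch and are not from the KIP or the Kafka codebase:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only. Controller requests are enqueued at the head,
// data requests at the tail, and blockedOnHead counts how many threads are
// blocked on headNotFull so they can be signalled first when space frees up.
public class PrioritizedRequestDeque<T> {
    private final Deque<T> deque = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition tailNotFull = lock.newCondition();  // data-request enqueuers wait here
    private final Condition headNotFull = lock.newCondition();  // controller-request enqueuers wait here
    private final Condition notEmpty = lock.newCondition();     // io threads wait here
    private int blockedOnHead = 0; // number of threads blocked on headNotFull

    public PrioritizedRequestDeque(int capacity) {
        this.capacity = capacity;
    }

    // Processor threads enqueue a data request at the tail; block when full.
    public void putDataRequest(T request) throws InterruptedException {
        lock.lock();
        try {
            while (deque.size() >= capacity)
                tailNotFull.await();
            deque.addLast(request);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // Processor threads enqueue a controller request at the head; the
    // counter tracks how many of them are blocked.
    public void putControllerRequest(T request) throws InterruptedException {
        lock.lock();
        try {
            while (deque.size() >= capacity) {
                blockedOnHead++;
                try {
                    headNotFull.await();
                } finally {
                    blockedOnHead--;
                }
            }
            deque.addFirst(request);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // IO threads take from the head; wake head-blocked enqueuers first.
    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (deque.isEmpty())
                notEmpty.await();
            T request = deque.pollFirst();
            if (blockedOnHead > 0)
                headNotFull.signal();
            else
                tailNotFull.signal();
            return request;
        } finally {
            lock.unlock();
        }
    }
}
```

The key property is in take(): when space frees up, a thread blocked trying to enqueue a controller request is signalled before any thread blocked on a data request.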

On Fri, Jul 20, 2018 at 2:19 PM, Lucas Wang <lu...@gmail.com> wrote:

> Great, it seems we are kind of converging to use correlation id to prevent
> out-of-order issues.
>
> Meanwhile, to address the concern raised by Jun
> "One potential issue with the dequeue approach is that if the queue is
> full,
> there is no guarantee that the controller requests will be enqueued
> quickly."
>
> we can still use the deque idea, only with our own implementation. It's
> something like
> the following diagram with 3 condition variables, and a flag
> - tailNotFullCondition used to block processor threads trying to enqueue a
> data request, when the queue is full
> - headNotFullCondition used to block processor threads trying to enqueue a
> controller request, when the queue is full
> - notEmptyCondition used to block io threads when the queue is empty
> - the blockedOnHead flag to indicate whether any thread is blocked on the
> headNotFullCondition
>
>
> The benefit with this approach is still that no public interface change is
> needed, and a processor thread trying to enqueue a controller request will
> always be woken up first when the queue is full.
> In terms of implementation complexity, it's quite similar to the separate
> queue approach. Thoughts?
>
> Thanks,
> Lucas
>
>
> On Thu, Jul 19, 2018 at 11:18 PM, Becket Qin <be...@gmail.com> wrote:
>
>> Lucas and Mayuresh,
>>
>> Good idea. The correlation id should work.
>>
>> In the ControllerChannelManager, a request will be resent until a response
>> is received. So if the controller-to-broker connection disconnects after
>> the controller sends R1_a, but before the response to R1_a is received,
>> the disconnection may cause the controller to resend R1 as R1_b; i.e.
>> until R1 is acked, R2 won't be sent by the controller.
>> This gives two guarantees:
>> 1. Correlation id wise: R1_a < R1_b < R2.
>> 2. On the broker side, when R2 is seen, R1 must have been processed at
>> least once.
>>
>> So on the broker side, with a single-threaded controller request handler,
>> the logic should be:
>> 1. Process whatever request is seen in the controller request queue.
>> 2. For the given epoch, drop a request if its correlation id is smaller
>> than that of the last processed request.
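A hedged sketch of that broker-side filtering logic (illustrative only, not the actual Kafka implementation; the class name and method signature are made up): a single-threaded handler remembers the last processed correlation id per controller epoch and drops anything older.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only. The single controller-request handler thread
// tracks, per controller epoch, the correlation id of the last processed
// request and drops any request whose correlation id is smaller,
// i.e. an obsolete resend.
public class ControllerRequestFilter {
    private final Map<Integer, Integer> lastCorrelationId = new HashMap<>();

    // Returns true if the request should be processed,
    // false if it should be dropped as obsolete.
    public boolean shouldProcess(int controllerEpoch, int correlationId) {
        Integer last = lastCorrelationId.get(controllerEpoch);
        if (last != null && correlationId < last) {
            return false; // e.g. a resent R1_a arriving after R1_b was processed
        }
        lastCorrelationId.put(controllerEpoch, correlationId);
        return true;
    }
}
```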
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
>>
>> > I agree that there is no strong ordering when there are more than one
>> > socket connection. Currently, we rely on controllerEpoch and leaderEpoch
>> > to ensure that the receiving broker picks up the latest state for each
>> > partition.
>> >
>> > One potential issue with the deque approach is that if the queue is
>> > full, there is no guarantee that the controller requests will be
>> > enqueued quickly.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
>> > gharatmayuresh15@gmail.com
>> > > wrote:
>> >
>> > > Yeah, the correlationId is only set to 0 in the NetworkClient
>> > > constructor. Since we reuse the same NetworkClient between the
>> > > controller and the broker, a disconnection should not cause it to
>> > > reset to 0, in which case it can be used to reject obsolete requests.
>> > >
>> > > Thanks,
>> > >
>> > > Mayuresh
>> > >
>> > > On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com>
>> > wrote:
>> > >
>> > > > @Dong,
>> > > > Great example and explanation, thanks!
>> > > >
>> > > > @All
>> > > > Regarding the example given by Dong, it seems that even if we use a
>> > > > queue and a dedicated controller request handling thread, the same
>> > > > result can still happen, because R1_a will be sent on one connection,
>> > > > and R1_b & R2 will be sent on a different connection, and there is no
>> > > > ordering between different connections on the broker side.
>> > > > I was discussing with Mayuresh offline, and it seems the correlation
>> > > > id within the same NetworkClient object is monotonically increasing
>> > > > and never reset; hence a broker can leverage that to properly reject
>> > > > obsolete requests.
>> > > > Thoughts?
>> > > >
>> > > > Thanks,
>> > > > Lucas
>> > > >
>> > > > On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
>> > > > gharatmayuresh15@gmail.com> wrote:
>> > > >
>> > > > > Actually nvm, correlationId is reset in case of connection loss, I
>> > > think.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Mayuresh
>> > > > >
>> > > > > On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
>> > > > > gharatmayuresh15@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > I agree with Dong that out-of-order processing can happen even
>> > > > > > with 2 separate queues, and it can even happen today.
>> > > > > > Can we use the correlationId in the requests from the controller
>> > > > > > to the broker to handle ordering?
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Mayuresh
>> > > > > >
>> > > > > >
>> > > > > > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
>> becket.qin@gmail.com>
>> > > > wrote:
>> > > > > >
>> > > > > >> Good point, Joel. I agree that a dedicated controller request
>> > > > > >> handling thread would provide better isolation. It also solves
>> > > > > >> the reordering issue.
>> > > > > >>
>> > > > > >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
>> jjkoshy.w@gmail.com>
>> > > > > wrote:
>> > > > > >>
>> > > > > >> > Good example. I think this scenario can occur in the current
>> > > > > >> > code as well, but with even lower probability, given that there
>> > > > > >> > are other non-controller requests interleaved. It is still
>> > > > > >> > sketchy though, and I think a safer approach would be separate
>> > > > > >> > queues and pinning controller request handling to one handler
>> > > > > >> > thread.
>> > > > > >> >
>> > > > > >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
>> lindong28@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >> >
>> > > > > >> > > Hey Becket,
>> > > > > >> > >
>> > > > > >> > > I think you are right that there may be out-of-order
>> > processing.
>> > > > > >> However,
>> > > > > >> > > it seems that out-of-order processing may also happen even
>> if
>> > we
>> > > > > use a
>> > > > > >> > > separate queue.
>> > > > > >> > >
>> > > > > >> > > Here is the example:
>> > > > > >> > >
>> > > > > >> > > - Controller sends R1 and gets disconnected before receiving
>> > > > > >> > > the response. Then it reconnects and sends R2. Both requests
>> > > > > >> > > now stay in the controller request queue in the order they
>> > > > > >> > > were sent.
>> > > > > >> > > - thread1 takes R1_a from the request queue, and then thread2
>> > > > > >> > > takes R2 from the request queue almost at the same time.
>> > > > > >> > > - So R1_a and R2 are processed in parallel. There is a chance
>> > > > > >> > > that R2's processing is completed before R1's.
>> > > > > >> > >
>> > > > > >> > > If out-of-order processing can happen for both approaches with
>> > > > > >> > > very low probability, it may not be worthwhile to add the
>> > > > > >> > > extra queue. What do you think?
>> > > > > >> > >
>> > > > > >> > > Thanks,
>> > > > > >> > > Dong
>> > > > > >> > >
>> > > > > >> > >
>> > > > > >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
>> > > becket.qin@gmail.com
>> > > > >
>> > > > > >> > wrote:
>> > > > > >> > >
>> > > > > >> > > > Hi Mayuresh/Joel,
>> > > > > >> > > >
>> > > > > >> > > > Using the request channel as a deque was brought up some
>> > > > > >> > > > time ago when we were initially thinking of prioritizing the
>> > > > > >> > > > requests. The concern was that the controller requests are
>> > > > > >> > > > supposed to be processed in order. If we can ensure that
>> > > > > >> > > > there is at most one controller request in the request
>> > > > > >> > > > channel, the order is not a concern. But in cases where more
>> > > > > >> > > > than one controller request is inserted into the queue, the
>> > > > > >> > > > controller request order may change and cause problems. For
>> > > > > >> > > > example, think about the following sequence:
>> > > > > >> > > > 1. Controller successfully sends a request R1 to the broker.
>> > > > > >> > > > 2. Broker receives R1 and puts the request at the head of
>> > > > > >> > > > the request queue.
>> > > > > >> > > > 3. The controller-to-broker connection fails and the
>> > > > > >> > > > controller reconnects to the broker.
>> > > > > >> > > > 4. Controller sends a request R2 to the broker.
>> > > > > >> > > > 5. Broker receives R2 and adds it to the head of the request
>> > > > > >> > > > queue.
>> > > > > >> > > > Now on the broker side, R2 will be processed before R1,
>> > > > > >> > > > which may cause problems.
>> > > > > >> > > >
>> > > > > >> > > > Thanks,
>> > > > > >> > > >
>> > > > > >> > > > Jiangjie (Becket) Qin
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
>> > > > jjkoshy.w@gmail.com>
>> > > > > >> > wrote:
>> > > > > >> > > >
>> > > > > >> > > > > @Mayuresh - I like your idea. It appears to be a simpler,
>> > > > > >> > > > > less invasive alternative, and it should work.
>> > > > > >> > > > > Jun/Becket/others, do you see any pitfalls with this
>> > > > > >> > > > > approach?
>> > > > > >> > > > >
>> > > > > >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
>> > > > > >> lucasatucla@gmail.com>
>> > > > > >> > > > > wrote:
>> > > > > >> > > > >
>> > > > > >> > > > > > @Mayuresh,
>> > > > > >> > > > > > That's a very interesting idea that I hadn't thought of
>> > > > > >> > > > > > before. It seems to solve our problem at hand pretty
>> > > > > >> > > > > > well, and it also avoids the need to have a new size
>> > > > > >> > > > > > metric and capacity config for the controller request
>> > > > > >> > > > > > queue. In fact, if we were to adopt this design, there
>> > > > > >> > > > > > is no public interface change, and we probably don't
>> > > > > >> > > > > > need a KIP.
>> > > > > >> > > > > > Also, implementation-wise, it seems the Java class
>> > > > > >> > > > > > LinkedBlockingDeque can readily satisfy the requirement
>> > > > > >> > > > > > by supporting a capacity and allowing insertion at both
>> > > > > >> > > > > > ends.
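As a rough illustration of that idea (hypothetical code, not from the KIP), java.util.concurrent.LinkedBlockingDeque provides both a capacity bound and insertion at either end:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Illustrative sketch only: data requests go to the tail, controller
// requests jump to the head, so the controller request is dequeued first.
public class DequeExample {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>(500);
        requestQueue.putLast("ProduceRequest");        // normal data request at the tail
        requestQueue.putFirst("LeaderAndIsrRequest");  // controller request at the head
        System.out.println(requestQueue.takeFirst());  // prints LeaderAndIsrRequest
    }
}
```

Note that putFirst() still blocks when the deque is full, and the JDK class makes no guarantee that head-inserting threads are woken first, which is exactly the limitation raised elsewhere in the thread as motivation for a custom implementation.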
>> > > > > >> > > > > >
>> > > > > >> > > > > > My only concern is that this design is tied to the
>> > > > > >> > > > > > coincidence that we have two request priorities and
>> > > > > >> > > > > > there are two ends to a deque. Hence, by using the
>> > > > > >> > > > > > proposed design, the network layer becomes more tightly
>> > > > > >> > > > > > coupled with the upper-layer logic; e.g. if we were to
>> > > > > >> > > > > > add an extra priority level in the future for some
>> > > > > >> > > > > > reason, we would probably need to go back to the design
>> > > > > >> > > > > > of separate queues, one for each priority level.
>> > > > > >> > > > > >
>> > > > > >> > > > > > In summary, I'm ok with both designs and lean toward
>> > > > > >> > > > > > your suggested approach. Let's hear what others think.
>> > > > > >> > > > > >
>> > > > > >> > > > > > @Becket,
>> > > > > >> > > > > > In light of Mayuresh's suggested new design, I'm
>> > > > > >> > > > > > answering your question only in the context of the
>> > > > > >> > > > > > current KIP design: I think your suggestion makes sense,
>> > > > > >> > > > > > and I'm ok with removing the capacity config and just
>> > > > > >> > > > > > relying on the default value of 20 being sufficient.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Thanks,
>> > > > > >> > > > > > Lucas
>> > > > > >> > > > > >
>> > > > > >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
>> > > > > >> > > > > > gharatmayuresh15@gmail.com
>> > > > > >> > > > > > > wrote:
>> > > > > >> > > > > >
>> > > > > >> > > > > > > Hi Lucas,
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Seems like the main intent here is to prioritize the
>> > > > > >> > > > > > > controller requests over any other requests.
>> > > > > >> > > > > > > In that case, we can change the request queue to a
>> > > > > >> > > > > > > deque, where you always insert the normal requests
>> > > > > >> > > > > > > (produce, consume, etc.) at the tail of the deque,
>> > > > > >> > > > > > > but if it's a controller request, you insert it at
>> > > > > >> > > > > > > the head of the queue. This ensures that the
>> > > > > >> > > > > > > controller request will be given higher priority over
>> > > > > >> > > > > > > other requests.
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Also, since we only read one request from the socket,
>> > > > > >> > > > > > > mute the channel, and only unmute it after handling
>> > > > > >> > > > > > > the request, this would ensure that we don't handle
>> > > > > >> > > > > > > controller requests out of order.
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > With this approach we can avoid the second queue and
>> > > > > >> > > > > > > the additional config for the size of the queue.
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > What do you think?
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Thanks,
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Mayuresh
>> > > > > >> > > > > > >
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
>> > > > > >> becket.qin@gmail.com
>> > > > > >> > >
>> > > > > >> > > > > wrote:
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > > Hey Joel,
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > Thanks for the detailed explanation. I agree the
>> > > > > >> > > > > > > > current design makes sense. My confusion is about
>> > > > > >> > > > > > > > whether the new config for the controller queue
>> > > > > >> > > > > > > > capacity is necessary. I cannot think of a case in
>> > > > > >> > > > > > > > which users would change it.
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > Thanks,
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > Jiangjie (Becket) Qin
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
>> > > > > >> > > becket.qin@gmail.com>
>> > > > > >> > > > > > > wrote:
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > Hi Lucas,
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > I guess my question can be rephrased as "do we
>> > > > > >> > > > > > > > > expect users to ever change the controller request
>> > > > > >> > > > > > > > > queue capacity"? If we agree that 20 is already a
>> > > > > >> > > > > > > > > very generous default and we do not expect users
>> > > > > >> > > > > > > > > to change it, is it still necessary to expose this
>> > > > > >> > > > > > > > > as a config?
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > Thanks,
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > Jiangjie (Becket) Qin
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
>> > > > > >> > > > lucasatucla@gmail.com
>> > > > > >> > > > > >
>> > > > > >> > > > > > > > wrote:
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >> @Becket
>> > > > > >> > > > > > > > >> 1. Thanks for the comment. You are right that
>> > > > > >> > > > > > > > >> normally there should be just one controller
>> > > > > >> > > > > > > > >> request because of muting, and I had NOT intended
>> > > > > >> > > > > > > > >> to say there would be many enqueued controller
>> > > > > >> > > > > > > > >> requests. I went through the KIP again, and I'm
>> > > > > >> > > > > > > > >> not sure which part conveys that info. I'd be
>> > > > > >> > > > > > > > >> happy to revise it if you point out the section.
>> > > > > >> > > > > > > > >>
>> > > > > >> > > > > > > > >> 2. Though it should not happen under normal
>> > > > > >> > > > > > > > >> conditions, the current design does not preclude
>> > > > > >> > > > > > > > >> multiple controllers running at the same time.
>> > > > > >> > > > > > > > >> Hence, if we don't have the controller queue
>> > > > > >> > > > > > > > >> capacity config and simply make its capacity 1,
>> > > > > >> > > > > > > > >> network threads handling requests from different
>> > > > > >> > > > > > > > >> controllers will be blocked during those
>> > > > > >> > > > > > > > >> troublesome times, which is probably not what we
>> > > > > >> > > > > > > > >> want. On the other hand, adding the extra config
>> > > > > >> > > > > > > > >> with a default value, say 20, guards us from
>> > > > > >> > > > > > > > >> issues in those troublesome times, and IMO there
>> > > > > >> > > > > > > > >> isn't much downside to adding the extra config.
>> > > > > >> > > > > > > > >>
>> > > > > >> > > > > > > > >> @Mayuresh
>> > > > > >> > > > > > > > >> Good catch, this sentence is an obsolete
>> > > > > >> > > > > > > > >> statement based on a previous design. I've
>> > > > > >> > > > > > > > >> revised the wording in the KIP.
>> > > > > >> > > > > > > > >>
>> > > > > >> > > > > > > > >> Thanks,
>> > > > > >> > > > > > > > >> Lucas
>> > > > > >> > > > > > > > >>
>> > > > > >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
>> > Gharat <
>> > > > > >> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
>> > > > > >> > > > > > > > >>
>> > > > > >> > > > > > > > >> > Hi Lucas,
>> > > > > >> > > > > > > > >> >
>> > > > > >> > > > > > > > >> > Thanks for the KIP.
>> > > > > >> > > > > > > > >> > I am trying to understand why you think "The
>> > > > > >> > > > > > > > >> > memory consumption can rise given the total
>> > > > > >> > > > > > > > >> > number of queued requests can go up to 2x" in
>> > > > > >> > > > > > > > >> > the impact section. Normally the requests from
>> > > > > >> > > > > > > > >> > the controller to a broker are not high volume,
>> > > > > >> > > > > > > > >> > right?
>> > > > > >> > > > > > > > >> >
>> > > > > >> > > > > > > > >> >
>> > > > > >> > > > > > > > >> > Thanks,
>> > > > > >> > > > > > > > >> >
>> > > > > >> > > > > > > > >> > Mayuresh
>> > > > > >> > > > > > > > >> >
>> > > > > >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
>> > > > > >> > > > > becket.qin@gmail.com>
>> > > > > >> > > > > > > > >> wrote:
>> > > > > >> > > > > > > > >> >
>> > > > > >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the
>> > > > > >> > > > > > > > >> > > control plane from the data plane makes a lot
>> > > > > >> > > > > > > > >> > > of sense.
>> > > > > >> > > > > > > > >> > >
>> > > > > >> > > > > > > > >> > > In the KIP you mentioned that the controller
>> > > > > >> > > > > > > > >> > > request queue may have many requests in it.
>> > > > > >> > > > > > > > >> > > Will this be a common case? The controller
>> > > > > >> > > > > > > > >> > > requests still go through the SocketServer.
>> > > > > >> > > > > > > > >> > > The SocketServer will mute the channel once a
>> > > > > >> > > > > > > > >> > > request is read and put into the request
>> > > > > >> > > > > > > > >> > > channel. So, assuming there is only one
>> > > > > >> > > > > > > > >> > > connection between the controller and each
>> > > > > >> > > > > > > > >> > > broker, on the broker side there should be
>> > > > > >> > > > > > > > >> > > only one controller request in the controller
>> > > > > >> > > > > > > > >> > > request queue at any given time. If that is
>> > > > > >> > > > > > > > >> > > the case, do we need a separate controller
>> > > > > >> > > > > > > > >> > > request queue capacity config? The default
>> > > > > >> > > > > > > > >> > > value of 20 means that we expect 20
>> > > > > >> > > > > > > > >> > > controller switches to happen in a short
>> > > > > >> > > > > > > > >> > > period of time. I am not sure whether someone
>> > > > > >> > > > > > > > >> > > should increase the controller request queue
>> > > > > >> > > > > > > > >> > > capacity to handle such a case, as it seems
>> > > > > >> > > > > > > > >> > > to indicate that something very wrong has
>> > > > > >> > > > > > > > >> > > happened.
>> > > > > >> > > > > > > > >> > >
>> > > > > >> > > > > > > > >> > > Thanks,
>> > > > > >> > > > > > > > >> > >
>> > > > > >> > > > > > > > >> > > Jiangjie (Becket) Qin
>> > > > > >> > > > > > > > >> > >
>> > > > > >> > > > > > > > >> > >
>> > > > > >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin
>> <
>> > > > > >> > > > > lindong28@gmail.com>
>> > > > > >> > > > > > > > >> wrote:
>> > > > > >> > > > > > > > >> > >
>> > > > > >> > > > > > > > >> > > > Thanks for the update, Lucas.
>> > > > > >> > > > > > > > >> > > >
>> > > > > >> > > > > > > > >> > > > I think the motivation section is
>> > > > > >> > > > > > > > >> > > > intuitive. It will be good to learn more
>> > > > > >> > > > > > > > >> > > > about the comments from other reviewers.
>> > > > > >> > > > > > > > >> > > >
>> > > > > >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas
>> > Wang <
>> > > > > >> > > > > > > > lucasatucla@gmail.com>
>> > > > > >> > > > > > > > >> > > wrote:
>> > > > > >> > > > > > > > >> > > >
>> > > > > >> > > > > > > > >> > > > > Hi Dong,
>> > > > > >> > > > > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > I've updated the motivation section of
>> > > > > >> > > > > > > > >> > > > > the KIP by explaining the cases that
>> > > > > >> > > > > > > > >> > > > > would have user impact.
>> > > > > >> > > > > > > > >> > > > > Please take a look and let me know your
>> > > > > >> > > > > > > > >> > > > > comments.
>> > > > > >> > > > > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > Thanks,
>> > > > > >> > > > > > > > >> > > > > Lucas
>> > > > > >> > > > > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas
>> > Wang
>> > > <
>> > > > > >> > > > > > > > lucasatucla@gmail.com
>> > > > > >> > > > > > > > >> >
>> > > > > >> > > > > > > > >> > > > wrote:
>> > > > > >> > > > > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > > Hi Dong,
>> > > > > >> > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > > The simulation of the disk being slow
>> > > > > >> > > > > > > > >> > > > > > is merely for me to easily construct a
>> > > > > >> > > > > > > > >> > > > > > testing scenario with a backlog of
>> > > > > >> > > > > > > > >> > > > > > produce requests. In production, other
>> > > > > >> > > > > > > > >> > > > > > than the disk being slow, a backlog of
>> > > > > >> > > > > > > > >> > > > > > produce requests may also be caused by
>> > > > > >> > > > > > > > >> > > > > > high produce QPS. In that case, we may
>> > > > > >> > > > > > > > >> > > > > > not want to kill the broker, and that's
>> > > > > >> > > > > > > > >> > > > > > when this KIP can be useful, for both
>> > > > > >> > > > > > > > >> > > > > > JBOD and non-JBOD setups.
>> > > > > >> > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > > Going back to your previous question
>> > > > > >> > > > > > > > >> > > > > > about each ProduceRequest covering 20
>> > > > > >> > > > > > > > >> > > > > > partitions that are randomly
>> > > > > >> > > > > > > > >> > > > > > distributed: let's say a LeaderAndIsr
>> > > > > >> > > > > > > > >> > > > > > request is enqueued that tries to
>> > > > > >> > > > > > > > >> > > > > > switch the current broker, say broker0,
>> > > > > >> > > > > > > > >> > > > > > from leader to follower *for one of the
>> > > > > >> > > > > > > > >> > > > > > partitions*, say *test-0*. For the sake
>> > > > > >> > > > > > > > >> > > > > > of argument, let's also assume the
>> > > > > >> > > > > > > > >> > > > > > other brokers, say broker1, have
>> > > > > >> > > > > > > > >> > > > > > *stopped* fetching from the current
>> > > > > >> > > > > > > > >> > > > > > broker, i.e. broker0.
>> > > > > >> > > > > > > > >> > > > > > 1. If the enqueued produce requests
>> > > > > >> > > > > > > > >> > > > > > have acks = -1 (ALL)
>> > > > > >> > > > > > > > >> > > > > >   1.1 Without this KIP, the
>> > > > > >> > > > > > > > >> > > > > > ProduceRequests ahead of the
>> > > > > >> > > > > > > > >> > > > > > LeaderAndISR will be put into the
>> > > > > >> > > > > > > > >> > > > > > purgatory, and since they'll never be
>> > > > > >> > > > > > > > >> > > > > > replicated to other brokers (because of
>> > > > > >> > > > > > > > >> > > > > > the assumption made above), they will
>> > > > > >> > > > > > > > >> > > > > > be completed either when the
>> > > > > >> > > > > > > > >> > > > > > LeaderAndISR request is processed or
>> > > > > >> > > > > > > > >> > > > > > when the timeout happens.
>> > > > > >> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will
>> > > > > >> > > > > > > > >> > > > > > immediately transition the partition
>> > > > > >> > > > > > > > >> > > > > > test-0 to become a follower; after the
>> > > > > >> > > > > > > > >> > > > > > current broker sees the replication of
>> > > > > >> > > > > > > > >> > > > > > the remaining 19 partitions, it can
>> > > > > >> > > > > > > > >> > > > > > send a response indicating that it's no
>> > > > > >> > > > > > > > >> > > > > > longer the leader for "test-0".
>> > > > > >> > > > > > > > >> > > > > >   To see the latency difference between
>> > > > > >> > > > > > > > >> > > > > > 1.1 and 1.2, let's say there are 24K
>> > > > > >> > > > > > > > >> > > > > > produce requests ahead of the
>> > > > > >> > > > > > > > >> > > > > > LeaderAndISR, and there are 8 io
>> > > > > >> > > > > > > > >> > > > > > threads, so each io thread will process
>> > > > > >> > > > > > > > >> > > > > > approximately 3000 produce requests.
>> > > > > >> > > > > > > > >> > > > > > Now let's investigate the io thread
>> > > > > >> > > > > > > > >> > > > > > that finally processed the
>> > > > > >> > > > > > > > >> > > > > > LeaderAndISR.
>> > > > > >> > > > > > > > >> > > > > >   For the 3000 produce requests, if we
>> > > > > >> > > > > > > > >> > > > > > model the time when their remaining 19
>> > > > > >> > > > > > > > >> > > > > > partitions catch up as t0, t1,
>> > > ...t2999,
>> > > > > and
>> > > > > >> > the
>> > > > > >> > > > > > > > LeaderAndISR
>> > > > > >> > > > > > > > >> > > > request
>> > > > > >> > > > > > > > >> > > > > is
>> > > > > >> > > > > > > > >> > > > > > processed at time t3000.
>> > > > > >> > > > > > > > >> > > > > >   Without this KIP, the 1st produce
>> > > request
>> > > > > >> would
>> > > > > >> > > have
>> > > > > >> > > > > > > waited
>> > > > > >> > > > > > > > an
>> > > > > >> > > > > > > > >> > > extra
>> > > > > >> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory,
>> the
>> > 2nd
>> > > > an
>> > > > > >> extra
>> > > > > >> > > > time
>> > > > > >> > > > > of
>> > > > > >> > > > > > > > >> t3000 -
>> > > > > >> > > > > > > > >> > > t1,
>> > > > > >> > > > > > > > >> > > > > etc.
>> > > > > >> > > > > > > > >> > > > > >   Roughly speaking, the latency
>> > > difference
>> > > > is
>> > > > > >> > bigger
>> > > > > >> > > > for
>> > > > > >> > > > > > the
>> > > > > >> > > > > > > > >> > earlier
>> > > > > >> > > > > > > > >> > > > > > produce requests than for the later
>> > ones.
>> > > > For
>> > > > > >> the
>> > > > > >> > > same
>> > > > > >> > > > > > > reason,
>> > > > > >> > > > > > > > >> the
>> > > > > >> > > > > > > > >> > > more
>> > > > > >> > > > > > > > >> > > > > > ProduceRequests queued
>> > > > > >> > > > > > > > >> > > > > >   before the LeaderAndISR, the
>> bigger
>> > > > benefit
>> > > > > >> we
>> > > > > >> > get
>> > > > > >> > > > > > (capped
>> > > > > >> > > > > > > > by
>> > > > > >> > > > > > > > >> the
>> > > > > >> > > > > > > > >> > > > > > produce timeout).
>> > > > > >> > > > > > > > >> > > > > > 2. If the enqueued produce requests
>> > have
>> > > > > >> acks=0 or
>> > > > > >> > > > > acks=1
>> > > > > >> > > > > > > > >> > > > > >   There will be no latency
>> differences
>> > in
>> > > > > this
>> > > > > >> > case,
>> > > > > >> > > > but
>> > > > > >> > > > > > > > >> > > > > >   2.1 without this KIP, the records
>> of
>> > > > > >> partition
>> > > > > >> > > > test-0
>> > > > > >> > > > > in
>> > > > > >> > > > > > > the
>> > > > > >> > > > > > > > >> > > > > > ProduceRequests ahead of the
>> > LeaderAndISR
>> > > > > will
>> > > > > >> be
>> > > > > >> > > > > appended
>> > > > > >> > > > > > > to
>> > > > > >> > > > > > > > >> the
>> > > > > >> > > > > > > > >> > > local
>> > > > > >> > > > > > > > >> > > > > log,
>> > > > > >> > > > > > > > >> > > > > >         and eventually be truncated
>> > after
>> > > > > >> > processing
>> > > > > >> > > > the
>> > > > > >> > > > > > > > >> > > LeaderAndISR.
>> > > > > >> > > > > > > > >> > > > > > This is what's referred to as
>> > > > > >> > > > > > > > >> > > > > >         "some unofficial definition
>> of
>> > > data
>> > > > > >> loss
>> > > > > >> > in
>> > > > > >> > > > > terms
>> > > > > >> > > > > > of
>> > > > > >> > > > > > > > >> > messages
>> > > > > >> > > > > > > > >> > > > > > beyond the high watermark".
>> > > > > >> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate
>> > the
>> > > > > effect
>> > > > > >> > > since
>> > > > > >> > > > if
>> > > > > >> > > > > > the
>> > > > > >> > > > > > > > >> > > > LeaderAndISR
>> > > > > >> > > > > > > > >> > > > > > is immediately processed, the
>> response
>> > to
>> > > > > >> > producers
>> > > > > >> > > > will
>> > > > > >> > > > > > > have
>> > > > > >> > > > > > > > >> > > > > >         the NotLeaderForPartition
>> > error,
>> > > > > >> causing
>> > > > > >> > > > > producers
>> > > > > >> > > > > > > to
>> > > > > >> > > > > > > > >> retry
>> > > > > >> > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > > This explanation above is the
>> benefit
>> > for
>> > > > > >> reducing
>> > > > > >> > > the
>> > > > > >> > > > > > > latency
>> > > > > >> > > > > > > > >> of a
>> > > > > >> > > > > > > > >> > > > > broker
>> > > > > >> > > > > > > > >> > > > > > becoming the follower,
>> > > > > >> > > > > > > > >> > > > > > closely related is reducing the
>> latency
>> > > of
>> > > > a
>> > > > > >> > broker
>> > > > > >> > > > > > becoming
>> > > > > >> > > > > > > > the
>> > > > > >> > > > > > > > >> > > > leader.
>> > > > > >> > > > > > > > >> > > > > > In this case, the benefit is even
>> more
>> > > > > >> obvious, if
>> > > > > >> > > > other
>> > > > > >> > > > > > > > brokers
>> > > > > >> > > > > > > > >> > have
>> > > > > >> > > > > > > > >> > > > > > resigned leadership, and the
>> > > > > >> > > > > > > > >> > > > > > current broker should take
>> leadership.
>> > > Any
>> > > > > >> delay
>> > > > > >> > in
>> > > > > >> > > > > > > processing
>> > > > > >> > > > > > > > >> the
>> > > > > >> > > > > > > > >> > > > > > LeaderAndISR will be perceived
>> > > > > >> > > > > > > > >> > > > > > by clients as unavailability. In
>> > extreme
>> > > > > cases,
>> > > > > >> > this
>> > > > > >> > > > can
>> > > > > >> > > > > > > cause
>> > > > > >> > > > > > > > >> > failed
>> > > > > >> > > > > > > > >> > > > > > produce requests if the retries are
>> > > > > >> > > > > > > > >> > > > > > exhausted.
>> > > > > >> > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > > Another two types of controller
>> > requests
>> > > > are
>> > > > > >> > > > > > UpdateMetadata
>> > > > > >> > > > > > > > and
>> > > > > >> > > > > > > > >> > > > > > StopReplica, which I'll briefly
>> discuss
>> > > as
>> > > > > >> > follows:
>> > > > > >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed
>> > > > > processing
>> > > > > >> > > means
>> > > > > >> > > > > > > clients
>> > > > > >> > > > > > > > >> > > receiving
>> > > > > >> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong
>> > > > > leadership
>> > > > > >> > info
>> > > > > >> > > > > > > > >> > > > > > for certain partitions, and the
>> effect
>> > is
>> > > > > more
>> > > > > >> > > retries
>> > > > > >> > > > > or
>> > > > > >> > > > > > > even
>> > > > > >> > > > > > > > >> > fatal
>> > > > > >> > > > > > > > >> > > > > > failure if the retries are
>> exhausted.
>> > > > > >> > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > > For StopReplica requests, a long
>> > queuing
>> > > > time
>> > > > > >> may
>> > > > > >> > > > > degrade
>> > > > > >> > > > > > > the
>> > > > > >> > > > > > > > >> > > > performance
>> > > > > >> > > > > > > > >> > > > > > of topic deletion.
>> > > > > >> > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > > Regarding your last question of the
>> > delay
>> > > > for
>> > > > > >> > > > > > > > >> > DescribeLogDirsRequest,
>> > > > > >> > > > > > > > >> > > > you
>> > > > > >> > > > > > > > >> > > > > > are right
>> > > > > >> > > > > > > > >> > > > > > that this KIP cannot help with the
>> > > latency
>> > > > in
>> > > > > >> > > getting
>> > > > > >> > > > > the
>> > > > > >> > > > > > > log
>> > > > > >> > > > > > > > >> dirs
>> > > > > >> > > > > > > > >> > > > info,
>> > > > > >> > > > > > > > >> > > > > > and it's only relevant
>> > > > > >> > > > > > > > >> > > > > > when controller requests are
>> involved.
>> > > > > >> > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > > Regards,
>> > > > > >> > > > > > > > >> > > > > > Lucas
>> > > > > >> > > > > > > > >> > > > > >
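The purgatory-wait arithmetic above (3000 queued requests per io thread, catch-up times t0 ... t2999, LeaderAndISR processed at t3000) can be sketched with a toy model. The uniform 1 ms spacing of the catch-up times is an assumption made here for illustration; the thread does not specify a distribution.

```python
# Toy model of the extra purgatory wait described above. Assumption:
# catch-up times t_i are uniformly spaced 1 ms apart (for illustration).
def extra_purgatory_waits(num_requests=3000, spacing_ms=1.0):
    """Request i's partitions catch up at t_i = i * spacing_ms; the
    LeaderAndISR request is processed at t_N. Without the KIP, request i
    waits an extra (t_N - t_i) in the purgatory."""
    t_leader_and_isr = num_requests * spacing_ms
    return [t_leader_and_isr - i * spacing_ms for i in range(num_requests)]

waits = extra_purgatory_waits()
# Earlier requests wait longer than later ones, as argued above.
print(waits[0], waits[-1])       # 3000.0 1.0
print(sum(waits) / len(waits))   # 1500.5: average extra wait grows with N
```

This matches the qualitative claim: the benefit scales with the number of ProduceRequests queued ahead of the LeaderAndISR, capped in practice by the produce timeout.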
> On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:
>
>> Hey Jun,
>>
>> Thanks much for the comments. It is a good point. So the feature may be useful for the JBOD use-case. I have one question below.
>>
>> Hey Lucas,
>>
>> Do you think this feature is also useful for a non-JBOD setup, or is it only useful for the JBOD setup? It may be useful to understand this.
>>
>> When the broker is set up using JBOD, in order to move leaders on the failed disk to other disks, the system operator first needs to get the list of partitions on the failed disk. This is currently achieved using AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the broker. If we only prioritize the controller requests, then the DescribeLogDirsRequest may still take a long time to be processed by the broker. So the overall time to move leaders away from the failed disk may still be long even with this KIP. What do you think?
>>
>> Thanks,
>> Dong
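A minimal sketch of the operator-side step Dong describes: once the describeLogDirs result is back, the operator filters for the partitions hosted on the failed disk. The flat dict shape used here is an assumption for illustration only; the real DescribeLogDirs response is keyed by broker and carries per-replica error codes and sizes.

```python
# Sketch (NOT the real AdminClient API): filter a describeLogDirs-style
# result down to the partitions living on a failed log directory.
def partitions_on_dir(log_dirs, failed_dir):
    """log_dirs: dict mapping a log-dir path to the topic-partitions it holds
    (hypothetical simplified shape)."""
    return sorted(log_dirs.get(failed_dir, []))

# Hypothetical response content, for illustration only.
log_dirs = {
    "/mnt/disk1": ["test-0", "test-1"],
    "/mnt/disk2": ["test-2"],
}
print(partitions_on_dir(log_dirs, "/mnt/disk1"))  # ['test-0', 'test-1']
```

The point of the sketch is only to show where DescribeLogDirsRequest sits on the critical path of recovery: the filtering is trivial, so the bottleneck Dong points at is the broker-side queuing of the request itself.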
>> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
>>
>>> Thanks for the insightful comment, Jun.
>>>
>>> @Dong,
>>> Since both comments in your previous email are about the benefits of this KIP and whether it's useful, in light of Jun's last comment, do you agree that this KIP can be beneficial in the case mentioned by Jun? Please let me know, thanks!
>>>
>>> Regards,
>>> Lucas
>>> On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:
>>>
>>>> Hi, Lucas, Dong,
>>>>
>>>> If all disks on a broker are slow, one probably should just kill the broker. In that case, this KIP may not help. If only one of the disks on a broker is slow, one may want to fail that disk and move the leaders on that disk to other brokers. In that case, being able to process the LeaderAndIsr requests faster will potentially help the producers recover quicker.
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>> On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
>>>>
>>>>> Hey Lucas,
>>>>>
>>>>> Thanks for the reply. Some follow-up questions below.
>>>>>
>>>>> Regarding 1, if each ProduceRequest covers 20 partitions that are randomly distributed across all partitions, then each ProduceRequest will likely cover some partitions for which the broker is still the leader after it quickly processes the LeaderAndIsrRequest. Then the broker will still be slow in processing these ProduceRequests, and the request latency will still be very high with this KIP. It seems that most ProduceRequests will still time out after 30 seconds. Is this understanding correct?
>>>>>
>>>>> Regarding 2, if most ProduceRequests will still time out after 30 seconds, then it is less clear how this KIP reduces average produce latency. Can you clarify what metrics can be improved by this KIP?
>>>>>
>>>>> Not sure why a system operator directly cares about the number of truncated messages. Do you mean this KIP can improve average throughput or reduce message duplication? It would be good to understand this.
>>>>>
>>>>> Thanks,
>>>>> Dong
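Dong's "randomly distributed" point can be quantified with a quick calculation. Borrowing the 10K-partition / 9K-moved figures from later in the thread (the uniform draw without replacement is an assumption made here), a 20-partition ProduceRequest almost always touches at least one partition the broker still leads:

```python
# Probability that a ProduceRequest covering `covered` partitions, drawn
# uniformly without replacement from `total`, hits at least one of the
# `retained` partitions the broker still leads after the transition.
def p_touches_retained(covered=20, total=10_000, retained=1_000):
    p_none = 1.0
    for i in range(covered):
        # chance the (i+1)-th drawn partition is among the moved ones
        p_none *= (total - retained - i) / (total - i)
    return 1.0 - p_none

print(p_touches_retained())  # roughly 0.88
```

So under these assumptions most multi-partition ProduceRequests would still include a slow, still-led partition, which is the crux of Dong's question about whether the KIP helps average produce latency.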
>>>>> On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
>>>>>
>>>>>> Hi Dong,
>>>>>>
>>>>>> Thanks for your valuable comments. Please see my reply below.
>>>>>>
>>>>>> 1. The Google doc showed only 1 partition. Now let's consider a more common scenario where broker0 is the leader of many partitions, and let's say for some reason its IO becomes slow. The number of leader partitions on broker0 is so large, say 10K, that the cluster is skewed, and the operator would like to shift the leadership for a lot of partitions, say 9K, to other brokers, either manually or through some service like Cruise Control. With this KIP, not only will the leadership transitions finish more quickly, helping the cluster itself become more balanced, but all existing producers corresponding to the 9K partitions will get the errors relatively quickly, rather than relying on their timeout, thanks to the batched async ZK operations. To me it's a useful feature to have during such troublesome times.
>>>>>>
>>>>>> 2. The experiments in the Google Doc have shown that with this KIP many producers receive an explicit NotLeaderForPartition error, based on which they retry immediately. Therefore the latency (~14 seconds + quick retry) for their single message is much smaller compared with the case of timing out without the KIP (30 seconds for timing out + quick retry). One might argue that reducing the timeout on the producer side can achieve the same result, yet reducing the timeout has its own drawbacks[1].
>>>>>>
>>>>>> Also *IF* there were a metric to show the number of truncated messages on brokers, with the experiments done in the Google Doc, it should be easy to see that a lot fewer messages need to be truncated on broker0, since the up-to-date metadata avoids appending of messages in subsequent PRODUCE requests. If we talk to a system operator and ask whether they prefer fewer wasteful IOs, I bet most likely the answer is yes.
>>>>>>
>>>>>> 3. To answer your question, I think it might be helpful to construct some formulas. To simplify the modeling, I'm going back to the case where there is only ONE partition involved. Following the experiments in the Google Doc, let's say broker0 becomes the follower at time t0, and after t0 there were still N produce requests in its request queue. With the up-to-date metadata brought by this KIP, broker0 can reply with a NotLeaderForPartition exception; let's use M1 to denote the average processing time of replying with such an error message. Without this KIP, the broker will need to append messages to segments, which may trigger a flush to disk; let's use M2 to denote the average processing time for such logic. Then the average extra latency incurred without this KIP is N * (M2 - M1) / 2.
>>>>>>
>>>>>> In practice, M2 should always be larger than M1, which means as long as N is positive, we would see improvements on the average latency. There does not need to be a significant backlog of requests in the request queue, or severe degradation of disk performance, to have the improvement.
>>>>>>
>>>>>> Regards,
>> > > > > >> > > > > > > > >> > > > > >> > > > > Lucas
>> > > > > >> > > > > > > > >> > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing
>> the
>> > > > > >> timeout on
>> > > > > >> > > the
>> > > > > >> > > > > > > > producer
>> > > > > >> > > > > > > > >> > side
>> > > > > >> > > > > > > > >> > > > can
>> > > > > >> > > > > > > > >> > > > > >> > trigger
>> > > > > >> > > > > > > > >> > > > > >> > > > > unnecessary duplicate
>> requests
>> > > > > >> > > > > > > > >> > > > > >> > > > > when the corresponding
>> leader
>> > > > broker
>> > > > > >> is
>> > > > > >> > > > > > overloaded,
>> > > > > >> > > > > > > > >> > > > exacerbating
>> > > > > >> > > > > > > > >> > > > > >> the
>> > > > > >> > > > > > > > >> > > > > >> > > > > situation.
>> > > > > >> > > > > > > > >> > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18
>> > PM,
>> > > > Dong
>> > > > > >> Lin
>> > > > > >> > <
>> > > > > >> > > > > > > > >> > > lindong28@gmail.com
>> > > > > >> > > > > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > >> > wrote:
>> > > > > >> > > > > > > > >> > > > > >> > > > >
>> > > > > >> > > > > > > > >> > > > > >> > > > > > Hey Lucas,
>> > > > > >> > > > > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > >> > > > > > Thanks much for the
>> detailed
>> > > > > >> > > documentation
>> > > > > >> > > > of
>> > > > > >> > > > > > the
>> > > > > >> > > > > > > > >> > > > experiment.
>> > > > > >> > > > > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > > > > >> > > > > >> > > > > > Initially I also think
>> > having
>> > > a
>> > > > > >> > separate
>> > > > > >> > > > > queue
>> > > > > >> > > > > > > for
>> > > > > >> > > > > > > > >> > > > controller
>> > > > > >> > > > > > > > >> > > > > >> > > requests
>> > > > > >> > > > > > > > >> > > > > >> > > > is
>> > > > > >> > > > > > > > >> > > > > >> > > > > > useful because, as you
>> > > mentioned
>> > > > > in
>> > > > > >> the
>> > > > > >> > > > > summary
>> > > > > >> > > > > > > > >> section
>> > > > > >> > > > > > > > >> > of
>> > > > > >> > > > > > > > >> > > > the
>> > > > > >> > > > > > > > >> > > > > >> > Google
>> > > > > >> > > > > > > > >> > > > > >> > > > > doc,
>> > > > > >> > > > > > > > >> > > > > >> > > > > > controller requests are
>> > > > generally
>> > > > > >> more
>> > > > > >> > >
>> > >
>> > >
>> > >
>> > > --
>> > > -Regards,
>> > > Mayuresh R. Gharat
>> > > (862) 250-7125
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Great, it seems we are converging on using the correlation id to prevent
out-of-order processing.

Meanwhile, to address the concern raised by Jun
"One potential issue with the dequeue approach is that if the queue is full,
there is no guarantee that the controller requests will be enqueued
quickly."

we can still use the deque idea, only with our own implementation. It would
use three condition variables and a flag:
- tailNotFullCondition used to block processor threads trying to enqueue a
data request, when the queue is full
- headNotFullCondition used to block processor threads trying to enqueue a
controller request, when the queue is full
- notEmptyCondition used to block io threads when the queue is empty
- the blockedOnHead flag to indicate whether any thread is blocked on the
headNotFullCondition


The benefit with this approach is still that no public interface change is
needed, and
a processor thread trying to enqueue a controller request will always be
woken up first
when the queue is full.
In terms of implementation complexity, it's quite similar to the separate
queue approach. Thoughts?
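To make the idea above concrete, here is a minimal sketch of what such a custom bounded deque could look like, with the three condition variables and the blockedOnHead flag (implemented as a waiter count) described above. The class and method names are illustrative only, not from the Kafka codebase:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: a bounded deque where controller requests go to the
// head, data requests go to the tail, and a blocked controller-request
// enqueuer is always woken first when space frees up.
public class ControllerAwareDeque<T> {
    private final Deque<T> deque = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    // blocks processor threads enqueuing a data request when the queue is full
    private final Condition tailNotFull = lock.newCondition();
    // blocks processor threads enqueuing a controller request when the queue is full
    private final Condition headNotFull = lock.newCondition();
    // blocks io threads when the queue is empty
    private final Condition notEmpty = lock.newCondition();
    // number of threads currently blocked on headNotFull
    private int blockedOnHead = 0;

    public ControllerAwareDeque(int capacity) {
        this.capacity = capacity;
    }

    // data requests (produce, fetch, ...) go to the tail
    public void putDataRequest(T request) throws InterruptedException {
        lock.lock();
        try {
            while (deque.size() == capacity) {
                tailNotFull.await();
            }
            deque.addLast(request);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // controller requests go to the head
    public void putControllerRequest(T request) throws InterruptedException {
        lock.lock();
        try {
            while (deque.size() == capacity) {
                blockedOnHead++;
                try {
                    headNotFull.await();
                } finally {
                    blockedOnHead--;
                }
            }
            deque.addFirst(request);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // io threads always take from the head, so controller requests win
    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (deque.isEmpty()) {
                notEmpty.await();
            }
            T request = deque.removeFirst();
            // wake a blocked controller-request enqueuer first, if there is one
            if (blockedOnHead > 0) {
                headNotFull.signal();
            } else {
                tailNotFull.signal();
            }
            return request;
        } finally {
            lock.unlock();
        }
    }
}
```

The single lock with separate conditions keeps the two enqueue paths bounded by one capacity while still letting the dequeue path prefer waiters on the head condition.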

Thanks,
Lucas





On Thu, Jul 19, 2018 at 11:18 PM, Becket Qin <be...@gmail.com> wrote:

> Lucas and Mayuresh,
>
> Good idea. The correlation id should work.
>
> In the ControllerChannelManager, a request will be resent until a response
> is received. So if the controller-to-broker connection disconnects after
> the controller sends R1_a, but before the response to R1_a is received, the
> disconnection may cause the controller to resend the request as R1_b, i.e.
> until R1 is acked, R2 won't be sent by the controller.
> This gives two guarantees:
> 1. Correlation id wise: R1_a < R1_b < R2.
> 2. On the broker side, when R2 is seen, R1 must have been processed at
> least once.
>
> So on the broker side, with a single-threaded controller request handler, the
> logic should be:
> 1. Process whatever request is seen in the controller request queue.
> 2. For the given epoch, drop a request if its correlation id is smaller than
> that of the last processed request.
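[Step 2 of the broker-side logic quoted above could be sketched roughly as follows; this assumes a single-threaded controller request handler, and the class and method names are hypothetical, not from the Kafka codebase:]

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: per controller epoch, remember the correlation id of
// the last processed controller request and drop anything that is not newer.
public class StaleControllerRequestFilter {
    private final Map<Integer, Integer> lastCorrelationIdByEpoch = new HashMap<>();

    // Returns true if the request should be processed, false if it is obsolete.
    public synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        Integer last = lastCorrelationIdByEpoch.get(controllerEpoch);
        if (last != null && correlationId <= last) {
            // a request with a larger-or-equal correlation id in this epoch has
            // already been processed, so this one is stale (or a duplicate)
            return false;
        }
        lastCorrelationIdByEpoch.put(controllerEpoch, correlationId);
        return true;
    }
}
```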
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
>
> > I agree that there is no strong ordering when there are more than one
> > socket connections. Currently, we rely on controllerEpoch and leaderEpoch
> > to ensure that the receiving broker picks up the latest state for each
> > partition.
> >
> > One potential issue with the dequeue approach is that if the queue is
> full,
> > there is no guarantee that the controller requests will be enqueued
> > quickly.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > gharatmayuresh15@gmail.com
> > > wrote:
> >
> > > Yea, the correlationId is only set to 0 in the NetworkClient
> constructor.
> > > Since we reuse the same NetworkClient between Controller and the
> broker,
> > a
> > > disconnection should not cause it to reset to 0, in which case it can
> be
> > > used to reject obsolete requests.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > @Dong,
> > > > Great example and explanation, thanks!
> > > >
> > > > @All
> > > > Regarding the example given by Dong, it seems even if we use a queue,
> > > and a
> > > > dedicated controller request handling thread,
> > > > the same result can still happen because R1_a will be sent on one
> > > > connection, and R1_b & R2 will be sent on a different connection,
> > > > and there is no ordering between different connections on the broker
> > > side.
> > > > I was discussing with Mayuresh offline, and it seems correlation id
> > > within
> > > > the same NetworkClient object is monotonically increasing and never
> > > reset,
> > > > hence a broker can leverage that to properly reject obsolete
> requests.
> > > > Thoughts?
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > > gharatmayuresh15@gmail.com> wrote:
> > > >
> > > > > Actually nvm, correlationId is reset in case of connection loss, I
> > > think.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > > On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > > > gharatmayuresh15@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I agree with Dong that out-of-order processing can happen with
> > > having 2
> > > > > > separate queues as well and it can even happen today.
> > > > > > Can we use the correlationId in the request from the controller
> to
> > > the
> > > > > > broker to handle ordering ?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > >
> > > > > > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > >> Good point, Joel. I agree that a dedicated controller request
> > > handling
> > > > > >> thread would be a better isolation. It also solves the
> reordering
> > > > issue.
> > > > > >>
> > > > > >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> jjkoshy.w@gmail.com>
> > > > > wrote:
> > > > > >>
> > > > > >> > Good example. I think this scenario can occur in the current
> > code
> > > as
> > > > > >> well
> > > > > >> > but with even lower probability given that there are other
> > > > > >> non-controller
> > > > > >> > requests interleaved. It is still sketchy though and I think a
> > > safer
> > > > > >> > approach would be separate queues and pinning controller
> request
> > > > > >> handling
> > > > > >> > to one handler thread.
> > > > > >> >
> > > > > >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> lindong28@gmail.com
> > >
> > > > > wrote:
> > > > > >> >
> > > > > >> > > Hey Becket,
> > > > > >> > >
> > > > > >> > > I think you are right that there may be out-of-order
> > processing.
> > > > > >> However,
> > > > > >> > > it seems that out-of-order processing may also happen even
> if
> > we
> > > > > use a
> > > > > >> > > separate queue.
> > > > > >> > >
> > > > > >> > > Here is the example:
> > > > > >> > >
> > > > > >> > > - Controller sends R1 and got disconnected before receiving
> > > > > response.
> > > > > >> > Then
> > > > > >> > > it reconnects and sends R2. Both requests now stay in the
> > > > controller
> > > > > >> > > request queue in the order they are sent.
> > > > > >> > > - thread1 takes R1_a from the request queue and then thread2
> > > takes
> > > > > R2
> > > > > >> > from
> > > > > >> > > the request queue almost at the same time.
> > > > > >> > > - So R1_a and R2 are processed in parallel. There is chance
> > that
> > > > > R2's
> > > > > >> > > processing is completed before R1.
> > > > > >> > >
> > > > > >> > > If out-of-order processing can happen for both approaches
> with
> > > > very
> > > > > >> low
> > > > > >> > > probability, it may not be worthwhile to add the extra
> queue.
> > > What
> > > > > do
> > > > > >> you
> > > > > >> > > think?
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Dong
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > > Hi Mayuresh/Joel,
> > > > > >> > > >
> > > > > >> > > > Using the request channel as a deque was brought up some time ago when we
> > > > > >> > > > were initially thinking of prioritizing the requests. The concern was that
> > > > > >> > > > the controller requests are supposed to be processed in order. If we can
> > > > > >> > > > ensure that there is one controller request in the request channel, the
> > > > > >> > > > order is not a concern. But in cases where there is more than one
> > > > > >> > > > controller request inserted into the queue, the controller request order
> > > > > >> > > > may change and cause problems. For example, think about the following
> > > > > >> > > > sequence:
> > > > > >> > > > 1. Controller successfully sent a request R1 to the broker.
> > > > > >> > > > 2. Broker receives R1 and puts the request at the head of the request queue.
> > > > > >> > > > 3. The controller-to-broker connection failed and the controller
> > > > > >> > > > reconnected to the broker.
> > > > > >> > > > 4. Controller sends a request R2 to the broker.
> > > > > >> > > > 5. Broker receives R2 and adds it to the head of the request queue.
> > > > > >> > > > Now on the broker side, R2 will be processed before R1 is processed, which
> > > > > >> > > > may cause problems.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > >
> > > > > >> > > > Jiangjie (Becket) Qin
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > > > @Mayuresh - I like your idea. It appears to be a simpler, less invasive
> > > > > >> > > > > alternative and it should work. Jun/Becket/others, do you see any
> > > > > >> > > > > pitfalls with this approach?
> > > > > >> > > > >
> > > > > >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > @Mayuresh,
> > > > > >> > > > > > That's a very interesting idea that I hadn't thought of before.
> > > > > >> > > > > > It seems to solve our problem at hand pretty well, and also
> > > > > >> > > > > > avoids the need to have a new size metric and capacity config
> > > > > >> > > > > > for the controller request queue. In fact, if we were to adopt
> > > > > >> > > > > > this design, there is no public interface change, and we
> > > > > >> > > > > > probably don't need a KIP.
> > > > > >> > > > > > Also implementation wise, it seems
> > > > > >> > > > > > the java class LinkedBlockingDeque can readily satisfy the requirement
> > > > > >> > > > > > by supporting a capacity, and also allowing insertion at both ends.
> > > > > >> > > > > >
> > > > > >> > > > > > My only concern is that this design is tied to the coincidence that
> > > > > >> > > > > > we have two request priorities and there are two ends to a deque.
> > > > > >> > > > > > Hence by using the proposed design, the network layer is
> > > > > >> > > > > > more tightly coupled with upper-layer logic, e.g. if we were to add
> > > > > >> > > > > > an extra priority level in the future for some reason, we would
> > > > > >> > > > > > probably need to go back to the design of separate queues, one for
> > > > > >> > > > > > each priority level.
> > > > > >> > > > > >
> > > > > >> > > > > > In summary, I'm ok with both designs and lean toward your suggested
> > > > > >> > > > > > approach.
> > > > > >> > > > > > Let's hear what others think.
> > > > > >> > > > > >
> > > > > >> > > > > > @Becket,
> > > > > >> > > > > > In light of Mayuresh's suggested new design, I'm answering your
> > > > > >> > > > > > question only in the context
> > > > > >> > > > > > of the current KIP design: I think your suggestion makes sense, and
> > > > > >> > > > > > I'm ok with removing the capacity config and
> > > > > >> > > > > > just relying on the default value of 20 being sufficient.
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks,
> > > > > >> > > > > > Lucas
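[A minimal sketch of the built-in class discussed above, using java.util.concurrent.LinkedBlockingDeque (the deque variant; a plain LinkedBlockingQueue only supports insertion at the tail). The request strings and capacity value are placeholders:]

```java
import java.util.concurrent.LinkedBlockingDeque;

// Illustrative only: a bounded deque where controller requests are inserted
// at the head and normal data requests at the tail.
public class DequePriorityExample {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingDeque<String> requestDeque = new LinkedBlockingDeque<>(500);
        requestDeque.putLast("produce-request");        // data request -> tail
        requestDeque.putFirst("leaderAndIsr-request");  // controller request -> head
        // io threads always take from the head, so the controller request
        // is dequeued first
        System.out.println(requestDeque.takeFirst());   // prints leaderAndIsr-request
    }
}
```

[Note that putFirst still blocks while the deque is at capacity, which is the concern Jun raises elsewhere in this thread about controller requests not being enqueued quickly when the queue is full.]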
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > >> > > > > > gharatmayuresh15@gmail.com
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hi Lucas,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Seems like the main intent here is to prioritize the
> > > > > >> controller
> > > > > >> > > > request
> > > > > >> > > > > > > over any other requests.
> > > > > >> > > > > > > In that case, we can change the request queue to a
> > > > dequeue,
> > > > > >> where
> > > > > >> > > you
> > > > > >> > > > > > > always insert the normal requests (produce,
> > > consume,..etc)
> > > > > to
> > > > > >> the
> > > > > >> > > end
> > > > > >> > > > > of
> > > > > >> > > > > > > the dequeue, but if its a controller request, you
> > insert
> > > > it
> > > > > to
> > > > > >> > the
> > > > > >> > > > head
> > > > > >> > > > > > of
> > > > > >> > > > > > > the queue. This ensures that the controller request
> > will
> > > > be
> > > > > >> given
> > > > > >> > > > > higher
> > > > > >> > > > > > > priority over other requests.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Also since we only read one request from the socket
> > and
> > > > mute
> > > > > >> it
> > > > > >> > and
> > > > > >> > > > > only
> > > > > >> > > > > > > unmute it after handling the request, this would
> > ensure
> > > > that
> > > > > >> we
> > > > > >> > > don't
> > > > > >> > > > > > > handle controller requests out of order.
> > > > > >> > > > > > >
> > > > > >> > > > > > > With this approach we can avoid the second queue and
> > the
> > > > > >> > additional
> > > > > >> > > > > > config
> > > > > >> > > > > > > for the size of the queue.
> > > > > >> > > > > > >
> > > > > >> > > > > > > What do you think ?
> > > > > >> > > > > > >
> > > > > >> > > > > > > Thanks,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Mayuresh
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > > > >> becket.qin@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Hey Joel,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thank for the detail explanation. I agree the
> > current
> > > > > design
> > > > > >> > > makes
> > > > > >> > > > > > sense.
> > > > > >> > > > > > > > My confusion is about whether the new config for
> the
> > > > > >> controller
> > > > > >> > > > queue
> > > > > >> > > > > > > > capacity is necessary. I cannot think of a case in
> > > which
> > > > > >> users
> > > > > >> > > > would
> > > > > >> > > > > > > change
> > > > > >> > > > > > > > it.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > > >> > > becket.qin@gmail.com>
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Hi Lucas,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > I guess my question can be rephrased to "do we
> > > expect
> > > > > >> user to
> > > > > >> > > > ever
> > > > > >> > > > > > > change
> > > > > >> > > > > > > > > the controller request queue capacity"? If we
> > agree
> > > > that
> > > > > >> 20
> > > > > >> > is
> > > > > >> > > > > > already
> > > > > >> > > > > > > a
> > > > > >> > > > > > > > > very generous default number and we do not
> expect
> > > user
> > > > > to
> > > > > >> > > change
> > > > > >> > > > > it,
> > > > > >> > > > > > is
> > > > > >> > > > > > > > it
> > > > > >> > > > > > > > > still necessary to expose this as a config?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Jiangjie (Becket) Qin
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > > >> > > > lucasatucla@gmail.com
> > > > > >> > > > > >
> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >> @Becket
> > > > > >> > > > > > > > >> 1. Thanks for the comment. You are right that
> > > > normally
> > > > > >> there
> > > > > >> > > > > should
> > > > > >> > > > > > be
> > > > > >> > > > > > > > >> just
> > > > > >> > > > > > > > >> one controller request because of muting,
> > > > > >> > > > > > > > >> and I had NOT intended to say there would be
> many
> > > > > >> enqueued
> > > > > >> > > > > > controller
> > > > > >> > > > > > > > >> requests.
> > > > > >> > > > > > > > >> I went through the KIP again, and I'm not sure
> > > which
> > > > > part
> > > > > >> > > > conveys
> > > > > >> > > > > > that
> > > > > >> > > > > > > > >> info.
> > > > > >> > > > > > > > >> I'd be happy to revise if you point it out the
> > > > section.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> 2. Though it should not happen in normal
> > > conditions,
> > > > > the
> > > > > >> > > current
> > > > > >> > > > > > > design
> > > > > >> > > > > > > > >> does not preclude multiple controllers running
> > > > > >> > > > > > > > >> at the same time, hence if we don't have the
> > > > controller
> > > > > >> > queue
> > > > > >> > > > > > capacity
> > > > > >> > > > > > > > >> config and simply make its capacity to be 1,
> > > > > >> > > > > > > > >> network threads handling requests from
> different
> > > > > >> controllers
> > > > > >> > > > will
> > > > > >> > > > > be
> > > > > >> > > > > > > > >> blocked during those troublesome times,
> > > > > >> > > > > > > > >> which is probably not what we want. On the
> other
> > > > hand,
> > > > > >> > adding
> > > > > >> > > > the
> > > > > >> > > > > > > extra
> > > > > >> > > > > > > > >> config with a default value, say 20, guards us
> > from
> > > > > >> issues
> > > > > >> > in
> > > > > >> > > > > those
> > > > > >> > > > > > > > >> troublesome times, and IMO there isn't much
> > > downside
> > > > of
> > > > > >> > adding
> > > > > >> > > > the
> > > > > >> > > > > > > extra
> > > > > >> > > > > > > > >> config.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> @Mayuresh
> > > > > >> > > > > > > > >> Good catch, this sentence is an obsolete
> > statement
> > > > > based
> > > > > >> on
> > > > > >> > a
> > > > > >> > > > > > previous
> > > > > >> > > > > > > > >> design. I've revised the wording in the KIP.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> Thanks,
> > > > > >> > > > > > > > >> Lucas
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > Gharat <
> > > > > >> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> > Hi Lucas,
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Thanks for the KIP.
> > > > > >> > > > > > > > >> > I am trying to understand why you think "The
> > > memory
> > > > > >> > > > consumption
> > > > > >> > > > > > can
> > > > > >> > > > > > > > rise
> > > > > >> > > > > > > > >> > given the total number of queued requests can
> > go
> > > up
> > > > > to
> > > > > >> 2x"
> > > > > >> > > in
> > > > > >> > > > > the
> > > > > >> > > > > > > > impact
> > > > > >> > > > > > > > >> > section. Normally the requests from
> controller
> > > to a
> > > > > >> Broker
> > > > > >> > > are
> > > > > >> > > > > not
> > > > > >> > > > > > > > high
> > > > > >> > > > > > > > >> > volume, right ?
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Thanks,
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Mayuresh
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > > >> > > > > becket.qin@gmail.com>
> > > > > >> > > > > > > > >> wrote:
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the
> > > control
> > > > > >> plane
> > > > > >> > > from
> > > > > >> > > > > the
> > > > > >> > > > > > > > data
> > > > > >> > > > > > > > >> > plane
> > > > > >> > > > > > > > >> > > makes a lot of sense.
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > In the KIP you mentioned that the
> controller
> > > > > request
> > > > > >> > queue
> > > > > >> > > > may
> > > > > >> > > > > > > have
> > > > > >> > > > > > > > >> many
> > > > > >> > > > > > > > >> > > requests in it. Will this be a common case?
> > The
> > > > > >> > controller
> > > > > >> > > > > > > requests
> > > > > >> > > > > > > > >> still
> > > > > >> > > > > > > > >> > > goes through the SocketServer. The
> > SocketServer
> > > > > will
> > > > > >> > mute
> > > > > >> > > > the
> > > > > >> > > > > > > > channel
> > > > > >> > > > > > > > >> > once
> > > > > >> > > > > > > > >> > > a request is read and put into the request
> > > > channel.
> > > > > >> So
> > > > > >> > > > > assuming
> > > > > >> > > > > > > > there
> > > > > >> > > > > > > > >> is
> > > > > >> > > > > > > > >> > > only one connection between controller and
> > each
> > > > > >> broker,
> > > > > >> > on
> > > > > >> > > > the
> > > > > >> > > > > > > > broker
> > > > > >> > > > > > > > >> > side,
> > > > > >> > > > > > > > >> > > there should be only one controller request
> > in
> > > > the
> > > > > >> > > > controller
> > > > > >> > > > > > > > request
> > > > > >> > > > > > > > >> > queue
> > > > > >> > > > > > > > >> > > at any given time. If that is the case, do
> we
> > > > need
> > > > > a
> > > > > >> > > > separate
> > > > > >> > > > > > > > >> controller
> > > > > >> > > > > > > > >> > > request queue capacity config? The default
> > > value
> > > > 20
> > > > > >> > means
> > > > > >> > > > that
> > > > > >> > > > > > we
> > > > > >> > > > > > > > >> expect
> > > > > >> > > > > > > > >> > > there are 20 controller switches to happen
> > in a
> > > > > short
> > > > > >> > > period
> > > > > >> > > > > of
> > > > > >> > > > > > > > time.
> > > > > >> > > > > > > > >> I
> > > > > >> > > > > > > > >> > am
> > > > > >> > > > > > > > >> > > not sure whether someone should increase
> the
> > > > > >> controller
> > > > > >> > > > > request
> > > > > >> > > > > > > > queue
> > > > > >> > > > > > > > >> > > capacity to handle such case, as it seems
> > > > > indicating
> > > > > >> > > > something
> > > > > >> > > > > > > very
> > > > > >> > > > > > > > >> wrong
> > > > > >> > > > > > > > >> > > has happened.
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > Thanks,
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > > >> > > > > lindong28@gmail.com>
> > > > > >> > > > > > > > >> wrote:
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > > Thanks for the update Lucas.
> > > > > >> > > > > > > > >> > > >
> > > > > >> > > > > > > > >> > > > I think the motivation section is
> > intuitive.
> > > It
> > > > > >> will
> > > > > >> > be
> > > > > >> > > > good
> > > > > >> > > > > > to
> > > > > >> > > > > > > > >> learn
> > > > > >> > > > > > > > >> > > more
> > > > > >> > > > > > > > >> > > > about the comments from other reviewers.
> > > > > >> > > > > > > > >> > > >
> > > > > >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> > Wang <
> > > > > >> > > > > > > > lucasatucla@gmail.com>
> > > > > >> > > > > > > > >> > > wrote:
> > > > > >> > > > > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > Hi Dong,
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > I've updated the motivation section of
> > the
> > > > KIP
> > > > > by
> > > > > >> > > > > explaining
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > >> > cases
> > > > > >> > > > > > > > >> > > > that
> > > > > >> > > > > > > > >> > > > > would have user impacts.
> > > > > >> > > > > > > > >> > > > > Please take a look at let me know your
> > > > > comments.
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > Thanks,
> > > > > >> > > > > > > > >> > > > > Lucas
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> > Wang
> > > <
> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > > > wrote:
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > > Hi Dong,
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > The simulation of disk being slow is
> > > merely
> > > > > >> for me
> > > > > >> > > to
> > > > > >> > > > > > easily
> > > > > >> > > > > > > > >> > > construct
> > > > > >> > > > > > > > >> > > > a
> > > > > >> > > > > > > > >> > > > > > testing scenario
> > > > > >> > > > > > > > >> > > > > > with a backlog of produce requests.
> In
> > > > > >> production,
> > > > > >> > > > other
> > > > > >> > > > > > > than
> > > > > >> > > > > > > > >> the
> > > > > >> > > > > > > > >> > > disk
> > > > > >> > > > > > > > >> > > > > > being slow, a backlog of
> > > > > >> > > > > > > > >> > > > > > produce requests may also be caused
> by
> > > high
> > > > > >> > produce
> > > > > >> > > > QPS.
> > > > > >> > > > > > > > >> > > > > > In that case, we may not want to kill
> > the
> > > > > >> broker
> > > > > >> > and
> > > > > >> > > > > > that's
> > > > > >> > > > > > > > when
> > > > > >> > > > > > > > >> > this
> > > > > >> > > > > > > > >> > > > KIP
> > > > > >> > > > > > > > >> > > > > > can be useful, both for JBOD
> > > > > >> > > > > > > > >> > > > > > and non-JBOD setup.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > Going back to your previous question
> > > about
> > > > > each
> > > > > >> > > > > > > ProduceRequest
> > > > > >> > > > > > > > >> > > covering
> > > > > >> > > > > > > > >> > > > > 20
> > > > > >> > > > > > > > >> > > > > > partitions that are randomly
> > > > > >> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr
> > > > request
> > > > > >> is
> > > > > >> > > > > enqueued
> > > > > >> > > > > > > that
> > > > > >> > > > > > > > >> > tries
> > > > > >> > > > > > > > >> > > to
> > > > > >> > > > > > > > >> > > > > > switch the current broker, say
> broker0,
> > > > from
> > > > > >> > leader
> > > > > >> > > to
> > > > > >> > > > > > > > follower
> > > > > >> > > > > > > > >> > > > > > *for one of the partitions*, say
> > > *test-0*.
> > > > > For
> > > > > >> the
> > > > > >> > > > sake
> > > > > >> > > > > of
> > > > > >> > > > > > > > >> > argument,
> > > > > >> > > > > > > > >> > > > > > let's also assume the other brokers,
> > say
> > > > > >> broker1,
> > > > > >> > > have
> > > > > >> > > > > > > > *stopped*
> > > > > >> > > > > > > > >> > > > fetching
> > > > > >> > > > > > > > >> > > > > > from
> > > > > >> > > > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > >> > > > > > > > >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > > >> > > > > > > > >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR
> > > > > >> > > > > > > > >> > > > > > will be put into the purgatory, and since they'll never be replicated to
> > > > > >> > > > > > > > >> > > > > > other brokers (because of the assumption made above), they will be
> > > > > >> > > > > > > > >> > > > > > completed either when the LeaderAndISR request is processed or when the
> > > > > >> > > > > > > > >> > > > > > timeout happens.
> > > > > >> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately transition the partition
> > > > > >> > > > > > > > >> > > > > > test-0 to become a follower; after the current broker sees the
> > > > > >> > > > > > > > >> > > > > > replication of the remaining 19 partitions, it can send a response
> > > > > >> > > > > > > > >> > > > > > indicating that it's no longer the leader for "test-0".
> > > > > >> > > > > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2, let's say there are
> > > > > >> > > > > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR, and there are 8 io
> > > > > >> > > > > > > > >> > > > > > threads, so each io thread will process approximately 3000 produce
> > > > > >> > > > > > > > >> > > > > > requests. Now let's investigate the io thread that finally processed the
> > > > > >> > > > > > > > >> > > > > > LeaderAndISR.
> > > > > >> > > > > > > > >> > > > > >   For the 3000 produce requests, we can model the times when their
> > > > > >> > > > > > > > >> > > > > > remaining 19 partitions catch up as t0, t1, ... t2999, with the
> > > > > >> > > > > > > > >> > > > > > LeaderAndISR request processed at time t3000.
> > > > > >> > > > > > > > >> > > > > >   Without this KIP, the 1st produce request would have waited an extra
> > > > > >> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra t3000 - t1, etc.
> > > > > >> > > > > > > > >> > > > > >   Roughly speaking, the latency difference is bigger for the earlier
> > > > > >> > > > > > > > >> > > > > > produce requests than for the later ones. For the same reason, the more
> > > > > >> > > > > > > > >> > > > > > ProduceRequests queued before the LeaderAndISR, the bigger the benefit
> > > > > >> > > > > > > > >> > > > > > we get (capped by the produce timeout).
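The purgatory-wait model sketched in 1.1/1.2 above can be made concrete with a short simulation. The request count, 5 ms catch-up spacing, and 30 s timeout below are hypothetical toy numbers chosen to mirror the discussion, not measurements:

```python
# Model from the discussion above: one io thread has n acks=-1 produce
# requests queued ahead of a LeaderAndIsr request. Request i's remaining
# partitions catch up at time t_i; the LeaderAndIsr is processed at t_n.
# Without the KIP, a request for the resigned partition sits in purgatory
# until the LeaderAndIsr is processed (or its produce timeout fires);
# with the KIP it would instead complete at t_i.

def extra_purgatory_wait(catch_up_times, leader_and_isr_time, produce_timeout):
    """Extra wait per request without the KIP, capped by the produce timeout."""
    return [min(leader_and_isr_time, t + produce_timeout) - t
            for t in catch_up_times]

# Toy numbers: 3000 requests whose followers catch up 5 ms apart, with the
# LeaderAndIsr processed right after the last one (t3000 = 15 s).
n = 3000
catch_up = [0.005 * i for i in range(n)]
waits = extra_purgatory_wait(catch_up, leader_and_isr_time=0.005 * n,
                             produce_timeout=30.0)
# The earliest request waits the longest (t3000 - t0), the latest the
# shortest (t3000 - t2999), matching the text above.
```

As the text says, the extra wait shrinks linearly from the front of the queue to the back, so the benefit of processing the LeaderAndISR first is largest for the earliest queued requests.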
> > > > > >> > > > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > > > >> > > > > > > > >> > > > > >   There will be no latency differences in this case, but
> > > > > >> > > > > > > > >> > > > > >   2.1 Without this KIP, the records of partition test-0 in the
> > > > > >> > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be appended to the local
> > > > > >> > > > > > > > >> > > > > > log, and eventually be truncated after processing the LeaderAndISR.
> > > > > >> > > > > > > > >> > > > > > This is what's referred to as "some unofficial definition of data loss
> > > > > >> > > > > > > > >> > > > > > in terms of messages beyond the high watermark".
> > > > > >> > > > > > > > >> > > > > >   2.2 With this KIP, we can mitigate the effect, since if the
> > > > > >> > > > > > > > >> > > > > > LeaderAndISR is immediately processed, the response to producers will
> > > > > >> > > > > > > > >> > > > > > have the NotLeaderForPartition error, causing producers to retry.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > The explanation above covers the benefit of reducing the latency of a
> > > > > >> > > > > > > > >> > > > > > broker becoming a follower; closely related is reducing the latency of
> > > > > >> > > > > > > > >> > > > > > a broker becoming the leader. In this case, the benefit is even more
> > > > > >> > > > > > > > >> > > > > > obvious: if other brokers have resigned leadership, the current broker
> > > > > >> > > > > > > > >> > > > > > should take leadership. Any delay in processing the LeaderAndISR will
> > > > > >> > > > > > > > >> > > > > > be perceived by clients as unavailability. In extreme cases, this can
> > > > > >> > > > > > > > >> > > > > > cause failed produce requests if the retries are exhausted.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > Another two types of controller requests are UpdateMetadata and
> > > > > >> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing means clients receiving
> > > > > >> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong leadership info for certain
> > > > > >> > > > > > > > >> > > > > > partitions, and the effect is more retries or even fatal failure if the
> > > > > >> > > > > > > > >> > > > > > retries are exhausted.
> > > > > >> > > > > > > > >> > > > > > For StopReplica requests, a long queuing time may degrade the
> > > > > >> > > > > > > > >> > > > > > performance of topic deletion.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > Regarding your last question about the delay for DescribeLogDirsRequest,
> > > > > >> > > > > > > > >> > > > > > you are right that this KIP cannot help with the latency in getting the
> > > > > >> > > > > > > > >> > > > > > log dirs info; it's only relevant when controller requests are involved.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > Regards,
> > > > > >> > > > > > > > >> > > > > > Lucas
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > >> Hey Jun,
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> Thanks much for the comments. It is a good point. So the feature may be
> > > > > >> > > > > > > > >> > > > > >> useful for the JBOD use-case. I have one question below.
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> Hey Lucas,
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> Do you think this feature is also useful for a non-JBOD setup, or is it
> > > > > >> > > > > > > > >> > > > > >> only useful for the JBOD setup? It may be useful to understand this.
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> When the broker is set up using JBOD, in order to move leaders on the
> > > > > >> > > > > > > > >> > > > > >> failed disk to other disks, the system operator first needs to get the
> > > > > >> > > > > > > > >> > > > > >> list of partitions on the failed disk. This is currently achieved using
> > > > > >> > > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to
> > > > > >> > > > > > > > >> > > > > >> the broker. If we only prioritize the controller requests, then the
> > > > > >> > > > > > > > >> > > > > >> DescribeLogDirsRequest may still take a long time to be processed by
> > > > > >> > > > > > > > >> > > > > >> the broker. So the overall time to move leaders away from the failed
> > > > > >> > > > > > > > >> > > > > >> disk may still be long even with this KIP. What do you think?
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> Thanks,
> > > > > >> > > > > > > > >> > > > > >> Dong
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > >> > > > > > > > >> > > > > >> >
> > > > > >> > > > > > > > >> > > > > >> > @Dong,
> > > > > >> > > > > > > > >> > > > > >> > Since both of the two comments in your previous email are about the
> > > > > >> > > > > > > > >> > > > > >> > benefits of this KIP and whether it's useful, in light of Jun's last
> > > > > >> > > > > > > > >> > > > > >> > comment, do you agree that this KIP can be beneficial in the case
> > > > > >> > > > > > > > >> > > > > >> > mentioned by Jun?
> > > > > >> > > > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > >> > > > > > > > >> > > > > >> >
> > > > > >> > > > > > > > >> > > > > >> > Regards,
> > > > > >> > > > > > > > >> > > > > >> > Lucas
> > > > > >> > > > > > > > >> > > > > >> >
> > > > > >> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:
> > > > > >> > > > > > > > >> > > > > >> >
> > > > > >> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one probably should just kill the
> > > > > >> > > > > > > > >> > > > > >> > > broker. In that case, this KIP may not help. If only one of the disks
> > > > > >> > > > > > > > >> > > > > >> > > on a broker is slow, one may want to fail that disk and move the
> > > > > >> > > > > > > > >> > > > > >> > > leaders on that disk to other brokers. In that case, being able to
> > > > > >> > > > > > > > >> > > > > >> > > process the LeaderAndIsr requests faster will potentially help the
> > > > > >> > > > > > > > >> > > > > >> > > producers recover quicker.
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > Thanks,
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > Jun
> > > > > >> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow-up questions below.
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > > > > >> > > > > > > > >> > > > > >> > > > randomly distributed across all partitions, then each ProduceRequest
> > > > > >> > > > > > > > >> > > > > >> > > > will likely cover some partitions for which the broker is still the
> > > > > >> > > > > > > > >> > > > > >> > > > leader after it quickly processes the LeaderAndIsrRequest. Then the
> > > > > >> > > > > > > > >> > > > > >> > > > broker will still be slow in processing these ProduceRequests, and
> > > > > >> > > > > > > > >> > > > > >> > > > request latency will still be very high with this KIP. It seems that
> > > > > >> > > > > > > > >> > > > > >> > > > most ProduceRequests will still time out after 30 seconds. Is this
> > > > > >> > > > > > > > >> > > > > >> > > > understanding correct?
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out after 30
> > > > > >> > > > > > > > >> > > > > >> > > > seconds, then it is less clear how this KIP reduces average produce
> > > > > >> > > > > > > > >> > > > > >> > > > latency. Can you clarify what metrics can be improved by this KIP?
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Not sure why the system operator directly cares about the number of
> > > > > >> > > > > > > > >> > > > > >> > > > truncated messages. Do you mean this KIP can improve average
> > > > > >> > > > > > > > >> > > > > >> > > > throughput or reduce message duplication? It would be good to
> > > > > >> > > > > > > > >> > > > > >> > > > understand this.
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Thanks,
> > > > > >> > > > > > > > >> > > > > >> > > > Dong
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see my reply below.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's consider a
> > > > > >> > > > > > > > >> > > > > >> > > > > more common scenario where broker0 is the leader of many
> > > > > >> > > > > > > > >> > > > > >> > > > > partitions, and let's say for some reason its IO becomes slow. The
> > > > > >> > > > > > > > >> > > > > >> > > > > number of leader partitions on broker0 is so large, say 10K, that
> > > > > >> > > > > > > > >> > > > > >> > > > > the cluster is skewed, and the operator would like to shift the
> > > > > >> > > > > > > > >> > > > > >> > > > > leadership for a lot of partitions, say 9K, to other brokers,
> > > > > >> > > > > > > > >> > > > > >> > > > > either manually or through some service like Cruise Control. With
> > > > > >> > > > > > > > >> > > > > >> > > > > this KIP, not only will the leadership transitions finish more
> > > > > >> > > > > > > > >> > > > > >> > > > > quickly, helping the cluster itself become more balanced, but all
> > > > > >> > > > > > > > >> > > > > >> > > > > existing producers corresponding to the 9K partitions will get the
> > > > > >> > > > > > > > >> > > > > >> > > > > errors relatively quickly, rather than relying on their timeout,
> > > > > >> > > > > > > > >> > > > > >> > > > > thanks to the batched async ZK operations. To me it's a useful
> > > > > >> > > > > > > > >> > > > > >> > > > > feature to have during such troublesome times.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown that with this KIP
> > > > > >> > > > > > > > >> > > > > >> > > > > many producers receive an explicit NotLeaderForPartition error,
> > > > > >> > > > > > > > >> > > > > >> > > > > based on which they retry immediately. Therefore the latency (~14
> > > > > >> > > > > > > > >> > > > > >> > > > > seconds + quick retry) for their single message is much smaller
> > > > > >> > > > > > > > >> > > > > >> > > > > compared with the case of timing out without the KIP (30 seconds
> > > > > >> > > > > > > > >> > > > > >> > > > > for timing out + quick retry). One might argue that reducing the
> > > > > >> > > > > > > > >> > > > > >> > > > > timeout on the producer side can achieve the same result, yet
> > > > > >> > > > > > > > >> > > > > >> > > > > reducing the timeout has its own drawbacks[1].
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the number of truncated
> > > > > >> > > > > > > > >> > > > > >> > > > > messages on brokers, with the experiments done in the Google Doc,
> > > > > >> > > > > > > > >> > > > > >> > > > > it should be easy to see that a lot fewer messages need to be
> > > > > >> > > > > > > > >> > > > > >> > > > > truncated on broker0, since the up-to-date metadata avoids
> > > > > >> > > > > > > > >> > > > > >> > > > > appending of messages in subsequent PRODUCE requests. If we talk
> > > > > >> > > > > > > > >> > > > > >> > > > > to a system operator and ask whether they prefer fewer wasteful
> > > > > >> > > > > > > > >> > > > > >> > > > > IOs, I bet most likely the answer is yes.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > 3. To answer your question, I think it might be helpful to
> > > > > >> > > > > > > > >> > > > > >> > > > > construct some formulas. To simplify the modeling, I'm going back
> > > > > >> > > > > > > > >> > > > > >> > > > > to the case where there is only ONE partition involved. Following
> > > > > >> > > > > > > > >> > > > > >> > > > > the experiments in the Google Doc, let's say broker0 becomes the
> > > > > >> > > > > > > > >> > > > > >> > > > > follower at time t0, and after t0 there were still N produce
> > > > > >> > > > > > > > >> > > > > >> > > > > requests in its request queue. With the up-to-date metadata
> > > > > >> > > > > > > > >> > > > > >> > > > > brought by this KIP, broker0 can reply with a
> > > > > >> > > > > > > > >> > > > > >> > > > > NotLeaderForPartition exception; let's use M1 to denote the
> > > > > >> > > > > > > > >> > > > > >> > > > > average processing time of replying with such an error message.
> > > > > >> > > > > > > > >> > > > > >> > > > > Without this KIP, the broker will need to append messages to
> > > > > >> > > > > > > > >> > > > > >> > > > > segments, which may trigger a flush to disk; let's use M2 to
> > > > > >> > > > > > > > >> > > > > >> > > > > denote the average processing time for such logic. Then the
> > > > > >> > > > > > > > >> > > > > >> > > > > average extra latency incurred without this KIP is
> > > > > >> > > > > > > > >> > > > > >> > > > > N * (M2 - M1) / 2.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > In practice, M2 should always be larger than M1, which means as
> > > > > >> > > > > > > > >> > > > > >> > > > > long as N is positive, we would see improvements in the average
> > > > > >> > > > > > > > >> > > > > >> > > > > latency. There does not need to be a significant backlog of
> > > > > >> > > > > > > > >> > > > > >> > > > > requests in the request queue, or severe degradation of disk
> > > > > >> > > > > > > > >> > > > > >> > > > > performance, to have the improvement.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > Regards,
> > > > > >> > > > > > > > >> > > > > >> > > > > Lucas
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing
> the
> > > > > >> timeout on
> > > > > >> > > the
> > > > > >> > > > > > > > producer
> > > > > >> > > > > > > > >> > side
> > > > > >> > > > > > > > >> > > > can
> > > > > >> > > > > > > > >> > > > > >> > trigger
> > > > > >> > > > > > > > >> > > > > >> > > > > unnecessary duplicate
> requests
> > > > > >> > > > > > > > >> > > > > >> > > > > when the corresponding
> leader
> > > > broker
> > > > > >> is
> > > > > >> > > > > > overloaded,
> > > > > >> > > > > > > > >> > > > exacerbating
> > > > > >> > > > > > > > >> > > > > >> the
> > > > > >> > > > > > > > >> > > > > >> > > > > situation.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18
> > PM,
> > > > Dong
> > > > > >> Lin
> > > > > >> > <
> > > > > >> > > > > > > > >> > > lindong28@gmail.com
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > wrote:
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > >> > > > > > > > >> > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > > Thanks much for the
> detailed
> > > > > >> > > documentation
> > > > > >> > > > of
> > > > > >> > > > > > the
> > > > > >> > > > > > > > >> > > > experiment.
> > > > > >> > > > > > > > >> > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > > Initially I also think
> > having
> > > a
> > > > > >> > separate
> > > > > >> > > > > queue
> > > > > >> > > > > > > for
> > > > > >> > > > > > > > >> > > > controller
> > > > > >> > > > > > > > >> > > > > >> > > requests
> > > > > >> > > > > > > > >> > > > > >> > > > is
> > > > > >> > > > > > > > >> > > > > >> > > > > > useful because, as you
> > > mentioned
> > > > > in
> > > > > >> the
> > > > > >> > > > > summary
> > > > > >> > > > > > > > >> section
> > > > > >> > > > > > > > >> > of
> > > > > >> > > > > > > > >> > > > the
> > > > > >> > > > > > > > >> > > > > >> > Google
> > > > > >> > > > > > > > >> > > > > >> > > > > doc,
> > > > > >> > > > > > > > >> > > > > >> > > > > > controller requests are
> > > > generally
> > > > > >> more
> > > > > >> > >
> > >
> > >
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> > >
> >
>
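For concreteness, the back-of-envelope latency estimate quoted above can be checked with a small sketch. The values of N, M1, and M2 below are illustrative assumptions, not measurements from the KIP's experiments:

```python
# Sanity check of the claim that the average extra latency without the
# KIP is roughly N * (M2 - M1) / 2. All numbers are made-up assumptions.
N = 100    # produce requests still sitting in broker0's queue at t0
M1 = 0.5   # ms: avg time to reply with NotLeaderForPartition (with KIP)
M2 = 4.0   # ms: avg time to append to segments, maybe flush (without KIP)

# The i-th queued request waits behind i earlier requests, so its extra
# latency without the KIP is about i * (M2 - M1).
extra = [i * (M2 - M1) for i in range(N)]
avg_extra = sum(extra) / N          # exact: (N - 1) * (M2 - M1) / 2
approx = N * (M2 - M1) / 2          # the approximation used in the thread

print(avg_extra)  # 173.25
print(approx)     # 175.0
```

As long as M2 > M1 and N > 0, the extra latency is positive, which matches the argument above that no severe backlog or disk degradation is needed to see an improvement.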

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Eno,

I fully agree with Becket here. If the motivation section makes sense, and
we know we can get burnt by this problem,
then the exact numbers (which vary case by case according to the config
settings and traffic pattern)
are no longer as important.

Thanks,
Lucas


On Tue, Aug 21, 2018 at 9:39 AM Becket Qin <be...@gmail.com> wrote:

> Hi Eno,
>
> Thanks for the comments. This KIP is not really about improving the
> performance in general. It is about ensuring the cluster state can still be
> updated quickly even if the brokers are under heavy load.
>
> We have seen quite often that it took dozens of seconds for a broker to
> process the requests sent by the controller when the cluster is under heavy
> load. This leads to the issues Lucas mentioned in the motivation part.
>
> Thanks,
>
> Jiangjie (Becket) Qin

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks for the comments, Joel.

I addressed all but the last one, where Jun also shared a comment in the
Vote thread to
change it to "controller.listener.name". I actually feel CONTROLLER is
better since it's a well-defined concept in Kafka, while CONTROL is easier
to confuse people with, since in the code we refer to some requests used
for transactional producing as CONTROL batches.

Thanks,
Lucas

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Joel Koshy <jj...@gmail.com>.
I had some offline discussions with Lucas on this KIP. While it is much
more work than the original proposals, separating the control plane
entirely removes any interference with the data plane as summarized under
the rejected alternatives section.

Just a few minor comments:

   - Can you update the link to the discussion thread and vote thread?
   - The idle ratio metrics are fairly important for monitoring. I think we
   agreed that these would only apply to the data plane (otherwise there will
   always be some skew due to the control plane). If so, can you clarify
   that somewhere in the doc?
   - Personally, I prefer the term CONTROL to CONTROLLER in the configs.
   CONTROLLER makes it sound like it is a special listener on the controller.
   CONTROL clarifies that this is a listener for receiving control plane
   requests from the controller.


Thanks,

Joel

On Wed, Aug 22, 2018 at 12:45 AM, Eno Thereska <en...@gmail.com>
wrote:

> Ok thanks, if you guys are seeing this at LinkedIn then the motivation
> makes more sense.
>
> Eno

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Eno Thereska <en...@gmail.com>.
Ok thanks, if you guys are seeing this at LinkedIn then the motivation
makes more sense.

Eno

On Tue, Aug 21, 2018 at 5:39 PM, Becket Qin <be...@gmail.com> wrote:

> Hi Eno,
>
> Thanks for the comments. This KIP is not really about improving the
> performance in general. It is about ensuring the cluster state can still be
> updated quickly even if the brokers are under heavy load.
>
> We have seen quite often that it took dozens of seconds for a broker to
> process the requests sent by the controller when the cluster is under heavy
> load. This leads to the issues Lucas mentioned in the motivation part.
>
> Thanks,
>
> Jiangjie (Becket) Qin

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Eno,

Thanks for the comments. This KIP is not really about improving the performance in general. It is about ensuring the cluster state can still be updated quickly even if the brokers are under heavy load.

We have seen quite often that it took dozens of seconds for a broker to process the requests sent by the controller when the cluster is under heavy load. This leads to the issues Lucas mentioned in the motivation part.

Thanks,

Jiangjie (Becket) Qin

> On Aug 20, 2018, at 11:33 PM, Eno Thereska <en...@gmail.com> wrote:
> 
> Hi folks,
> 
> I looked at the previous numbers that Lucas provided (thanks!) but it's
> still not clear to me whether the performance benefits justify the added
> complexity. I'm looking for some intuition here (a graph would be great but
> > not required): for a small/medium/large cluster, what is the expected
> > percentage of control requests today that will benefit from the change?
> It's a bit hard to go through this level of detail without knowing the
> expected end-to-end benefit. The best folks to answer this might be ones
> running such clusters, and ideally should pitch in with some data.
> 
> Thanks
> Eno


Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Eno Thereska <en...@gmail.com>.
Hi folks,

I looked at the previous numbers that Lucas provided (thanks!) but it's
still not clear to me whether the performance benefits justify the added
complexity. I'm looking for some intuition here (a graph would be great but
not required): for a small/medium/large cluster, what is the expected
percentage of control requests today that will benefit from the change?
It's a bit hard to go through this level of detail without knowing the
expected end-to-end benefit. The best folks to answer this might be ones
running such clusters, and ideally should pitch in with some data.

Thanks
Eno

On Mon, Aug 20, 2018 at 7:29 AM, Becket Qin <be...@gmail.com> wrote:

> Hi Lucas,
>
> In KIP-103, we introduced a convention to define and look up the listeners.
> So it would be good if the later KIPs can follow the same convention.
>
> From what I understand, the advertised.listeners is actually designed for
> our purpose, i.e. providing a list of listeners that can be used in
> different cases. In KIP-103 it was used to separate internal traffic from
> the external traffic. It is not just for user traffic or data. So adding
> a controller listener is not repurposing the config. Also, ZK structure is
> only visible to brokers, the clients will still only see the listeners they
> are seeing today.
>
> For this KIP, we are essentially trying to separate the controller traffic
> from the inter-broker data traffic. So adding a new
> listener.name.for.controller config seems reasonable. The behavior would
> be:
> 1. If the listener.name.for.controller is set, the broker-controller
> communication will go through that listener.
> 2. Otherwise, the controller traffic falls back to use
> inter.broker.listener.name or inter.broker.security.protocol, which is the
> current behavior.
>
> Regarding updating the security protocol with a one-line change vs. a
> two-line change, I am a little confused; can you elaborate?
>
> Regarding the possibility of hurry and misreading. It is the system admin's
> responsibility to configure the right listener to ensure that different
> kinds of traffic are using the correct endpoints. So I think it is better
> that we always follow the same convention instead of doing it in
> different ways.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Fri, Aug 17, 2018 at 4:34 AM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for the review, Becket.
> >
> > (1) After comparing the two approaches, I still feel the current writeup
> is
> > a little better.
> > a. The current writeup asks for an explicit endpoint while reusing the
> > existing "inter.broker.listener.name" with exactly the same semantics,
> > and your proposed change asks for a new listener name for controller
> while
> > reusing the existing "advertised.listeners" config with a slight semantic
> > change since a new controller endpoint needs to be added to it.
> > Hence conceptually the current writeup requires one config change instead
> > of two.
> > Also with one listener name, e.g. INTERNAL, for inter broker traffic,
> > instead of two, e.g. "INTERNAL" and "CONTROLLER",
> > if an operator decides to switch from PLAINTEXT to SSL for internal
> > traffic, chances are that she wants to upgrade
> > both controller connections and data connections, she only needs to
> update
> > one line in
> > the "listener.security.protocol.map" config, and avoids possible
> mistakes.
> >
> >
> > b. When this KIP is picked up by an operator who is in a hurry without
> > reading the docs, if she sees a
> > new listener name for controller is required, and chances are there is
> > already a list of listeners,
> > it's possible for her to simply choose an existing listener name, without
> > explicitly creating
> > the new CONTROLLER listener and endpoints. If this is done, Kafka will be
> > run with the existing
> > behavior, defeating the purpose of this KIP.
> > In comparison, if she sees a separate endpoint is being asked, I feel
> it's
> > unlikely for her to
> > copy and paste an existing endpoint.
> >
> > Please let me know your comments.
> >
> > (2) Good catch, it's a typo, and it's been fixed.
> >
> > Thanks,
> > Lucas
> >
>
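The two-step fallback Becket describes in the quoted message can be sketched as follows. The config keys come from the discussion itself; the dict-based lookup and function name are illustrative only, not Kafka's actual implementation:

```python
# Hypothetical sketch of the proposed listener resolution order for
# controller traffic, per the discussion above.
def resolve_controller_listener(config):
    # 1. If listener.name.for.controller is set, controller traffic
    #    goes through that listener.
    name = config.get("listener.name.for.controller")
    if name is not None:
        return name
    # 2. Otherwise fall back to the inter-broker listener name, or,
    #    failing that, the inter-broker security protocol (the current
    #    behavior, defaulting to PLAINTEXT).
    name = config.get("inter.broker.listener.name")
    if name is not None:
        return name
    return config.get("inter.broker.security.protocol", "PLAINTEXT")

print(resolve_controller_listener({"listener.name.for.controller": "CONTROLLER"}))  # CONTROLLER
print(resolve_controller_listener({"inter.broker.listener.name": "INTERNAL"}))      # INTERNAL
print(resolve_controller_listener({}))                                               # PLAINTEXT
```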

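Lucas's one-line-change argument quoted above can be illustrated with a hypothetical broker config fragment: with a single INTERNAL listener name shared by controller and data connections, moving internal traffic from PLAINTEXT to SSL touches only the protocol map entry (host names and ports below are made up):

```
# Hypothetical broker config fragment, for illustration only.
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:SSL
inter.broker.listener.name=INTERNAL
# Switching all internal traffic to SSL is then a one-line change:
#   listener.security.protocol.map=INTERNAL:SSL,EXTERNAL:SSL
```
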
Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks Becket. Following the convention of KIP-103 makes sense.
I've updated the KIP with your proposed changes. Please take another look.

Lucas

On Mon, Aug 20, 2018 at 7:29 AM Becket Qin <be...@gmail.com> wrote:

> Hi Lucas,
>
> In KIP-103, we introduced a convention to define and look up the listeners.
> So it would be good if later KIPs follow the same convention.
>
> From what I understand, the advertised.listeners config is actually designed for
> our purpose, i.e. providing a list of listeners that can be used in
> different cases. In KIP-103 it was used to separate internal traffic from
> the external traffic. It is not just for user traffic or data, so adding
> a controller listener is not repurposing the config. Also, ZK structure is
> only visible to brokers, the clients will still only see the listeners they
> are seeing today.
>
> For this KIP, we are essentially trying to separate the controller traffic
> from the inter-broker data traffic. So adding a new
> listener.name.for.controller config seems reasonable. The behavior would
> be:
> 1. If the listener.name.for.controller is set, the broker-controller
> communication will go through that listener.
> 2. Otherwise, the controller traffic falls back to use
> inter.broker.listener.name or security.inter.broker.protocol, which is the
> current behavior.
>
> Regarding updating the security protocol with a one-line change vs. a
> two-line change, I am a little confused; can you elaborate?
>
> Regarding the possibility of hurry and misreading. It is the system admin's
> responsibility to configure the right listener to ensure that different
> kinds of traffic are using the correct endpoints. So I think it is better
> that we always follow the same convention instead of doing it in
> different ways.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Fri, Aug 17, 2018 at 4:34 AM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for the review, Becket.
> >
> > (1) After comparing the two approaches, I still feel the current writeup
> is
> > a little better.
> > a. The current writeup asks for an explicit endpoint while reusing the
> > existing "inter.broker.listener.name" with exactly the same semantics,
> > and your proposed change asks for a new listener name for controller
> while
> > reusing the existing "advertised.listeners" config with a slight semantic
> > change since a new controller endpoint needs to be added to it.
> > Hence conceptually the current writeup requires one config change instead
> > of two.
> > Also with one listener name, e.g. INTERNAL, for inter broker traffic,
> > instead of two, e.g. "INTERNAL" and "CONTROLLER",
> > if an operator decides to switch from PLAINTEXT to SSL for internal
> > traffic, chances are that she wants to upgrade
> > both controller connections and data connections, she only needs to
> update
> > one line in
> > the "listener.security.protocol.map" config, and avoids possible
> mistakes.
> >
> >
> > b. When this KIP is picked up by an operator who is in a hurry without
> > reading the docs, if she sees a
> > new listener name for controller is required, and chances are there is
> > already a list of listeners,
> > it's possible for her to simply choose an existing listener name, without
> > explicitly creating
> > the new CONTROLLER listener and endpoints. If this is done, Kafka will be
> > run with the existing
> > behavior, defeating the purpose of this KIP.
> > In comparison, if she sees a separate endpoint is being asked for, I feel
> it's
> > unlikely for her to
> > copy and paste an existing endpoint.
> >
> > Please let me know your comments.
> >
> > (2) Good catch, it's a typo, and it's been fixed.
> >
> > Thanks,
> > Lucas
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Lucas,

In KIP-103, we introduced a convention to define and look up the listeners.
So it would be good if later KIPs follow the same convention.

From what I understand, the advertised.listeners config is actually designed for
our purpose, i.e. providing a list of listeners that can be used in
different cases. In KIP-103 it was used to separate internal traffic from
the external traffic. It is not just for user traffic or data, so adding
a controller listener is not repurposing the config. Also, ZK structure is
only visible to brokers, the clients will still only see the listeners they
are seeing today.

For this KIP, we are essentially trying to separate the controller traffic
from the inter-broker data traffic. So adding a new
listener.name.for.controller config seems reasonable. The behavior would
be:
1. If the listener.name.for.controller is set, the broker-controller
communication will go through that listener.
2. Otherwise, the controller traffic falls back to use
inter.broker.listener.name or security.inter.broker.protocol, which is the
current behavior.
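The fallback behavior in points 1 and 2 above can be sketched as follows. This is a hypothetical helper, not Kafka's actual config-resolution code; the config keys are the ones named in this thread, and the default value is only illustrative.

```java
import java.util.Map;

// Hypothetical helper illustrating the proposed resolution order for the
// controller listener; not Kafka's actual configuration code.
public class ControllerListenerResolution {

    static String resolveControllerListener(Map<String, String> props) {
        // 1. Prefer the dedicated controller listener when it is set.
        String controller = props.get("listener.name.for.controller");
        if (controller != null) {
            return controller;
        }
        // 2. Otherwise fall back to the inter-broker listener, which
        //    preserves today's behavior.
        String interBroker = props.get("inter.broker.listener.name");
        if (interBroker != null) {
            return interBroker;
        }
        // Last resort: the listener derived from security.inter.broker.protocol
        // (modeled here as a direct lookup with the PLAINTEXT default).
        return props.getOrDefault("security.inter.broker.protocol", "PLAINTEXT");
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of(
                "inter.broker.listener.name", "INTERNAL",
                "listener.name.for.controller", "CONTROLLER");
        System.out.println(resolveControllerListener(props)); // CONTROLLER
    }
}
```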

Regarding updating the security protocol with a one-line change vs. a
two-line change, I am a little confused; can you elaborate?

Regarding the possibility of hurry and misreading. It is the system admin's
responsibility to configure the right listener to ensure that different
kinds of traffic are using the correct endpoints. So I think it is better
that we always follow the same convention instead of doing it in
different ways.

Thanks,

Jiangjie (Becket) Qin



On Fri, Aug 17, 2018 at 4:34 AM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks for the review, Becket.
>
> (1) After comparing the two approaches, I still feel the current writeup is
> a little better.
> a. The current writeup asks for an explicit endpoint while reusing the
> existing "inter.broker.listener.name" with exactly the same semantics,
> and your proposed change asks for a new listener name for controller while
> reusing the existing "advertised.listeners" config with a slight semantic
> change since a new controller endpoint needs to be added to it.
> Hence conceptually the current writeup requires one config change instead
> of two.
> Also with one listener name, e.g. INTERNAL, for inter broker traffic,
> instead of two, e.g. "INTERNAL" and "CONTROLLER",
> if an operator decides to switch from PLAINTEXT to SSL for internal
> traffic, chances are that she wants to upgrade
> both controller connections and data connections, she only needs to update
> one line in
> the "listener.security.protocol.map" config, and avoids possible mistakes.
>
>
> b. When this KIP is picked up by an operator who is in a hurry without
> reading the docs, if she sees a
> new listener name for controller is required, and chances are there is
> already a list of listeners,
> it's possible for her to simply choose an existing listener name, without
> explicitly creating
> the new CONTROLLER listener and endpoints. If this is done, Kafka will be
> run with the existing
> behavior, defeating the purpose of this KIP.
> In comparison, if she sees a separate endpoint is being asked for, I feel it's
> unlikely for her to
> copy and paste an existing endpoint.
>
> Please let me know your comments.
>
> (2) Good catch, it's a typo, and it's been fixed.
>
> Thanks,
> Lucas
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks for the review, Becket.

(1) After comparing the two approaches, I still feel the current writeup is
a little better.
a. The current writeup asks for an explicit endpoint while reusing the
existing "inter.broker.listener.name" with exactly the same semantics,
and your proposed change asks for a new listener name for controller while
reusing the existing "advertised.listeners" config with a slight semantic
change since a new controller endpoint needs to be added to it.
Hence conceptually the current writeup requires one config change instead
of two.
Also with one listener name, e.g. INTERNAL, for inter broker traffic,
instead of two, e.g. "INTERNAL" and "CONTROLLER",
if an operator decides to switch from PLAINTEXT to SSL for internal
traffic, chances are that she wants to upgrade
both controller connections and data connections, she only needs to update
one line in
the "listener.security.protocol.map" config, and avoids possible mistakes.


b. When this KIP is picked up by an operator who is in a hurry without
reading the docs, if she sees a
new listener name for controller is required, and chances are there is
already a list of listeners,
it's possible for her to simply choose an existing listener name, without
explicitly creating
the new CONTROLLER listener and endpoints. If this is done, Kafka will be
run with the existing
behavior, defeating the purpose of this KIP.
In comparison, if she sees a separate endpoint is being asked for, I feel it's
unlikely for her to
copy and paste an existing endpoint.

Please let me know your comments.

(2) Good catch, it's a typo, and it's been fixed.

Thanks,
Lucas

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Lucas,

Thanks for updating the wiki. The updated motivation description looks good
to me. One additional benefit of having a separate port for controller
messages is that we can protect the control plane with something like iptables.

Reading the proposed change a bit more, I found it is a little weird to add
the endpoints_for_controller section. It seems we only need to add a new
endpoint named CONTROLLER to the endpoints section, and also add the
corresponding protocol mapping to the listener security protocol map
section. In the broker config, the listener-for-controller should just be
listener-name-for-controller, which is CONTROLLER.

Another minor thing is that, in the "How can a controller learn about the
dedicated endpoints exposed by brokers" section, you said

For instance, with the sample json payload listed above, if the controller
first determines inter-broker-listener-name to be "INTERNAL", then it knows
to use the host name "host1.example.com", port 9093 and the security
protocol PLAINTEXT to connect to the broker.

should the port be 9092 instead?
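The lookup described in that paragraph can be sketched as follows. The types and values here are hypothetical stand-ins for the KIP's sample JSON payload (the PLAINTEXT listener is shown on port 9092, per the question above), not Kafka's actual broker-registration code.

```java
import java.util.Map;

// Hypothetical sketch of a controller resolving a broker endpoint by
// listener name; sample values stand in for the KIP's JSON payload.
public class EndpointLookupDemo {
    record Endpoint(String host, int port, String securityProtocol) {}

    public static void main(String[] args) {
        // Listener name -> endpoint, as a broker might register it in ZooKeeper,
        // combined with the listener security protocol map.
        Map<String, Endpoint> endpoints = Map.of(
                "EXTERNAL", new Endpoint("host1.example.com", 9093, "SSL"),
                "INTERNAL", new Endpoint("host1.example.com", 9092, "PLAINTEXT"));

        // The controller first determines the inter-broker listener name,
        // then reads the host, port, and security protocol for that listener.
        Endpoint e = endpoints.get("INTERNAL");
        System.out.println(e.host() + ":" + e.port() + " via " + e.securityProtocol());
        // prints host1.example.com:9092 via PLAINTEXT
    }
}
```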

Thanks,

Jiangjie (Becket) Qin

On Tue, Aug 14, 2018 at 5:06 AM, Lucas Wang <lu...@gmail.com> wrote:

> @Becket
>
> Makes sense. I've updated the KIP by adding the following paragraph to the
> motivation section
>
> > Today there is no separation between controller requests and regular data
> > plane requests. Specifically (1) a controller in a cluster uses the same
> > advertised endpoints to connect to brokers as what clients and regular
> > brokers use for exchanging data (2) on the broker side, the same network
> > (processor) thread could be multiplexed by handling a controller
> connection
> > and many other data plane connections (3) after a controller request is
> > read from the socket, it is enqueued into the single FIFO requestQueue,
> > which is used for all types of requests (4) request handler threads poll
> > requests from the requestQueue and handle the controller requests with
> the
> > same priority as regular data requests.
> >
> > Because of the multiplexing at every stage of request handling,
> controller
> > requests could be significantly delayed under the following scenarios:
> >
> >    1. The requestQueue is full, and therefore blocks a network
> >    (processor) thread that has read a controller request from the socket.
> >    2. A controller request is enqueued into the requestQueue after a
> >    backlog of data requests, and experiences a long queuing time in the
> >    requestQueue.
> >
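The two delay scenarios in the quoted motivation can be illustrated with a toy bounded queue. This is plain Java, not Kafka code; the queue's capacity merely models the role of queued.max.requests, and the request names are made up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy illustration of the two delay scenarios; not Kafka code.
public class RequestQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // A small bounded FIFO queue standing in for the broker's requestQueue
        // (its capacity plays the role of queued.max.requests).
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(3);

        // Data requests fill the queue faster than handler threads drain it.
        requestQueue.put("produce-1");
        requestQueue.put("fetch-2");
        requestQueue.put("produce-3");

        // Scenario 1: the queue is full, so the controller request cannot be
        // enqueued; a blocking put() would stall the network thread. An
        // offer() with a timeout makes the stall observable here.
        boolean enqueued = requestQueue.offer("leaderAndIsr", 100, TimeUnit.MILLISECONDS);
        System.out.println("controller request enqueued: " + enqueued); // false

        // Scenario 2: even once space frees up, FIFO ordering means the
        // controller request waits behind the whole data-request backlog.
        requestQueue.poll(); // a handler thread frees one slot
        requestQueue.put("leaderAndIsr");
        System.out.println("next request handled: " + requestQueue.poll()); // fetch-2
    }
}
```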
> >
> Please let me know if that looks ok or any other change you'd like to make.
> Thanks!
>
> Lucas
>
> On Mon, Aug 13, 2018 at 6:33 AM, Becket Qin <be...@gmail.com> wrote:
>
> > Hi Lucas,
> >
> > Thanks for the explanation. It might be a nitpick, but it seems better to
> > mention in the motivation part that today the client requests and
> > controller requests are not only sharing the same queue, but also a bunch
> > of other things, so that we can avoid asking people to read the rejected
> > alternatives.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> >
> >
> >
> >
> > On Fri, Aug 10, 2018 at 6:23 AM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > @Becket,
> > >
> > > I've asked for review by Jun and Joel in the vote thread.
> > > Regarding the separate thread and port, I did talk about it in the
> > rejected
> > > alternative design 1.
> > > Please let me know if you'd like more elaboration or moving it to the
> > > motivation, etc.
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Wed, Aug 8, 2018 at 3:59 PM, Becket Qin <be...@gmail.com>
> wrote:
> > >
> > > > Hi Lucas,
> > > >
> > > > Yes, a separate Jira is OK.
> > > >
> > > > Since the proposal has significantly changed since the initial vote
> > > > started. We probably should let the others who have already voted
> know
> > > and
> > > > ensure they are happy with the updated proposal.
> > > > Also, it seems the motivation part of the KIP wiki is still just
> > talking
> > > > about the separate queue and not fully cover the changes we make now,
> > > e.g.
> > > > separate thread, port, etc. We might want to explain a bit more so
> for
> > > > people who did not follow the discussion mail thread also understand
> > the
> > > > whole proposal.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Wed, Aug 8, 2018 at 12:44 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Becket,
> > > > >
> > > > > Thanks for the review. The current write up in the KIP won’t change
> > the
> > > > > ordering behavior. Are you ok with addressing that as a separate
> > > > > independent issue (I’ll create a separate ticket for it)?
> > > > > If so, can you please give me a +1 on the vote thread?
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > > On Tue, Aug 7, 2018 at 7:34 PM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Thanks for the updated KIP wiki, Lucas. Looks good to me overall.
> > > > > >
> > > > > > It might be an implementation detail, but do we still plan to use
> > the
> > > > > > correlation id to ensure the request processing order?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Tue, Jul 31, 2018 at 3:39 AM, Lucas Wang <
> lucasatucla@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Thanks for your review, Dong.
> > > > > > > Ack that these configs will have a bigger impact for users.
> > > > > > >
> > > > > > > On the other hand, I would argue that the request queue
> becoming
> > > full
> > > > > > > may or may not be a rare scenario.
> > > > > > > How often the request queue gets full depends on the request
> > > incoming
> > > > > > rate,
> > > > > > > the request processing rate, and the size of the request queue.
> > > > > > > When that happens, the dedicated endpoints design can better
> > handle
> > > > > > > it than any of the previously discussed options.
> > > > > > >
> > > > > > > Another reason I made the change was that I have the same taste
> > > > > > > as Becket that it's a better separation of the control plane
> from
> > > the
> > > > > > data
> > > > > > > plane.
> > > > > > >
> > > > > > > Finally, I want to clarify that this change is NOT motivated by
> > the
> > > > > > > out-of-order
> > > > > > > processing discussion. The latter problem is orthogonal to this
> > > KIP,
> > > > > and
> > > > > > it
> > > > > > > can happen in any of the design options we discussed for this
> KIP
> > > so
> > > > > far.
> > > > > > > So I'd like to address out-of-order processing separately in
> > > another
> > > > > > > thread,
> > > > > > > and avoid mentioning it in this KIP.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin <lindong28@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hey Lucas,
> > > > > > > >
> > > > > > > > Thanks for the update.
> > > > > > > >
> > > > > > > > The current KIP propose new broker configs
> > > > "listeners.for.controller"
> > > > > > and
> > > > > > > > "advertised.listeners.for.controller". This is going to be a
> > big
> > > > > change
> > > > > > > > since listeners are among the most important configs that
> every
> > > > user
> > > > > > > needs
> > > > > > > > to change. According to the rejected alternative section, it
> > > seems
> > > > > that
> > > > > > > the
> > > > > > > > reason to add these two configs is to improve performance
> when
> > > the
> > > > > data
> > > > > > > > request queue is full rather than for correctness. It should
> > be a
> > > > > very
> > > > > > > rare
> > > > > > > > scenario and I am not sure we should add configs for all
> users
> > > just
> > > > > to
> > > > > > > > improve the performance in such rare scenario.
> > > > > > > >
> > > > > > > > Also, if the new design is based on the issues which are
> > > discovered
> > > > > in
> > > > > > > the
> > > > > > > > recent discussion, e.g. out of order processing if we don't
> > use a
> > > > > > > dedicated
> > > > > > > > thread for controller request, it may be useful to explain
> the
> > > > > problem
> > > > > > in
> > > > > > > > the motivation section.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dong
> > > > > > > >
> > > > > > > > On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <
> > > lucasatucla@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > A kind reminder for review of this KIP.
> > > > > > > > >
> > > > > > > > > Thank you very much!
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > I've updated the KIP by adding the dedicated endpoints
> for
> > > > > > controller
> > > > > > > > > > connections,
> > > > > > > > > > and pinning threads for controller requests.
> > > > > > > > > > Also I've updated the title of this KIP. Please take a
> look
> > > and
> > > > > let
> > > > > > > me
> > > > > > > > > > know your feedback.
> > > > > > > > > >
> > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > Lucas
> > > > > > > > > >
> > > > > > > > > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > > > > > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi Lucas,
> > > > > > > > > >> I agree, if we want to go forward with a separate
> > controller
> > > > > plane
> > > > > > > and
> > > > > > > > > >> data
> > > > > > > > > >> plane and completely isolate them, having a separate
> port
> > > for
> > > > > > > > controller
> > > > > > > > > >> with a separate Acceptor and a Processor sounds ideal to
> > me.
> > > > > > > > > >>
> > > > > > > > > >> Thanks,
> > > > > > > > > >>
> > > > > > > > > >> Mayuresh
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <
> > > > > becket.qin@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi Lucas,
> > > > > > > > > >> >
> > > > > > > > > >> > Yes, I agree that a dedicated end to end control flow
> > > would
> > > > be
> > > > > > > > ideal.
> > > > > > > > > >> >
> > > > > > > > > >> > Thanks,
> > > > > > > > > >> >
> > > > > > > > > >> > Jiangjie (Becket) Qin
> > > > > > > > > >> >
> > > > > > > > > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Thanks for the comment, Becket.
> > > > > > > > > >> > > So far, we've been trying to avoid making any
> request
> > > > > handler
> > > > > > > > thread
> > > > > > > > > >> > > special.
> > > > > > > > > >> > > But if we were to follow that path in order to make
> > the
> > > > two
> > > > > > > planes
> > > > > > > > > >> more
> > > > > > > > > >> > > isolated,
> > > > > > > > > >> > > what do you think about also having a dedicated
> > > processor
> > > > > > > thread,
> > > > > > > > > >> > > and dedicated port for the controller?
> > > > > > > > > >> > >
> > > > > > > > > >> > > Today one processor thread can handle multiple
> > > > connections,
> > > > > > > let's
> > > > > > > > > say
> > > > > > > > > >> 100
> > > > > > > > > >> > > connections
> > > > > > > > > >> > >
> > > > > > > > > >> > > represented by connection0, ... connection99, among
> > > which
> > > > > > > > > >> connection0-98
> > > > > > > > > >> > > are from clients, while connection99 is from
> > > > > > > > > >> > >
> > > > > > > > > >> > > the controller. Further let's say after one selector
> > > > > polling,
> > > > > > > > there
> > > > > > > > > >> are
> > > > > > > > > >> > > incoming requests on all connections.
> > > > > > > > > >> > >
> > > > > > > > > >> > > When the request queue is full, (either the data
> > request
> > > > > being
> > > > > > > > full
> > > > > > > > > in
> > > > > > > > > >> > the
> > > > > > > > > >> > > two queue design, or
> > > > > > > > > >> > >
> > > > > > > > > >> > > the one single queue being full in the deque
> design),
> > > the
> > > > > > > > processor
> > > > > > > > > >> > thread
> > > > > > > > > >> > > will be blocked first
> > > > > > > > > >> > >
> > > > > > > > > >> > > when trying to enqueue the data request from
> > > connection0,
> > > > > then
> > > > > > > > > >> possibly
> > > > > > > > > >> > > blocked for the data request
> > > > > > > > > >> > >
> > > > > > > > > >> > > from connection1, ... etc even though the controller
> > > > request
> > > > > > is
> > > > > > > > > ready
> > > > > > > > > >> to
> > > > > > > > > >> > be
> > > > > > > > > >> > > enqueued.
> > > > > > > > > >> > >
> > > > > > > > > >> > > To solve this problem, it seems we would need to
> have
> > a
> > > > > > separate
> > > > > > > > > port
> > > > > > > > > >> > > dedicated to
> > > > > > > > > >> > >
> > > > > > > > > >> > > the controller, a dedicated processor thread, a
> > > dedicated
> > > > > > > > controller
> > > > > > > > > >> > > request queue,
> > > > > > > > > >> > >
> > > > > > > > > >> > > and pinning of one request handler thread for
> > controller
> > > > > > > requests.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Thanks,
> > > > > > > > > >> > > Lucas
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <
> > > > > > > becket.qin@gmail.com
> > > > > > > > >
> > > > > > > > > >> > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Personally I am not fond of the dequeue approach
> > > simply
> > > > > > > because
> > > > > > > > it
> > > > > > > > > >> is
> > > > > > > > > >> > > > against the basic idea of isolating the controller
> > > plane
> > > > > and
> > > > > > > > data
> > > > > > > > > >> > plane.
> > > > > > > > > >> > > > With a single dequeue, theoretically speaking the
> > > > > controller
> > > > > > > > > >> requests
> > > > > > > > > >> > can
> > > > > > > > > >> > > > starve the clients requests. I would prefer the
> > > approach
> > > > > > with
> > > > > > > a
> > > > > > > > > >> > separate
> > > > > > > > > >> > > > controller request queue and a dedicated
> controller
> > > > > request
> > > > > > > > > handler
> > > > > > > > > >> > > thread.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Thanks,
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Jiangjie (Becket) Qin
> > > > > > > > > >> > > >
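For reference, the single-deque alternative Becket argues against would look roughly like this. It is a toy sketch (not Kafka code) that also shows the starvation concern: handler threads always take from the head, so a steady stream of controller requests could indefinitely delay the data requests behind them.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Toy sketch of the rejected single-deque design; not Kafka code.
public class DequeApproachDemo {
    public static void main(String[] args) {
        LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>();

        // Data requests append at the tail...
        queue.offerLast("produce-1");
        queue.offerLast("fetch-2");
        // ...while controller requests jump to the head.
        queue.offerFirst("leaderAndIsr");

        // Handler threads always poll the head, so controller requests are
        // served first; enough of them would starve the data requests.
        System.out.println(queue.pollFirst()); // leaderAndIsr
        System.out.println(queue.pollFirst()); // produce-1
    }
}
```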
> > > > > > > > > >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <
> > > > > > > > > lucasatucla@gmail.com>
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > > Sure, I can summarize the usage of correlation
> id.
> > > But
> > > > > > > before
> > > > > > > > I
> > > > > > > > > do
> > > > > > > > > >> > > that,
> > > > > > > > > >> > > > it
> > > > > > > > > >> > > > > seems
> > > > > > > > > >> > > > > the same out-of-order processing can also happen
> > to
> > > > > > Produce
> > > > > > > > > >> requests
> > > > > > > > > >> > > sent
> > > > > > > > > >> > > > > by producers,
> > > > > > > > > >> > > > > following the same example you described
> earlier.
> > > > > > > > > >> > > > > If that's the case, I think this probably
> > deserves a
> > > > > > > separate
> > > > > > > > > doc
> > > > > > > > > >> and
> > > > > > > > > >> > > > > design independent of this KIP.
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Lucas
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <
> > > > > > > > lindong28@gmail.com
> > > > > > > > > >
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > > Hey Lucas,
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > Could you update the KIP if you are confident
> > with
> > > > the
> > > > > > > > > approach
> > > > > > > > > >> > which
> > > > > > > > > >> > > > > uses
> > > > > > > > > >> > > > > > correlation id? The idea around correlation id
> > is
> > > > kind
> > > > > > of
> > > > > > > > > >> scattered
> > > > > > > > > >> > > > > across
> > > > > > > > > >> > > > > > multiple emails. It will be useful if other
> > > reviews
> > > > > can
> > > > > > > read
> > > > > > > > > the
> > > > > > > > > >> > KIP
> > > > > > > > > >> > > to
> > > > > > > > > >> > > > > > understand the latest proposal.
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > Thanks,
> > > > > > > > > >> > > > > > Dong
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh
> > Gharat
> > > <
> > > > > > > > > >> > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > > I like the idea of the dequeue
> implementation
> > by
> > > > > > Lucas.
> > > > > > > > This
> > > > > > > > > >> will
> > > > > > > > > >> > > > help
> > > > > > > > > >> > > > > us
> > > > > > > > > >> > > > > > > avoid additional queue for controller and
> > > > additional
> > > > > > > > configs
> > > > > > > > > >> in
> > > > > > > > > >> > > > Kafka.
> > > > > > > > > >> > > > > > >
> > > > > > > > > >> > > > > > > Thanks,
> > > > > > > > > >> > > > > > >
> > > > > > > > > >> > > > > > > Mayuresh
> > > > > > > > > >> > > > > > >
> > > > > > > > > >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
> > > > > > > > > >> becket.qin@gmail.com
> > > > > > > > > >> > >
> > > > > > > > > >> > > > > wrote:
> > > > > > > > > >> > > > > > >
> > > > > > > > > >> > > > > > > > Hi Jun,
> > > > > > > > > >> > > > > > > >
> > > > > > > > > >> > > > > > > > The usage of correlation ID might still be
> > > > useful
> > > > > to
> > > > > > > > > address
> > > > > > > > > >> > the
> > > > > > > > > >> > > > > cases
> > > > > > > > > >> > > > > > > > that the controller epoch and leader epoch
> > > check
> > > > > are
> > > > > > > not
> > > > > > > > > >> > > sufficient
> > > > > > > > > >> > > > > to
> > > > > > > > > >> > > > > > > > guarantee correct behavior. For example,
> if
> > > the
> > > > > > > > controller
> > > > > > > > > >> > sends
> > > > > > > > > >> > > a
> > > > > > > > > >> > > > > > > > LeaderAndIsrRequest followed by a
> > > > > > StopReplicaRequest,
> > > > > > > > and
> > > > > > > > > >> the
> > > > > > > > > >> > > > broker
> > > > > > > > > >> > > > > > > > processes it in the reverse order, the
> > replica
> > > > may
> > > > > > > still
> > > > > > > > > be
> > > > > > > > > >> > > wrongly
> > > > > > > > > >> > > > > > > > recreated, right?
> > > > > > > > > >> > > > > > > >
> > > > > > > > > >> > > > > > > > Thanks,
> > > > > > > > > >> > > > > > > >
> > > > > > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >> > > > > > > >
> > > > > > > > > >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <
> > > > > > > > jun@confluent.io
> > > > > > > > > >
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > > > > > Hmm, since we already use controller
> epoch
> > > and
> > > > > > > leader
> > > > > > > > > >> epoch
> > > > > > > > > >> > for
> > > > > > > > > >> > > > > > > properly
> > > > > > > > > >> > > > > > > > > caching the latest partition state, do
> we
> > > > really
> > > > > > > need
> > > > > > > > > >> > > correlation
> > > > > > > > > >> > > > > id
> > > > > > > > > >> > > > > > > for
> > > > > > > > > >> > > > > > > > > ordering the controller requests?
> > > > > > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > > > > > Thanks,
> > > > > > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > > > > > Jun
> > > > > > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket
> > Qin
> > > <
> > > > > > > > > >> > > > becket.qin@gmail.com>
> > > > > > > > > >> > > > > > > > wrote:
> > > > > > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > > > > >> Lucas and Mayuresh,
> > > > > > > > > >> > > > > > > > >>
> > > > > > > > > >> > > > > > > > >> Good idea. The correlation id should
> > work.
> > > > > > > > > >> > > > > > > > >>
> > > > > > > > > >> > > > > > > > >> In the ControllerChannelManager, a
> > request
> > > > will
> > > > > > be
> > > > > > > > > resent
> > > > > > > > > >> > > until
> > > > > > > > > >> > > > a
> > > > > > > > > >> > > > > > > > response
> > > > > > > > > >> > > > > > > > >> is received. So if the controller to
> > broker
> > > > > > > > connection
> > > > > > > > > >> > > > disconnects
> > > > > > > > > >> > > > > > > after
> > > > > > > > > >> > > > > > > > >> controller sends R1_a, but before the
> > > > response
> > > > > of
> > > > > > > > R1_a
> > > > > > > > > is
> > > > > > > > > >> > > > > received,
> > > > > > > > > >> > > > > > a
> > > > > > > > > >> > > > > > > > >> disconnection may cause the controller
> to
> > > > > resend
> > > > > > > > R1_b.
> > > > > > > > > >> i.e.
> > > > > > > > > >> > > > until
> > > > > > > > > >> > > > > R1
> > > > > > > > > >> > > > > > > is
> > > > > > > > > >> > > > > > > > >> acked, R2 won't be sent by the
> > controller.
> > > > > > > > > >> > > > > > > > >> This gives two guarantees:
> > > > > > > > > >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b <
> R2.
> > > > > > > > > >> > > > > > > > >> 2. On the broker side, when R2 is seen,
> > R1
> > > > must
> > > > > > > have
> > > > > > > > > been
> > > > > > > > > >> > > > > processed
> > > > > > > > > >> > > > > > at
> > > > > > > > > >> > > > > > > > >> least once.
> > > > > > > > > >> > > > > > > > >>
> > > > > > > > > >> > > > > > > > >> So on the broker side, with a single
> > thread
> > > > > > > > controller
> > > > > > > > > >> > request
> > > > > > > > > >> > > > > > > handler,
> > > > > > > > > >> > > > > > > > the
> > > > > > > > > >> > > > > > > > >> logic should be:
> > > > > > > > > >> > > > > > > > >> 1. Process what ever request seen in
> the
> > > > > > controller
> > > > > > > > > >> request
> > > > > > > > > >> > > > queue
> > > > > > > > > >> > > > > > > > >> 2. For the given epoch, drop request if
> > its
> > > > > > > > correlation
> > > > > > > > > >> id
> > > > > > > > > >> > is
> > > > > > > > > >> > > > > > smaller
> > > > > > > > > >> > > > > > > > than
> > > > > > > > > >> > > > > > > > >> that of the last processed request.
> > > > > > > > > >> > > > > > > > >>
> > > > > > > > > >> > > > > > > > >> Thanks,
> > > > > > > > > >> > > > > > > > >>
> > > > > > > > > >> > > > > > > > >> Jiangjie (Becket) Qin
> > > > > > > > > >> > > > > > > > >>
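The broker-side rule in steps 1 and 2 above can be sketched as follows. This is a hypothetical helper, not Kafka code; the epoch/correlation-id bookkeeping mirrors the drop rule Becket describes, and request names like R1_a and R2 follow the thread's example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for the broker-side drop rule; not Kafka code.
public class ControllerRequestDeduper {
    // controller epoch -> correlation id of the last processed request
    private final Map<Integer, Integer> lastProcessed = new HashMap<>();

    // Returns true if the request should be processed, false if it is a
    // stale resend (smaller correlation id within the same epoch).
    boolean shouldProcess(int controllerEpoch, int correlationId) {
        Integer last = lastProcessed.get(controllerEpoch);
        if (last != null && correlationId < last) {
            return false; // e.g. R1_b arriving after R2 has been processed
        }
        lastProcessed.put(controllerEpoch, correlationId);
        return true;
    }

    public static void main(String[] args) {
        ControllerRequestDeduper d = new ControllerRequestDeduper();
        System.out.println(d.shouldProcess(5, 100)); // true:  R1_a
        System.out.println(d.shouldProcess(5, 102)); // true:  R2
        System.out.println(d.shouldProcess(5, 101)); // false: late resend R1_b
    }
}
```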
> > > > > > > > > >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun
> Rao
> > <
> > > > > > > > > >> jun@confluent.io>
> > > > > > > > > >> > > > > wrote:
> > > > > > > > > >> > > > > > > > >>
> > I agree that there is no strong ordering when there is more than one
> > socket connection. Currently, we rely on controllerEpoch and leaderEpoch
> > to ensure that the receiving broker picks up the latest state for each
> > partition.
> >
> > One potential issue with the dequeue approach is that if the queue is
> > full, there is no guarantee that the controller requests will be
> > enqueued quickly.
> >
> > Thanks,
> >
> > Jun
> >
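Jun's concern can be shown with a small, self-contained demo (assumed request names, not Kafka code): on a bounded `LinkedBlockingDeque`, inserting at the head does not bypass the capacity bound, so a controller request still cannot get in while the deque is full of data requests.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Illustration of the concern above: a bounded deque that is already full
// of data requests rejects a head insertion just like a tail insertion.
public class FullDequeDemo {
    public static boolean tryEnqueueControllerRequest() {
        LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>(2);
        queue.offerLast("data-1");
        queue.offerLast("data-2"); // deque is now at capacity
        // offerFirst() does not evict anything; it returns false on a full
        // deque, so the controller request is not enqueued quickly.
        return queue.offerFirst("controller-1");
    }
}
```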
> > On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat
> > <gharatmayuresh15@gmail.com> wrote:
> >
> > > Yea, the correlationId is only set to 0 in the NetworkClient
> > > constructor. Since we reuse the same NetworkClient between the
> > > Controller and the broker, a disconnection should not cause it to
> > > reset to 0, in which case it can be used to reject obsolete requests.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatucla@gmail.com>
> > > wrote:
> > >
> > > > @Dong,
> > > > Great example and explanation, thanks!
> > > >
> > > > @All
> > > > Regarding the example given by Dong, it seems that even if we use a
> > > > queue and a dedicated controller request handling thread, the same
> > > > result can still happen, because R1_a will be sent on one connection,
> > > > and R1_b & R2 will be sent on a different connection, and there is no
> > > > ordering between different connections on the broker side.
> > > > I was discussing with Mayuresh offline, and it seems the correlation
> > > > id within the same NetworkClient object is monotonically increasing
> > > > and never reset, hence a broker can leverage that to properly reject
> > > > obsolete requests. Thoughts?
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
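The idea above can be sketched in a few lines. This is a toy model with hypothetical names, not the NetworkClient implementation: the client's correlation id counter survives reconnects, so the broker can reject any request whose id is not larger than the last one it has seen.

```java
// Toy sketch of the monotonic-correlation-id idea discussed above: the
// counter is never reset on reconnect, so ids keep increasing and the
// broker can reject obsolete in-flight requests from before a disconnect.
public class CorrelationIdDemo {
    private int nextCorrelationId = 0; // never reset on reconnect
    private int lastSeenByBroker = -1;

    /** Assigns the correlation id for the next outgoing request. */
    public int send() {
        return nextCorrelationId++;
    }

    /** Broker-side check: accept only strictly increasing ids. */
    public boolean brokerAccepts(int correlationId) {
        if (correlationId <= lastSeenByBroker) {
            return false; // obsolete request, rejected
        }
        lastSeenByBroker = correlationId;
        return true;
    }
}
```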
> > > > On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat
> > > > <gharatmayuresh15@gmail.com> wrote:
> > > >
> > > > > Actually nvm, correlationId is reset in case of connection loss,
> > > > > I think.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > > On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat
> > > > > <gharatmayuresh15@gmail.com> wrote:
> > > > >
> > > > > > I agree with Dong that out-of-order processing can happen with
> > > > > > having 2 separate queues as well, and it can even happen today.
> > > > > > Can we use the correlationId in the request from the controller
> > > > > > to the broker to handle ordering?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin
> > > > > > <becket.qin@gmail.com> wrote:
> > > > > >
> > > > > > > Good point, Joel. I agree that a dedicated controller request
> > > > > > > handling thread would provide better isolation. It also solves
> > > > > > > the reordering issue.
> > > > > > >
> > > > > > > On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy
> > > > > > > <jjkoshy.w@gmail.com> wrote:
> > > > > > >
> > > > > > > > Good example. I think this scenario can occur in the current
> > > > > > > > code as well, but with even lower probability given that
> > > > > > > > there are other non-controller requests interleaved. It is
> > > > > > > > still sketchy though, and I think a safer approach would be
> > > > > > > > separate queues and pinning controller request handling to
> > > > > > > > one handler thread.
> > > > > > > >
> > > > > > > > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin
> > > > > > > > <lindong28@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hey Becket,
> > > > > > > > >
> > > > > > > > > I think you are right that there may be out-of-order
> > > > > > > > > processing. However, it seems that out-of-order processing
> > > > > > > > > may also happen even if we use a separate queue.
> > > > > > > > >
> > > > > > > > > Here is the example:
> > > > > > > > >
> > > > > > > > > - Controller sends R1 and gets disconnected before
> > > > > > > > > receiving the response. Then it reconnects and sends R2.
> > > > > > > > > Both requests now stay in the controller request queue in
> > > > > > > > > the order they are sent.
> > > > > > > > > - thread1 takes R1_a from the request queue and then
> > > > > > > > > thread2 takes R2 from the request queue almost at the
> > > > > > > > > same time.
> > > > > > > > > - So R1_a and R2 are processed in parallel. There is a
> > > > > > > > > chance that R2's processing is completed before R1's.
> > > > > > > > >
> > > > > > > > > If out-of-order processing can happen for both approaches
> > > > > > > > > with very low probability, it may not be worthwhile to add
> > > > > > > > > the extra queue. What do you think?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Dong
> > > > > > > > >
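The race in the example above can be reproduced with a toy harness (illustrative names only): two handler threads drain one queue, so nothing ties completion order to enqueue order.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the race described above: R1_a is enqueued before R2, but two
// handler threads process them in parallel, so R2 may finish before R1_a.
public class ParallelHandlersDemo {
    public static Set<String> run() {
        ConcurrentLinkedQueue<String> requestQueue = new ConcurrentLinkedQueue<>();
        requestQueue.add("R1_a");
        requestQueue.add("R2");
        Set<String> completed = new CopyOnWriteArraySet<>();
        ExecutorService handlers = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 2; i++) {
            handlers.submit(() -> {
                String req = requestQueue.poll();
                if (req != null) {
                    completed.add(req); // completion order is nondeterministic
                }
            });
        }
        handlers.shutdown();
        try {
            handlers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }
}
```

Both requests always get processed, but which one completes first varies from run to run, which is exactly the out-of-order hazard being discussed.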
> > > > > > > > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin
> > > > > > > > > <becket.qin@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Mayuresh/Joel,
> > > > > > > > > >
> > > > > > > > > > Using the request channel as a dequeue was brought up
> > > > > > > > > > some time ago when we were initially thinking of
> > > > > > > > > > prioritizing the requests. The concern was that the
> > > > > > > > > > controller requests are supposed to be processed in
> > > > > > > > > > order. If we can ensure that there is at most one
> > > > > > > > > > controller request in the request channel, the order is
> > > > > > > > > > not a concern. But in cases where more than one
> > > > > > > > > > controller request is inserted into the queue, the
> > > > > > > > > > controller request order may change and cause problems.
> > > > > > > > > > For example, think about the following sequence:
> > > > > > > > > > 1. Controller successfully sent a request R1 to the
> > > > > > > > > > broker.
> > > > > > > > > > 2. Broker receives R1 and puts the request at the head
> > > > > > > > > > of the request queue.
> > > > > > > > > > 3. The controller-to-broker connection failed and the
> > > > > > > > > > controller reconnected to the broker.
> > > > > > > > > > 4. Controller sends a request R2 to the broker.
> > > > > > > > > > 5. Broker receives R2 and adds it to the head of the
> > > > > > > > > > request queue.
> > > > > > > > > > Now on the broker side, R2 will be processed before R1
> > > > > > > > > > is processed, which may cause problems.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
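The five-step sequence above can be replayed on a plain deque. This is an illustration only, not broker code: when every controller request is inserted at the head, the retried R1 followed by R2 comes out in the wrong order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Replays the sequence above on a plain deque: R1 is inserted at the head
// (step 2), the connection drops and the controller reconnects (steps 3-4),
// then R2 is also inserted at the head (step 5). The broker then processes
// R2 before R1.
public class HeadInsertionReorderDemo {
    public static String firstProcessed() {
        Deque<String> requestQueue = new ArrayDeque<>();
        requestQueue.addFirst("R1");      // step 2: R1 at the head
        requestQueue.addFirst("R2");      // step 5: R2 now ahead of R1
        return requestQueue.pollFirst();  // what the broker handles next
    }
}
```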
> > > > > > > > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy
> > > > > > > > > > <jjkoshy.w@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > @Mayuresh - I like your idea. It appears to be a
> > > > > > > > > > > simpler, less invasive alternative and it should work.
> > > > > > > > > > > Jun/Becket/others, do you see any pitfalls with this
> > > > > > > > > > > approach?
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang
> > > > > > > > > > > <lucasatucla@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > @Mayuresh,
> > > > > > > > > > > > That's a very interesting idea that I hadn't thought
> > > > > > > > > > > > of before. It seems to solve our problem at hand
> > > > > > > > > > > > pretty well, and also avoids the need to have a new
> > > > > > > > > > > > size metric and capacity config for the controller
> > > > > > > > > > > > request queue. In fact, if we were to adopt this
> > > > > > > > > > > > design, there is no public interface change, and we
> > > > > > > > > > > > probably don't need a KIP.
> > > > > > > > > > > > Also, implementation wise, it seems the java class
> > > > > > > > > > > > LinkedBlockingDeque can readily satisfy the
> > > > > > > > > > > > requirement by supporting a capacity, and also
> > > > > > > > > > > > allowing inserting at both ends.
> > > > > > > > > > > >
> > > > > > > > > > > > My only concern is that this design is tied to the
> > > > > > > > > > > > coincidence that we have two request priorities and
> > > > > > > > > > > > there are two ends to a deque. Hence by using the
> > > > > > > > > > > > proposed design, it seems the network layer is more
> > > > > > > > > > > > tightly coupled with the upper layer logic, e.g. if
> > > > > > > > > > > > we were to add an extra priority level in the future
> > > > > > > > > > > > for some reason, we would probably need to go back to
> > > > > > > > > > > > the design of separate queues, one for each priority
> > > > > > > > > > > > level.
> > > > > > > > > > > >
> > > > > > > > > > > > In summary, I'm ok with both designs and lean toward
> > > > > > > > > > > > your suggested approach. Let's hear what others
> > > > > > > > > > > > think.
> > > > > > > > > > > >
> > > > > > > > > > > > @Becket,
> > > > > > > > > > > > In light of Mayuresh's suggested new design, I'm
> > > > > > > > > > > > answering your question only in the context of the
> > > > > > > > > > > > current KIP design: I think your suggestion makes
> > > > > > > > > > > > sense, and I'm ok with removing the capacity config
> > > > > > > > > > > > and just relying on the default value of 20 being
> > > > > > > > > > > > sufficient.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Lucas
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat
> > > > > > > > > > > > <gharatmayuresh15@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Lucas,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Seems like the main intent here is to prioritize
> > > > > > > > > > > > > the controller request over any other requests.
> > > > > > > > > > > > > In that case, we can change the request queue to a
> > > > > > > > > > > > > deque, where you always insert the normal requests
> > > > > > > > > > > > > (produce, consume, etc.) at the end of the deque,
> > > > > > > > > > > > > but if it's a controller request, you insert it at
> > > > > > > > > > > > > the head of the queue. This ensures that the
> > > > > > > > > > > > > controller request will be given higher priority
> > > > > > > > > > > > > over other requests.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Also, since we only read one request from the
> > > > > > > > > > > > > socket and mute it, and only unmute it after
> > > > > > > > > > > > > handling the request, this would ensure that we
> > > > > > > > > > > > > don't handle controller requests out of order.
> > > > > > > > > > > > >
> > > > > > > > > > > > > With this approach we can avoid the second queue
> > > > > > > > > > > > > and the additional config for the size of the
> > > > > > > > > > > > > queue.
> > > > > > > > > > > > >
> > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Mayuresh
> > > > > > > > > > > > >
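Mayuresh's proposal maps directly onto a `LinkedBlockingDeque`. Below is a minimal sketch with assumed class and request names, not the actual broker implementation: data-plane requests go to the tail, controller requests go to the head, and the handler always takes from the head.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Minimal sketch of the proposal above: one deque serving as the request
// channel. Normal requests wait their turn at the tail; controller
// requests jump the line by being inserted at the head.
public class PrioritizedRequestQueue {
    private final LinkedBlockingDeque<String> deque = new LinkedBlockingDeque<>();

    public void enqueueDataRequest(String request) {
        deque.offerLast(request);  // produce/fetch requests go to the tail
    }

    public void enqueueControllerRequest(String request) {
        deque.offerFirst(request); // controller requests go to the head
    }

    /** Handler threads always take from the head; returns null if empty. */
    public String take() {
        return deque.pollFirst();
    }
}
```

Combined with the mute/unmute behavior described above (at most one in-flight request per connection), head insertion never reorders two controller requests from the same live connection.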
> > > > > > > > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin
> > > > > > > > > > > > > <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hey Joel,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the detailed explanation. I agree the
> > > > > > > > > > > > > > current design makes sense. My confusion is
> > > > > > > > > > > > > > about whether the new config for the controller
> > > > > > > > > > > > > > queue capacity is necessary. I cannot think of a
> > > > > > > > > > > > > > case in which users would change it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin
> > > > > > > > > > > > > > <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Lucas,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I guess my question can be rephrased to "do we
> > > > > > > > > > > > > > > expect users to ever change the controller
> > > > > > > > > > > > > > > request queue capacity"? If we agree that 20
> > > > > > > > > > > > > > > is already a very generous default number and
> > > > > > > > > > > > > > > we do not expect users to change it, is it
> > > > > > > > > > > > > > > still necessary to expose this as a config?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > >
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at
> > 2:29
> > > > AM,
> > > > > > > Lucas
> > > > > > > > > >> Wang <
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> lucasatucla@gmail.com
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the
> comment.
> > > You
> > > > > are
> > > > > > > > right
> > > > > > > > > >> that
> > > > > > > > > >> > > > > > > > >>>>> normally
> > > > > > > > > >> > > > > > > > >>>>>>>> there
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>> should
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>> be
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> just
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> one controller request
> > > because
> > > > > of
> > > > > > > > > muting,
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended
> to
> > > say
> > > > > > there
> > > > > > > > > would
> > > > > > > > > >> be
> > > > > > > > > >> > > > > > > > >> many
> > > > > > > > > >> > > > > > > > >>>>>>>> enqueued
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>> controller
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> requests.
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP
> > > again,
> > > > > and
> > > > > > > I'm
> > > > > > > > > not
> > > > > > > > > >> > sure
> > > > > > > > > >> > > > > > > > >>>> which
> > > > > > > > > >> > > > > > > > >>>>>> part
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> conveys
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>> that
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> info.
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise
> if
> > > you
> > > > > > point
> > > > > > > it
> > > > > > > > > out
> > > > > > > > > >> > the
> > > > > > > > > >> > > > > > > > >>>>> section.
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not
> > > happen
> > > > > in
> > > > > > > > normal
> > > > > > > > > >> > > > > > > > >>>> conditions,
> > > > > > > > > >> > > > > > > > >>>>>> the
> > > > > > > > > >> > > > > > > > >>>>>>>>>> current
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> design
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> does not preclude
> multiple
> > > > > > > controllers
> > > > > > > > > >> > running
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence
> if
> > > we
> > > > > > don't
> > > > > > > > have
> > > > > > > > > >> the
> > > > > > > > > >> > > > > > > > >>>>> controller
> > > > > > > > > >> > > > > > > > >>>>>>>>> queue
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>> capacity
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config and simply make
> its
> > > > > > capacity
> > > > > > > to
> > > > > > > > > be
> > > > > > > > > >> 1,
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> network threads handling
> > > > > requests
> > > > > > > from
> > > > > > > > > >> > > > > > > > >> different
> > > > > > > > > >> > > > > > > > >>>>>>>> controllers
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> will
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>> be
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> blocked during those
> > > > troublesome
> > > > > > > > times,
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> which is probably not
> what
> > > we
> > > > > > want.
> > > > > > > On
> > > > > > > > > the
> > > > > > > > > >> > > > > > > > >> other
> > > > > > > > > >> > > > > > > > >>>>> hand,
> > > > > > > > > >> > > > > > > > >>>>>>>>> adding
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> extra
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config with a default
> > value,
> > > > say
> > > > > > 20,
> > > > > > > > > >> guards
> > > > > > > > > >> > us
> > > > > > > > > >> > > > > > > > >>> from
> > > > > > > > > >> > > > > > > > >>>>>>>> issues
> > > > > > > > > >> > > > > > > > >>>>>>>>> in
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>> those
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> troublesome times, and
> IMO
> > > > there
> > > > > > > isn't
> > > > > > > > > >> much
> > > > > > > > > >> > > > > > > > >>>> downside
> > > > > > > > > >> > > > > > > > >>>>> of
> > > > > > > > > >> > > > > > > > >>>>>>>>> adding
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> extra
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config.
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Good catch, this
> sentence
> > is
> > > > an
> > > > > > > > obsolete
> > > > > > > > > >> > > > > > > > >>> statement
> > > > > > > > > >> > > > > > > > >>>>>> based
> > > > > > > > > >> > > > > > > > >>>>>>>> on
> > > > > > > > > >> > > > > > > > >>>>>>>>> a
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>> previous
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> design. I've revised the
> > > > wording
> > > > > > in
> > > > > > > > the
> > > > > > > > > >> KIP.
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at
> > > 10:33
> > > > > AM,
> > > > > > > > > Mayuresh
> > > > > > > > > >> > > > > > > > >>> Gharat <
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > gharatmayuresh15@gmail.com>
> > > > > > wrote:
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> I am trying to
> understand
> > > why
> > > > > you
> > > > > > > > think
> > > > > > > > > >> "The
> > > > > > > > > >> > > > > > > > >>>> memory
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> consumption
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>> can
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> rise
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> given the total number
> of
> > > > > queued
> > > > > > > > > requests
> > > > > > > > > >> > can
> > > > > > > > > >> > > > > > > > >>> go
> > > > > > > > > >> > > > > > > > >>>> up
> > > > > > > > > >> > > > > > > > >>>>>> to
> > > > > > > > > >> > > > > > > > >>>>>>>> 2x"
> > > > > > > > > >> > > > > > > > >>>>>>>>>> in
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>> the
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> impact
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> section. Normally the
> > > > requests
> > > > > > from
> > > > > > > > > >> > > > > > > > >> controller
> > > > > > > > > >> > > > > > > > >>>> to a
> > > > > > > > > >> > > > > > > > >>>>>>>> Broker
> > > > > > > > > >> > > > > > > > >>>>>>>>>> are
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>> not
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> high
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> volume, right ?
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at
> > > 5:06
> > > > AM
> > > > > > > > Becket
> > > > > > > > > >> Qin <
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP,
> > Lucas.
> > > > > > > > Separating
> > > > > > > > > >> the
> > > > > > > > > >> > > > > > > > >>>> control
> > > > > > > > > >> > > > > > > > >>>>>>>> plane
> > > > > > > > > >> > > > > > > > >>>>>>>>>> from
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>> the
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> data
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> plane
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you
> mentioned
> > > > that
> > > > > > the
> > > > > > > > > >> > > > > > > > >> controller
> > > > > > > > > >> > > > > > > > >>>>>> request
> > > > > > > > > >> > > > > > > > >>>>>>>>> queue
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> may
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> have
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> many
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will
> > this
> > > > be a
> > > > > > > > common
> > > > > > > > > >> case?
> > > > > > > > > >> > > > > > > > >>> The
> > > > > > > > > >> > > > > > > > >>>>>>>>> controller
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> requests
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> still
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> goes through the
> > > > SocketServer.
> > > > > > The
> > > > > > > > > >> > > > > > > > >>> SocketServer
> > > > > > > > > >> > > > > > > > >>>>>> will
> > > > > > > > > >> > > > > > > > >>>>>>>>> mute
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> channel
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> once
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> a request is read and
> > put
> > > > into
> > > > > > the
> > > > > > > > > >> request
> > > > > > > > > >> > > > > > > > >>>>> channel.
> > > > > > > > > >> > > > > > > > >>>>>>>> So
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>> assuming
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> there
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> is
> > > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> only one connection
> > > between
> > > > > > > > controller
> > > > > > > > > >> and
> > > > > > > > > >> > > > > > > > >>> each
> > > > > > > > > >> > > > > > > > >>>>>>>> broker,
> > > > > > > > > >> > > > > > > > >>>>>>>>> on
> > > > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > > > >> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
@Becket

Makes sense. I've updated the KIP by adding the following paragraph to the
motivation section

> Today there is no separation between controller requests and regular data
> plane requests. Specifically (1) a controller in a cluster uses the same
> advertised endpoints to connect to brokers as what clients and regular
> brokers use for exchanging data (2) on the broker side, the same network
> (processor) thread could be multiplexed by handling a controller connection
> and many other data plane connections (3) after a controller request is
> read from the socket, it is enqueued into the single FIFO requestQueue,
> which is used for all types of requests (4) request handler threads poll
> requests from the requestQueue and handle the controller requests with the
> same priority as regular data requests.
>
> Because of the multiplexing at every stage of request handling, controller
> requests could be significantly delayed under the following scenarios:
>
>    1. The requestQueue is full, and therefore blocks a network
>    (processor) thread that has read a controller request from the socket.
>    2. A controller request is enqueued into the requestQueue after a
>    backlog of data requests, and experiences a long queuing time in the
>    requestQueue.
>
>
Please let me know if that looks ok or any other change you'd like to make.
Thanks!

Lucas
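To make the motivation above concrete, here is a minimal sketch of the separation the KIP proposes (in Python for brevity; Kafka itself is written in Java/Scala). The class and method names and the default capacities are invented for this illustration and are not Kafka's actual code:

```python
import queue

class TwoPlaneRequestChannel:
    """Illustrative sketch of the KIP's core idea: controller requests
    bypass the shared FIFO requestQueue and go to a small dedicated queue
    that handler threads drain first. Names/capacities are assumptions."""

    def __init__(self, data_capacity=500, controller_capacity=20):
        self.data_queue = queue.Queue(maxsize=data_capacity)
        self.controller_queue = queue.Queue(maxsize=controller_capacity)

    def send_data_request(self, request):
        # A full data queue blocks only data-plane senders.
        self.data_queue.put(request)

    def send_controller_request(self, request):
        # Unaffected by a backlog of data requests.
        self.controller_queue.put(request)

    def receive_request(self):
        # Controller requests are always handled before data requests.
        try:
            return self.controller_queue.get_nowait()
        except queue.Empty:
            return self.data_queue.get()


channel = TwoPlaneRequestChannel()
channel.send_data_request("ProduceRequest")
channel.send_controller_request("LeaderAndIsrRequest")
print(channel.receive_request())  # LeaderAndIsrRequest
print(channel.receive_request())  # ProduceRequest
```

Note that a real implementation would block on both queues at once (e.g. via a shared condition variable) rather than falling back to a blocking `get()` on the data queue; the sketch ignores that detail for clarity.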

On Mon, Aug 13, 2018 at 6:33 AM, Becket Qin <be...@gmail.com> wrote:

> Hi Lucas,
>
> Thanks for the explanation. It might be a nitpick, but it seems better to
> mention in the motivation part that today the client requests and
> controller requests are not only sharing the same queue, but also a bunch
> of other things, so that we can avoid asking people to read the rejected
> alternatives.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
>
>
>
> On Fri, Aug 10, 2018 at 6:23 AM, Lucas Wang <lu...@gmail.com> wrote:
>
> > @Becket,
> >
> > I've asked for review by Jun and Joel in the vote thread.
> > Regarding the separate thread and port, I did talk about it in the
> rejected
> > alternative design 1.
> > Please let me know if you'd like more elaboration or moving it to the
> > motivation, etc.
> >
> > Thanks,
> > Lucas
> >
> > On Wed, Aug 8, 2018 at 3:59 PM, Becket Qin <be...@gmail.com> wrote:
> >
> > > Hi Lucas,
> > >
> > > Yes, a separate Jira is OK.
> > >
> > > Since the proposal has significantly changed since the initial vote
> > > started, we probably should let the others who have already voted
> > > know and ensure they are happy with the updated proposal.
> > > Also, it seems the motivation part of the KIP wiki is still just
> > > talking about the separate queue and does not fully cover the changes
> > > we make now, e.g. separate thread, port, etc. We might want to
> > > explain a bit more so that people who did not follow the discussion
> > > mail thread can also understand the whole proposal.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Wed, Aug 8, 2018 at 12:44 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Hi Becket,
> > > >
> > > > Thanks for the review. The current write up in the KIP won’t change
> the
> > > > ordering behavior. Are you ok with addressing that as a separate
> > > > independent issue (I’ll create a separate ticket for it)?
> > > > If so, can you please give me a +1 on the vote thread?
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Tue, Aug 7, 2018 at 7:34 PM Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > > > Thanks for the updated KIP wiki, Lucas. Looks good to me overall.
> > > > >
> > > > > It might be an implementation detail, but do we still plan to use
> the
> > > > > correlation id to ensure the request processing order?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Jul 31, 2018 at 3:39 AM, Lucas Wang <lucasatucla@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Thanks for your review, Dong.
> > > > > > Ack that these configs will have a bigger impact for users.
> > > > > >
> > > > > > On the other hand, I would argue that the request queue becoming
> > > > > > full may or may not be a rare scenario.
> > > > > > How often the request queue gets full depends on the request
> > > > > > incoming rate, the request processing rate, and the size of the
> > > > > > request queue.
> > > > > > When that happens, the dedicated endpoints design can handle it
> > > > > > better than any of the previously discussed options.
> > > > > >
> > > > > > Another reason I made the change was that, like Becket, I see it
> > > > > > as a better separation of the control plane from the data plane.
> > > > > >
> > > > > > Finally, I want to clarify that this change is NOT motivated by
> > > > > > the out-of-order processing discussion. The latter problem is
> > > > > > orthogonal to this KIP, and it can happen in any of the design
> > > > > > options we discussed for this KIP so far. So I'd like to address
> > > > > > out-of-order processing separately in another thread, and avoid
> > > > > > mentioning it in this KIP.
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > > On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin <li...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hey Lucas,
> > > > > > >
> > > > > > > Thanks for the update.
> > > > > > >
> > > > > > > The current KIP proposes new broker configs
> > > > > > > "listeners.for.controller" and
> > > > > > > "advertised.listeners.for.controller". This is going to be a
> > > > > > > big change since listeners are among the most important
> > > > > > > configs that every user needs to change. According to the
> > > > > > > rejected alternatives section, it seems that the reason to add
> > > > > > > these two configs is to improve performance when the data
> > > > > > > request queue is full rather than for correctness. It should
> > > > > > > be a very rare scenario, and I am not sure we should add
> > > > > > > configs for all users just to improve the performance in such
> > > > > > > a rare scenario.
> > > > > > >
> > > > > > > Also, if the new design is based on the issues which were
> > > > > > > discovered in the recent discussion, e.g. out-of-order
> > > > > > > processing if we don't use a dedicated thread for controller
> > > > > > > requests, it may be useful to explain the problem in the
> > > > > > > motivation section.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dong
> > > > > > >
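For concreteness, a hypothetical server.properties fragment showing how the two proposed configs might sit alongside the existing listener settings. Only the two `*.for.controller` keys come from the KIP draft under discussion; the listener name, hostnames, and ports are invented for illustration:

```properties
# Existing data-plane listeners
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://broker1.example.com:9092

# Proposed dedicated control-plane endpoint (KIP-291 draft)
listeners.for.controller=CONTROLLER://:9091
advertised.listeners.for.controller=CONTROLLER://broker1.example.com:9091
```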
> > > > > > > On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <
> > lucasatucla@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > A kind reminder for review of this KIP.
> > > > > > > >
> > > > > > > > Thank you very much!
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <
> > > > lucasatucla@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > I've updated the KIP by adding the dedicated endpoints for
> > > > > controller
> > > > > > > > > connections,
> > > > > > > > > and pinning threads for controller requests.
> > > > > > > > > Also I've updated the title of this KIP. Please take a look
> > and
> > > > let
> > > > > > me
> > > > > > > > > know your feedback.
> > > > > > > > >
> > > > > > > > > Thanks a lot for your time!
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > > > > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > >> Hi Lucas,
> > > > > > > > >> I agree, if we want to go forward with a separate
> controller
> > > > plane
> > > > > > and
> > > > > > > > >> data
> > > > > > > > >> plane and completely isolate them, having a separate port
> > for
> > > > > > > controller
> > > > > > > > >> with a separate Acceptor and a Processor sounds ideal to
> me.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Mayuresh
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <
> > > > becket.qin@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Lucas,
> > > > > > > > >> >
> > > > > > > > >> > Yes, I agree that a dedicated end to end control flow
> > would
> > > be
> > > > > > > ideal.
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> >
> > > > > > > > >> > Jiangjie (Becket) Qin
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Thanks for the comment, Becket.
> > > > > > > > >> > > So far, we've been trying to avoid making any request
> > > > > > > > >> > > handler thread special. But if we were to follow that
> > > > > > > > >> > > path in order to make the two planes more isolated,
> > > > > > > > >> > > what do you think about also having a dedicated
> > > > > > > > >> > > processor thread and a dedicated port for the
> > > > > > > > >> > > controller?
> > > > > > > > >> > >
> > > > > > > > >> > > Today one processor thread can handle multiple
> > > > > > > > >> > > connections, let's say 100 connections represented by
> > > > > > > > >> > > connection0, ... connection99, among which
> > > > > > > > >> > > connection0-98 are from clients, while connection99
> > > > > > > > >> > > is from the controller. Further let's say after one
> > > > > > > > >> > > selector polling, there are incoming requests on all
> > > > > > > > >> > > connections.
> > > > > > > > >> > >
> > > > > > > > >> > > When the request queue is full (either the data
> > > > > > > > >> > > request queue being full in the two-queue design, or
> > > > > > > > >> > > the one single queue being full in the deque design),
> > > > > > > > >> > > the processor thread will be blocked first when
> > > > > > > > >> > > trying to enqueue the data request from connection0,
> > > > > > > > >> > > then possibly blocked for the data request from
> > > > > > > > >> > > connection1, ... etc., even though the controller
> > > > > > > > >> > > request is ready to be enqueued.
> > > > > > > > >> > >
> > > > > > > > >> > > To solve this problem, it seems we would need to have
> > > > > > > > >> > > a separate port dedicated to the controller, a
> > > > > > > > >> > > dedicated processor thread, a dedicated controller
> > > > > > > > >> > > request queue, and pinning of one request handler
> > > > > > > > >> > > thread for controller requests.
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks,
> > > > > > > > >> > > Lucas
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <
> > > > > > becket.qin@gmail.com
> > > > > > > >
> > > > > > > > >> > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Personally I am not fond of the dequeue approach
> > > > > > > > >> > > > simply because it is against the basic idea of
> > > > > > > > >> > > > isolating the controller plane and data plane. With
> > > > > > > > >> > > > a single dequeue, theoretically speaking the
> > > > > > > > >> > > > controller requests can starve the client requests.
> > > > > > > > >> > > > I would prefer the approach with a separate
> > > > > > > > >> > > > controller request queue and a dedicated controller
> > > > > > > > >> > > > request handler thread.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thanks,
> > > > > > > > >> > > >
> > > > > > > > >> > > > Jiangjie (Becket) Qin
> > > > > > > > >> > > >
> > > > > > > > >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <
> > > > > > > > lucasatucla@gmail.com>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > > Sure, I can summarize the usage of correlation
> > > > > > > > >> > > > > id. But before I do that, it seems the same
> > > > > > > > >> > > > > out-of-order processing can also happen to
> > > > > > > > >> > > > > Produce requests sent by producers, following the
> > > > > > > > >> > > > > same example you described earlier.
> > > > > > > > >> > > > > If that's the case, I think this probably
> > > > > > > > >> > > > > deserves a separate doc and design independent of
> > > > > > > > >> > > > > this KIP.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Lucas
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <
> > > > > > > lindong28@gmail.com
> > > > > > > > >
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > > Hey Lucas,
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Could you update the KIP if you are confident
> > > > > > > > >> > > > > > with the approach which uses correlation id?
> > > > > > > > >> > > > > > The idea around correlation id is kind of
> > > > > > > > >> > > > > > scattered across multiple emails. It will be
> > > > > > > > >> > > > > > useful if other reviewers can read the KIP to
> > > > > > > > >> > > > > > understand the latest proposal.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Thanks,
> > > > > > > > >> > > > > > Dong
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh
> Gharat
> > <
> > > > > > > > >> > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > > I like the idea of the dequeue implementation
> > > > > > > > >> > > > > > > by Lucas. This will help us avoid an
> > > > > > > > >> > > > > > > additional queue for the controller and
> > > > > > > > >> > > > > > > additional configs in Kafka.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Thanks,
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Mayuresh
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
> > > > > > > > >> becket.qin@gmail.com
> > > > > > > > >> > >
> > > > > > > > >> > > > > wrote:
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > > Hi Jun,
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > The usage of correlation ID might still be
> > > > > > > > >> > > > > > > > useful to address the cases that the
> > > > > > > > >> > > > > > > > controller epoch and leader epoch check are
> > > > > > > > >> > > > > > > > not sufficient to guarantee correct
> > > > > > > > >> > > > > > > > behavior. For example, if the controller
> > > > > > > > >> > > > > > > > sends a LeaderAndIsrRequest followed by a
> > > > > > > > >> > > > > > > > StopReplicaRequest, and the broker
> > > > > > > > >> > > > > > > > processes them in the reverse order, the
> > > > > > > > >> > > > > > > > replica may still be wrongly recreated,
> > > > > > > > >> > > > > > > > right?
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Thanks,
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <jun@confluent.io> wrote:
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Hmm, since we already use controller
> > > > > > > > >> > > > > > > > > epoch and leader epoch for properly
> > > > > > > > >> > > > > > > > > caching the latest partition state, do we
> > > > > > > > >> > > > > > > > > really need correlation id for ordering
> > > > > > > > >> > > > > > > > > the controller requests?
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Thanks,
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Jun
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >> Lucas and Mayuresh,
> > > > > > > > >> > > > > > > > >>
> > > > > > > > >> > > > > > > > >> Good idea. The correlation id should
> > > > > > > > >> > > > > > > > >> work.
> > > > > > > > >> > > > > > > > >>
> > > > > > > > >> > > > > > > > >> In the ControllerChannelManager, a
> > > > > > > > >> > > > > > > > >> request will be resent until a response
> > > > > > > > >> > > > > > > > >> is received. So if the
> > > > > > > > >> > > > > > > > >> controller-to-broker connection
> > > > > > > > >> > > > > > > > >> disconnects after the controller sends
> > > > > > > > >> > > > > > > > >> R1_a, but before the response of R1_a is
> > > > > > > > >> > > > > > > > >> received, the disconnection may cause
> > > > > > > > >> > > > > > > > >> the controller to resend R1_b, i.e.
> > > > > > > > >> > > > > > > > >> until R1 is acked, R2 won't be sent by
> > > > > > > > >> > > > > > > > >> the controller.
> > > > > > > > >> > > > > > > > >> This gives two guarantees:
> > > > > > > > >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > > > > >> > > > > > > > >> 2. On the broker side, when R2 is seen,
> > > > > > > > >> > > > > > > > >> R1 must have been processed at least
> > > > > > > > >> > > > > > > > >> once.
> > > > > > > > >> > > > > > > > >>
> > > > > > > > >> > > > > > > > >> So on the broker side, with a
> > > > > > > > >> > > > > > > > >> single-threaded controller request
> > > > > > > > >> > > > > > > > >> handler, the logic should be:
> > > > > > > > >> > > > > > > > >> 1. Process whatever request is seen in
> > > > > > > > >> > > > > > > > >> the controller request queue.
> > > > > > > > >> > > > > > > > >> 2. For the given epoch, drop a request
> > > > > > > > >> > > > > > > > >> if its correlation id is smaller than
> > > > > > > > >> > > > > > > > >> that of the last processed request.
> > > > > > > > >> > > > > > > > >>
> > > > > > > > >> > > > > > > > >> Thanks,
> > > > > > > > >> > > > > > > > >>
> > > > > > > > >> > > > > > > > >> Jiangjie (Becket) Qin
> > > > > > > > >> > > > > > > > >>
> > > > > > > > >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao
> <
> > > > > > > > >> jun@confluent.io>
> > > > > > > > >> > > > > wrote:
> > > > > > > > >> > > > > > > > >>
> > > > > > > > >> > > > > > > > >>> I agree that there is no strong ordering when there is more than one
> > > > > > > > >> > > > > > > > >>> socket connection. Currently, we rely on controllerEpoch and
> > > > > > > > >> > > > > > > > >>> leaderEpoch to ensure that the receiving broker picks up the latest
> > > > > > > > >> > > > > > > > >>> state for each partition.
> > > > > > > > >> > > > > > > > >>>
> > > > > > > > >> > > > > > > > >>> One potential issue with the dequeue approach is that if the queue is
> > > > > > > > >> > > > > > > > >>> full, there is no guarantee that the controller requests will be
> > > > > > > > >> > > > > > > > >>> enqueued quickly.
> > > > > > > > >> > > > > > > > >>>
> > > > > > > > >> > > > > > > > >>> Thanks,
> > > > > > > > >> > > > > > > > >>>
> > > > > > > > >> > > > > > > > >>> Jun
> > > > > > > > >> > > > > > > > >>>
> > > > > > > > >> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat
> > > > > > > > >> > > > > > > > >>> <gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >> > > > > > > > >>>> Yea, the correlationId is only set to 0 in the NetworkClient
> > > > > > > > >> > > > > > > > >>>> constructor. Since we reuse the same NetworkClient between Controller
> > > > > > > > >> > > > > > > > >>>> and the broker, a disconnection should not cause it to reset to 0, in
> > > > > > > > >> > > > > > > > >>>> which case it can be used to reject obsolete requests.
> > > > > > > > >> > > > > > > > >>>>
> > > > > > > > >> > > > > > > > >>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>
> > > > > > > > >> > > > > > > > >>>> Mayuresh
> > > > > > > > >> > > > > > > > >>>>
> > > > > > > > >> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatucla@gmail.com>
> > > > > > > > >> > > > > > > > >>>> wrote:
> > > > > > > > >> > > > > > > > >>>>
> > > > > > > > >> > > > > > > > >>>>> @Dong,
> > > > > > > > >> > > > > > > > >>>>> Great example and explanation, thanks!
> > > > > > > > >> > > > > > > > >>>>>
> > > > > > > > >> > > > > > > > >>>>> @All
> > > > > > > > >> > > > > > > > >>>>> Regarding the example given by Dong, it seems that even if we use a
> > > > > > > > >> > > > > > > > >>>>> queue and a dedicated controller request handling thread, the same
> > > > > > > > >> > > > > > > > >>>>> result can still happen, because R1_a will be sent on one connection,
> > > > > > > > >> > > > > > > > >>>>> R1_b & R2 will be sent on a different connection, and there is no
> > > > > > > > >> > > > > > > > >>>>> ordering between different connections on the broker side.
> > > > > > > > >> > > > > > > > >>>>> I was discussing with Mayuresh offline, and it seems the correlation
> > > > > > > > >> > > > > > > > >>>>> id within the same NetworkClient object is monotonically increasing
> > > > > > > > >> > > > > > > > >>>>> and never reset, hence a broker can leverage that to properly reject
> > > > > > > > >> > > > > > > > >>>>> obsolete requests. Thoughts?
> > > > > > > > >> > > > > > > > >>>>>
> > > > > > > > >> > > > > > > > >>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>> Lucas
> > > > > > > > >> > > > > > > > >>>>>
> > > > > > > > >> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat
> > > > > > > > >> > > > > > > > >>>>> <gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >> > > > > > > > >>>>>
> > > > > > > > >> > > > > > > > >>>>>> Actually nvm, the correlationId is reset in case of connection loss,
> > > > > > > > >> > > > > > > > >>>>>> I think.
> > > > > > > > >> > > > > > > > >>>>>>
> > > > > > > > >> > > > > > > > >>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>
> > > > > > > > >> > > > > > > > >>>>>> Mayuresh
> > > > > > > > >> > > > > > > > >>>>>>
> > > > > > > > >> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat
> > > > > > > > >> > > > > > > > >>>>>> <gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >> > > > > > > > >>>>>>
> > > > > > > > >> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing can happen even with
> > > > > > > > >> > > > > > > > >>>>>>> 2 separate queues, and it can even happen today.
> > > > > > > > >> > > > > > > > >>>>>>> Can we use the correlationId in the request from the controller to
> > > > > > > > >> > > > > > > > >>>>>>> the broker to handle ordering?
> > > > > > > > >> > > > > > > > >>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>> Mayuresh
> > > > > > > > >> > > > > > > > >>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated controller request
> > > > > > > > >> > > > > > > > >>>>>>>> handling thread would provide better isolation. It also solves the
> > > > > > > > >> > > > > > > > >>>>>>>> reordering issue.
> > > > > > > > >> > > > > > > > >>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jjkoshy.w@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>> Good example. I think this scenario can occur in the current code as
> > > > > > > > >> > > > > > > > >>>>>>>>> well, but with even lower probability given that there are other
> > > > > > > > >> > > > > > > > >>>>>>>>> non-controller requests interleaved. It is still sketchy though, and
> > > > > > > > >> > > > > > > > >>>>>>>>> I think a safer approach would be separate queues and pinning
> > > > > > > > >> > > > > > > > >>>>>>>>> controller request handling to one handler thread.
> > > > > > > > >> > > > > > > > >>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindong28@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>> Hey Becket,
> > > > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>> I think you are right that there may be out-of-order processing.
> > > > > > > > >> > > > > > > > >>>>>>>>>> However, it seems that out-of-order processing may also happen even
> > > > > > > > >> > > > > > > > >>>>>>>>>> if we use a separate queue.
> > > > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>> Here is the example:
> > > > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected before receiving the
> > > > > > > > >> > > > > > > > >>>>>>>>>> response. Then it reconnects and sends R2. Both requests now stay in
> > > > > > > > >> > > > > > > > >>>>>>>>>> the controller request queue in the order they are sent.
> > > > > > > > >> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then thread2 takes
> > > > > > > > >> > > > > > > > >>>>>>>>>> R2 from the request queue almost at the same time.
> > > > > > > > >> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is a chance that
> > > > > > > > >> > > > > > > > >>>>>>>>>> R2's processing is completed before R1's.
> > > > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>> If out-of-order processing can happen for both approaches with very
> > > > > > > > >> > > > > > > > >>>>>>>>>> low probability, it may not be worthwhile to add the extra queue.
> > > > > > > > >> > > > > > > > >>>>>>>>>> What do you think?
> > > > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>>>> Dong
> > > > > > > > >> > > > > > > > >>>>>>>>>>
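[Editorial aside: the race described above can be reproduced in a few lines. The sketch below is purely illustrative and all names are invented here; two handler threads each take one request from the same queue, and R1 is made artificially slow so that R2 completes first.]

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class OutOfOrderDemo {
    // Enqueue R1 then R2, hand them to two handler threads, and record the
    // order in which their processing completes.
    public static List<String> completionOrder() {
        BlockingQueue<Runnable> requestQueue = new LinkedBlockingQueue<>();
        List<String> completed = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(2);
        try {
            // R1 is first in the queue but slow to process.
            requestQueue.put(() -> { sleepMs(200); completed.add("R1"); done.countDown(); });
            requestQueue.put(() -> { completed.add("R2"); done.countDown(); });
            // Two handler threads each take one request, as in the example.
            for (int i = 0; i < 2; i++) {
                new Thread(requestQueue.take()).start();
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed; // R2 finishes before R1 despite being enqueued later
    }

    private static void sleepMs(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        System.out.println(completionOrder());
    }
}
```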
> > > > > > > > >> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket.qin@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>> Using the request channel as a dequeue was brought up some time ago
> > > > > > > > >> > > > > > > > >>>>>>>>>>> when we were initially thinking of prioritizing the requests. The
> > > > > > > > >> > > > > > > > >>>>>>>>>>> concern was that the controller requests are supposed to be
> > > > > > > > >> > > > > > > > >>>>>>>>>>> processed in order. If we can ensure that there is only one
> > > > > > > > >> > > > > > > > >>>>>>>>>>> controller request in the request channel, the order is not a
> > > > > > > > >> > > > > > > > >>>>>>>>>>> concern. But in cases where more than one controller request is
> > > > > > > > >> > > > > > > > >>>>>>>>>>> inserted into the queue, the controller request order may change and
> > > > > > > > >> > > > > > > > >>>>>>>>>>> cause problems. For example, think about the following sequence:
> > > > > > > > >> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to the broker.
> > > > > > > > >> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and puts the request at the head of the
> > > > > > > > >> > > > > > > > >>>>>>>>>>> request queue.
> > > > > > > > >> > > > > > > > >>>>>>>>>>> 3. The controller to broker connection failed and the controller
> > > > > > > > >> > > > > > > > >>>>>>>>>>> reconnected to the broker.
> > > > > > > > >> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker.
> > > > > > > > >> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and adds it to the head of the request queue.
> > > > > > > > >> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is processed,
> > > > > > > > >> > > > > > > > >>>>>>>>>>> which may cause problems.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkoshy.w@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler, less
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> invasive alternative and it should work. Jun/Becket/others, do you
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> see any pitfalls with this approach?
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I haven't thought of before.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and also avoids
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> the need to have a new size metric and capacity config for the
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> controller request queue. In fact, if we were to adopt this design,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> there is no public interface change, and we probably don't need a
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> KIP. Also implementation wise, it seems the java class
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> LinkedBlockingDeque can readily satisfy the requirement by
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> supporting a capacity, and also allowing inserting at both ends.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> My only concern is that this design is tied to the coincidence that
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> we have two request priorities and there are two ends to a deque.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> Hence by using the proposed design, it seems the network layer is
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> more tightly coupled with upper layer logic, e.g. if we were to add
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> an extra priority level in the future for some reason, we would
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> probably need to go back to the design of separate queues, one for
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> each priority level.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward your suggested
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> approach. Let's hear what others think.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> @Becket,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm answering your
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> question only in the context of the current KIP design: I think
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> your suggestion makes sense, and I'm ok with removing the capacity
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> config and just relying on the default value of 20 being
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> sufficient.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> Lucas
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> <gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize the controller
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> request over any other requests.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> In that case, we can change the request queue to a dequeue, where
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> you always insert the normal requests (produce, consume, ..etc) at
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> the end of the dequeue, but if it's a controller request, you
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> insert it at the head of the queue. This ensures that the
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> controller request will be given higher priority over other
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> requests.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> Also since we only read one request from the socket, mute it, and
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> only unmute it after handling the request, this would ensure that
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> we don't handle controller requests out of order.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue and the additional
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> config for the size of the queue.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> What do you think?
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> Mayuresh
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>
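[Editorial aside: in the JDK, the two-ended, capacity-bounded behavior this proposal describes maps to java.util.concurrent.LinkedBlockingDeque, with data requests appended at the tail and controller requests pushed at the head. The sketch below is for illustration only, not the KIP's implementation; all class and method names are invented here.]

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

public class PrioritizedRequestChannel {
    private final BlockingDeque<String> deque;

    public PrioritizedRequestChannel(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    /** Normal produce/fetch requests go to the tail. Returns false if full. */
    public boolean sendDataRequest(String request) {
        return deque.offerLast(request);
    }

    /** Controller requests jump ahead of the backlog at the head. Note that
     *  a full deque still rejects them, which is the enqueue-when-full
     *  limitation raised elsewhere in this thread. */
    public boolean sendControllerRequest(String request) {
        return deque.offerFirst(request);
    }

    /** Handler threads always take from the head; returns null if empty. */
    public String receive() {
        return deque.pollFirst();
    }

    public static void main(String[] args) {
        PrioritizedRequestChannel channel = new PrioritizedRequestChannel(10);
        channel.sendDataRequest("produce-1");
        channel.sendDataRequest("fetch-1");
        channel.sendControllerRequest("leaderAndIsr");
        System.out.println(channel.receive()); // leaderAndIsr
        System.out.println(channel.receive()); // produce-1
    }
}
```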
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the current design
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> makes sense. My confusion is about whether the new config for the
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> controller queue capacity is necessary. I cannot think of a case in
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> which users would change it.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we expect users to ever
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> change the controller request queue capacity"? If we agree that 20
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> is already a very generous default number and we do not expect
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> users to change it, is it still necessary to expose this as a
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> config?
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment.
> > You
> > > > are
> > > > > > > right
> > > > > > > > >> that
> > > > > > > > >> > > > > > > > >>>>> normally
> > > > > > > > >> > > > > > > > >>>>>>>> there
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> should
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> be
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> just
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> one controller request
> > because
> > > > of
> > > > > > > > muting,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to
> > say
> > > > > there
> > > > > > > > would
> > > > > > > > >> be
> > > > > > > > >> > > > > > > > >> many
> > > > > > > > >> > > > > > > > >>>>>>>> enqueued
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> controller
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> requests.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP
> > again,
> > > > and
> > > > > > I'm
> > > > > > > > not
> > > > > > > > >> > sure
> > > > > > > > >> > > > > > > > >>>> which
> > > > > > > > >> > > > > > > > >>>>>> part
> > > > > > > > >> > > > > > > > >>>>>>>>>>> conveys
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> that
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> info.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if
> > you
> > > > > point
> > > > > > it
> > > > > > > > out
> > > > > > > > >> > the
> > > > > > > > >> > > > > > > > >>>>> section.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not
> > happen
> > > > in
> > > > > > > normal
> > > > > > > > >> > > > > > > > >>>> conditions,
> > > > > > > > >> > > > > > > > >>>>>> the
> > > > > > > > >> > > > > > > > >>>>>>>>>> current
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> design
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> does not preclude multiple
> > > > > > controllers
> > > > > > > > >> > running
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if
> > we
> > > > > don't
> > > > > > > have
> > > > > > > > >> the
> > > > > > > > >> > > > > > > > >>>>> controller
> > > > > > > > >> > > > > > > > >>>>>>>>> queue
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> capacity
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config and simply make its
> > > > > capacity
> > > > > > to
> > > > > > > > be
> > > > > > > > >> 1,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> network threads handling
> > > > requests
> > > > > > from
> > > > > > > > >> > > > > > > > >> different
> > > > > > > > >> > > > > > > > >>>>>>>> controllers
> > > > > > > > >> > > > > > > > >>>>>>>>>>> will
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> be
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> blocked during those
> > > troublesome
> > > > > > > times,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> which is probably not what
> > we
> > > > > want.
> > > > > > On
> > > > > > > > the
> > > > > > > > >> > > > > > > > >> other
> > > > > > > > >> > > > > > > > >>>>> hand,
> > > > > > > > >> > > > > > > > >>>>>>>>> adding
> > > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> extra
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config with a default
> value,
> > > say
> > > > > 20,
> > > > > > > > >> guards
> > > > > > > > >> > us
> > > > > > > > >> > > > > > > > >>> from
> > > > > > > > >> > > > > > > > >>>>>>>> issues
> > > > > > > > >> > > > > > > > >>>>>>>>> in
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> those
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO
> > > there
> > > > > > isn't
> > > > > > > > >> much
> > > > > > > > >> > > > > > > > >>>> downside
> > > > > > > > >> > > > > > > > >>>>> of
> > > > > > > > >> > > > > > > > >>>>>>>>> adding
> > > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> extra
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence
> is
> > > an
> > > > > > > obsolete
> > > > > > > > >> > > > > > > > >>> statement
> > > > > > > > >> > > > > > > > >>>>>> based
> > > > > > > > >> > > > > > > > >>>>>>>> on
> > > > > > > > >> > > > > > > > >>>>>>>>> a
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> previous
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> design. I've revised the
> > > wording
> > > > > in
> > > > > > > the
> > > > > > > > >> KIP.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at
> > 10:33
> > > > AM,
> > > > > > > > Mayuresh
> > > > > > > > >> > > > > > > > >>> Gharat <
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> gharatmayuresh15@gmail.com>
> > > > > wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand
> > why
> > > > you
> > > > > > > think
> > > > > > > > >> "The
> > > > > > > > >> > > > > > > > >>>> memory
> > > > > > > > >> > > > > > > > >>>>>>>>>>> consumption
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>> can
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> rise
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> given the total number of
> > > > queued
> > > > > > > > requests
> > > > > > > > >> > can
> > > > > > > > >> > > > > > > > >>> go
> > > > > > > > >> > > > > > > > >>>> up
> > > > > > > > >> > > > > > > > >>>>>> to
> > > > > > > > >> > > > > > > > >>>>>>>> 2x"
> > > > > > > > >> > > > > > > > >>>>>>>>>> in
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> the
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> impact
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> section. Normally the
> > > requests
> > > > > from
> > > > > > > > >> > > > > > > > >> controller
> > > > > > > > >> > > > > > > > >>>> to a
> > > > > > > > >> > > > > > > > >>>>>>>> Broker
> > > > > > > > >> > > > > > > > >>>>>>>>>> are
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> not
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> high
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> volume, right ?
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at
> > 5:06
> > > AM
> > > > > > > Becket
> > > > > > > > >> Qin <
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP,
> Lucas.
> > > > > > > Separating
> > > > > > > > >> the
> > > > > > > > >> > > > > > > > >>>> control
> > > > > > > > >> > > > > > > > >>>>>>>> plane
> > > > > > > > >> > > > > > > > >>>>>>>>>> from
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> the
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> data
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> plane
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned
> > > that
> > > > > the
> > > > > > > > >> > > > > > > > >> controller
> > > > > > > > >> > > > > > > > >>>>>> request
> > > > > > > > >> > > > > > > > >>>>>>>>> queue
> > > > > > > > >> > > > > > > > >>>>>>>>>>> may
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> have
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> many
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will
> this
> > > be a
> > > > > > > common
> > > > > > > > >> case?
> > > > > > > > >> > > > > > > > >>> The
> > > > > > > > >> > > > > > > > >>>>>>>>> controller
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>> requests
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> still
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> goes through the
> > > SocketServer.
> > > > > The
> > > > > > > > >> > > > > > > > >>> SocketServer
> > > > > > > > >> > > > > > > > >>>>>> will
> > > > > > > > >> > > > > > > > >>>>>>>>> mute
> > > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> channel
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> once
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> a request is read and
> put
> > > into
> > > > > the
> > > > > > > > >> request
> > > > > > > > >> > > > > > > > >>>>> channel.
> > > > > > > > >> > > > > > > > >>>>>>>> So
> > > > > > > > >> > > > > > > > >>>>>>>>>>>> assuming
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> there
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> is
> > > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> only one connection
> > between
> > > > > > > controller
> > > > > > > > >> and
> > > > > > > > >> > > > > > > > >>> each
> > > > > > > > >> > > > > > > > >>>>>>>> broker,
> > > > > > > > >> > > > > > > > >>>>>>>>> on
> > > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > > >> > > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Lucas,

Thanks for the explanation. It might be a nitpick, but it seems better to
mention in the motivation part that today the client requests and
controller requests are not only sharing the same queue, but also a bunch
of other things, so that we can avoid asking people to read the rejected
alternatives.

Thanks,

Jiangjie (Becket) Qin







On Fri, Aug 10, 2018 at 6:23 AM, Lucas Wang <lu...@gmail.com> wrote:

> @Becket,
>
> I've asked for review by Jun and Joel in the vote thread.
> Regarding the separate thread and port, I did talk about it in the rejected
> alternative design 1.
> Please let me know if you'd like more elaboration or moving it to the
> motivation, etc.
>
> Thanks,
> Lucas
>
> On Wed, Aug 8, 2018 at 3:59 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Hi Lucas,
> >
> > Yes, a separate Jira is OK.
> >
> > Since the proposal has significantly changed since the initial vote
> > started, we probably should let the others who have already voted know
> > and ensure they are happy with the updated proposal.
> > Also, it seems the motivation part of the KIP wiki is still just talking
> > about the separate queue and does not fully cover the changes we make
> > now, e.g. separate thread, port, etc. We might want to explain a bit
> > more so that people who did not follow the discussion mail thread also
> > understand the whole proposal.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Aug 8, 2018 at 12:44 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Hi Becket,
> > >
> > > Thanks for the review. The current write up in the KIP won’t change the
> > > ordering behavior. Are you ok with addressing that as a separate
> > > independent issue (I’ll create a separate ticket for it)?
> > > If so, can you please give me a +1 on the vote thread?
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Tue, Aug 7, 2018 at 7:34 PM Becket Qin <be...@gmail.com>
> wrote:
> > >
> > > > Thanks for the updated KIP wiki, Lucas. Looks good to me overall.
> > > >
> > > > It might be an implementation detail, but do we still plan to use the
> > > > correlation id to ensure the request processing order?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Jul 31, 2018 at 3:39 AM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks for your review, Dong.
> > > > > Ack that these configs will have a bigger impact on users.
> > > > >
> > > > > On the other hand, I would argue that the request queue becoming
> > > > > full may or may not be a rare scenario.
> > > > > How often the request queue gets full depends on the request
> > > > > incoming rate, the request processing rate, and the size of the
> > > > > request queue.
> > > > > When that happens, the dedicated endpoints design can better handle
> > > > > it than any of the previously discussed options.
> > > > >
> > > > > Another reason I made the change was that I have the same taste
> > > > > as Becket that it's a better separation of the control plane from
> > > > > the data plane.
> > > > >
> > > > > Finally, I want to clarify that this change is NOT motivated by the
> > > > > out-of-order processing discussion. The latter problem is
> > > > > orthogonal to this KIP, and it can happen in any of the design
> > > > > options we discussed for this KIP so far. So I'd like to address
> > > > > out-of-order processing separately in another thread,
> > > > > and avoid mentioning it in this KIP.
> > > > >
> > > > > Thanks,
> > > > > Lucas
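
[Editor's sketch] Lucas's point that "rare" depends on the incoming rate, the processing rate, and the queue size can be made concrete with a small back-of-envelope calculation. The function and all the numbers below are illustrative assumptions, not figures from the KIP:

```python
def seconds_until_full(queue_capacity: int,
                       arrival_rate: float,
                       service_rate: float) -> float:
    """Rough time for a bounded request queue to fill when arrivals
    outpace the handlers. Returns infinity when the handlers keep up."""
    backlog_rate = arrival_rate - service_rate  # net requests/sec queued
    if backlog_rate <= 0:
        return float("inf")
    return queue_capacity / backlog_rate

# Example: a 500-slot queue, 1200 req/s arriving, 1000 req/s being served.
print(seconds_until_full(500, 1200.0, 1000.0))  # 2.5 (seconds)
```

With these example numbers, a 500-slot queue fills after only 2.5 seconds of sustained overload, which is why whether queue-full is "rare" depends entirely on the traffic profile.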
> > > > >
> > > > > On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin <li...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hey Lucas,
> > > > > >
> > > > > > Thanks for the update.
> > > > > >
> > > > > > The current KIP proposes new broker configs
> > > > > > "listeners.for.controller" and
> > > > > > "advertised.listeners.for.controller". This is going to be a big
> > > > > > change since listeners are among the most important configs that
> > > > > > every user needs to change. According to the rejected alternative
> > > > > > section, it seems that the reason to add these two configs is to
> > > > > > improve performance when the data request queue is full rather
> > > > > > than for correctness. It should be a very rare scenario and I am
> > > > > > not sure we should add configs for all users just to improve the
> > > > > > performance in such a rare scenario.
> > > > > >
> > > > > > Also, if the new design is based on the issues which were
> > > > > > discovered in the recent discussion, e.g. out-of-order processing
> > > > > > if we don't use a dedicated thread for controller requests, it
> > > > > > may be useful to explain the problem in the motivation section.
> > > > > >
> > > > > > Thanks,
> > > > > > Dong
> > > > > >
> > > > > > On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > A kind reminder for review of this KIP.
> > > > > > >
> > > > > > > Thank you very much!
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I've updated the KIP by adding the dedicated endpoints for
> > > > controller
> > > > > > > > connections,
> > > > > > > > and pinning threads for controller requests.
> > > > > > > > Also I've updated the title of this KIP. Please take a look
> and
> > > let
> > > > > me
> > > > > > > > know your feedback.
> > > > > > > >
> > > > > > > > Thanks a lot for your time!
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > > > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > > > >
> > > > > > > >> Hi Lucas,
> > > > > > > >> I agree, if we want to go forward with a separate controller
> > > > > > > >> plane and data plane and completely isolate them, having a
> > > > > > > >> separate port for the controller with a separate Acceptor and
> > > > > > > >> a Processor sounds ideal to me.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Mayuresh
> > > > > > > >>
> > > > > > > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <becket.qin@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Lucas,
> > > > > > > >> >
> > > > > > > >> > Yes, I agree that a dedicated end-to-end control flow would
> > > > > > > >> > be ideal.
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> >
> > > > > > > >> > Jiangjie (Becket) Qin
> > > > > > > >> >
> > > > > > > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Thanks for the comment, Becket.
> > > > > > > >> > > So far, we've been trying to avoid making any request
> > > > > > > >> > > handler thread special. But if we were to follow that path
> > > > > > > >> > > in order to make the two planes more isolated, what do you
> > > > > > > >> > > think about also having a dedicated processor thread, and
> > > > > > > >> > > a dedicated port for the controller?
> > > > > > > >> > >
> > > > > > > >> > > Today one processor thread can handle multiple
> > > > > > > >> > > connections, let's say 100 connections represented by
> > > > > > > >> > > connection0, ... connection99, among which connection0-98
> > > > > > > >> > > are from clients, while connection99 is from the
> > > > > > > >> > > controller. Further let's say after one selector polling,
> > > > > > > >> > > there are incoming requests on all connections.
> > > > > > > >> > >
> > > > > > > >> > > When the request queue is full (either the data request
> > > > > > > >> > > queue being full in the two-queue design, or the single
> > > > > > > >> > > queue being full in the deque design), the processor
> > > > > > > >> > > thread will be blocked first when trying to enqueue the
> > > > > > > >> > > data request from connection0, then possibly blocked for
> > > > > > > >> > > the data request from connection1, etc., even though the
> > > > > > > >> > > controller request is ready to be enqueued.
> > > > > > > >> > >
> > > > > > > >> > > To solve this problem, it seems we would need to have a
> > > > > > > >> > > separate port dedicated to the controller, a dedicated
> > > > > > > >> > > processor thread, a dedicated controller request queue,
> > > > > > > >> > > and pinning of one request handler thread for controller
> > > > > > > >> > > requests.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Lucas
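
[Editor's sketch] The blocking scenario Lucas describes can be modeled in a few lines: with one shared bounded queue, the enqueue of a controller request waits behind data requests, whereas a dedicated controller queue accepts it immediately. This is an illustrative model built on Python's standard `queue` module, not broker code; the capacity of 20 mirrors the default discussed earlier in the thread:

```python
import queue

# Shared bounded request queue, already filled by data requests.
shared = queue.Queue(maxsize=2)
shared.put("data-req-from-connection0")
shared.put("data-req-from-connection1")   # queue is now full

# The processor thread would block here; a timeout lets us observe it.
try:
    shared.put("controller-req-from-connection99", timeout=0.05)
    enqueued_on_shared = True
except queue.Full:
    enqueued_on_shared = False

# With a dedicated controller queue, the same request is accepted at once.
controller_queue = queue.Queue(maxsize=20)
controller_queue.put_nowait("controller-req-from-connection99")

print(enqueued_on_shared, controller_queue.qsize())  # False 1
```

The same backpressure that protects the broker from data traffic is exactly what delays the controller request on the shared queue, which is the argument for the separate endpoint.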
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <
> > > > > > > >> > > becket.qin@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Personally I am not fond of the dequeue approach simply
> > > > > > > >> > > > because it is against the basic idea of isolating the
> > > > > > > >> > > > controller plane and data plane. With a single dequeue,
> > > > > > > >> > > > theoretically speaking the controller requests can
> > > > > > > >> > > > starve the client requests. I would prefer the approach
> > > > > > > >> > > > with a separate controller request queue and a
> > > > > > > >> > > > dedicated controller request handler thread.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks,
> > > > > > > >> > > >
> > > > > > > >> > > > Jiangjie (Becket) Qin
> > > > > > > >> > > >
> > > > > > > >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <
> > > > > > > >> > > > lucasatucla@gmail.com> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Sure, I can summarize the usage of correlation id.
> > > > > > > >> > > > > But before I do that, it seems the same out-of-order
> > > > > > > >> > > > > processing can also happen to Produce requests sent
> > > > > > > >> > > > > by producers, following the same example you
> > > > > > > >> > > > > described earlier.
> > > > > > > >> > > > > If that's the case, I think this probably deserves a
> > > > > > > >> > > > > separate doc and design independent of this KIP.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Lucas
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <
> > > > > > > >> > > > > lindong28@gmail.com> wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Hey Lucas,
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Could you update the KIP if you are confident with
> > > > > > > >> > > > > > the approach which uses correlation id? The idea
> > > > > > > >> > > > > > around correlation id is kind of scattered across
> > > > > > > >> > > > > > multiple emails. It will be useful if other
> > > > > > > >> > > > > > reviewers can read the KIP to understand the
> > > > > > > >> > > > > > latest proposal.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Thanks,
> > > > > > > >> > > > > > Dong
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > > > > > > >> > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > > I like the idea of the dequeue implementation by
> > > > > > > >> > > > > > > Lucas. This will help us avoid an additional
> > > > > > > >> > > > > > > queue for the controller and additional configs
> > > > > > > >> > > > > > > in Kafka.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Thanks,
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Mayuresh
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
> > > > > > > >> > > > > > > becket.qin@gmail.com> wrote:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > Hi Jun,
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > The usage of correlation ID might still be
> > > > > > > >> > > > > > > > useful to address the cases where the
> > > > > > > >> > > > > > > > controller epoch and leader epoch checks are
> > > > > > > >> > > > > > > > not sufficient to guarantee correct behavior.
> > > > > > > >> > > > > > > > For example, if the controller sends a
> > > > > > > >> > > > > > > > LeaderAndIsrRequest followed by a
> > > > > > > >> > > > > > > > StopReplicaRequest, and the broker processes
> > > > > > > >> > > > > > > > them in the reverse order, the replica may
> > > > > > > >> > > > > > > > still be wrongly recreated, right?
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Thanks,
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <
> > > > > > > >> > > > > > > > > jun@confluent.io> wrote:
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Hmm, since we already use controller epoch
> > > > > > > >> > > > > > > > > and leader epoch for properly caching the
> > > > > > > >> > > > > > > > > latest partition state, do we really need
> > > > > > > >> > > > > > > > > correlation id for ordering the controller
> > > > > > > >> > > > > > > > > requests?
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Thanks,
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Jun
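
[Editor's sketch] The epoch check Jun refers to can be modeled roughly as follows. This is a simplified illustration (the helper name `maybe_apply` and the cache shape are invented), showing how a broker that caches `(controllerEpoch, leaderEpoch)` per partition can ignore state that is older than what it already has:

```python
# (topic, partition) -> (controller_epoch, leader_epoch) last applied
cached = {}

def maybe_apply(partition, controller_epoch, leader_epoch):
    """Apply partition state only if it is not older than the cached epochs.
    Epoch tuples compare lexicographically: controller epoch first."""
    old = cached.get(partition, (-1, -1))
    if (controller_epoch, leader_epoch) < old:
        return False  # stale request: ignore it
    cached[partition] = (controller_epoch, leader_epoch)
    return True

assert maybe_apply(("t", 0), 5, 10) is True   # first state accepted
assert maybe_apply(("t", 0), 5, 9) is False   # older leader epoch rejected
assert maybe_apply(("t", 0), 6, 1) is True    # newer controller epoch wins
```

This is why reordered requests carrying older epochs are harmless for state caching; Becket's LeaderAndIsr/StopReplica example above matters precisely because those two requests can carry the same epochs.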
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
> > > > > > > >> > > > > > > > > becket.qin@gmail.com> wrote:
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >> Lucas and Mayuresh,
> > > > > > > >> > > > > > > > >>
> > > > > > > >> > > > > > > > >> Good idea. The correlation id should work.
> > > > > > > >> > > > > > > > >>
> > > > > > > >> > > > > > > > >> In the ControllerChannelManager, a request
> > > > > > > >> > > > > > > > >> will be resent until a response is received.
> > > > > > > >> > > > > > > > >> So if the controller-to-broker connection
> > > > > > > >> > > > > > > > >> disconnects after the controller sends R1_a,
> > > > > > > >> > > > > > > > >> but before the response of R1_a is received,
> > > > > > > >> > > > > > > > >> the disconnection may cause the controller
> > > > > > > >> > > > > > > > >> to resend R1_b, i.e. until R1 is acked, R2
> > > > > > > >> > > > > > > > >> won't be sent by the controller.
> > > > > > > >> > > > > > > > >> This gives two guarantees:
> > > > > > > >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > > > >> > > > > > > > >> 2. On the broker side, when R2 is seen, R1
> > > > > > > >> > > > > > > > >> must have been processed at least once.
> > > > > > > >> > > > > > > > >>
> > > > > > > >> > > > > > > > >> So on the broker side, with a
> > > > > > > >> > > > > > > > >> single-threaded controller request handler,
> > > > > > > >> > > > > > > > >> the logic should be:
> > > > > > > >> > > > > > > > >> 1. Process whatever request is seen in the
> > > > > > > >> > > > > > > > >> controller request queue.
> > > > > > > >> > > > > > > > >> 2. For the given epoch, drop the request if
> > > > > > > >> > > > > > > > >> its correlation id is smaller than that of
> > > > > > > >> > > > > > > > >> the last processed request.
> > > > > > > >> > > > > > > > >>
> > > > > > > >> > > > > > > > >> Thanks,
> > > > > > > >> > > > > > > > >>
> > > > > > > >> > > > > > > > >> Jiangjie (Becket) Qin
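
[Editor's sketch] Becket's two-step broker-side rule can be written out in a few lines; `should_process` and the `last_seen` map are hypothetical helpers for illustration, not actual KafkaApis code:

```python
# controller_epoch -> highest correlation id processed for that epoch
last_seen = {}

def should_process(controller_epoch: int, correlation_id: int) -> bool:
    """Drop a controller request whose correlation id is behind the last
    one processed for the same controller epoch; accept everything else."""
    if correlation_id < last_seen.get(controller_epoch, -1):
        return False
    last_seen[controller_epoch] = correlation_id
    return True

assert should_process(3, 100) is True    # R1_a
assert should_process(3, 101) is True    # R1_b (resend of R1)
assert should_process(3, 100) is False   # stale duplicate arrives late
assert should_process(4, 0) is True      # a new epoch restarts the ordering
```

Keying by controller epoch matters because, as Mayuresh notes below, a new NetworkClient (e.g. after a controller failover) starts its correlation ids from 0 again.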
> > > > > > > >> > > > > > > > >>
> > > > > > > >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <
> > > > > > > >> > > > > > > > >> jun@confluent.io> wrote:
> > > > > > > >> > > > > > > > >>
> > > > > > > >> > > > > > > > >>> I agree that there is no strong ordering
> > > > > > > >> > > > > > > > >>> when there is more than one socket
> > > > > > > >> > > > > > > > >>> connection. Currently, we rely on
> > > > > > > >> > > > > > > > >>> controllerEpoch and leaderEpoch to ensure
> > > > > > > >> > > > > > > > >>> that the receiving broker picks up the
> > > > > > > >> > > > > > > > >>> latest state for each partition.
> > > > > > > >> > > > > > > > >>>
> > > > > > > >> > > > > > > > >>> One potential issue with the dequeue
> > > > > > > >> > > > > > > > >>> approach is that if the queue is full,
> > > > > > > >> > > > > > > > >>> there is no guarantee that the controller
> > > > > > > >> > > > > > > > >>> requests will be enqueued quickly.
> > > > > > > >> > > > > > > > >>>
> > > > > > > >> > > > > > > > >>> Thanks,
> > > > > > > >> > > > > > > > >>>
> > > > > > > >> > > > > > > > >>> Jun
> > > > > > > >> > > > > > > > >>>
> > > > > > > >> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > > > > > > >> > > > > > > > >>> gharatmayuresh15@gmail.com> wrote:
> > > > > > > >> > > > > > > > >>>
> > > > > > > >> > > > > > > > >>>> Yea, the correlationId is only set to 0 in
> > > > > > > >> > > > > > > > >>>> the NetworkClient constructor. Since we
> > > > > > > >> > > > > > > > >>>> reuse the same NetworkClient between the
> > > > > > > >> > > > > > > > >>>> Controller and the broker, a disconnection
> > > > > > > >> > > > > > > > >>>> should not cause it to reset to 0, in
> > > > > > > >> > > > > > > > >>>> which case it can be used to reject
> > > > > > > >> > > > > > > > >>>> obsolete requests.
> > > > > > > >> > > > > > > > >>>>
> > > > > > > >> > > > > > > > >>>> Thanks,
> > > > > > > >> > > > > > > > >>>>
> > > > > > > >> > > > > > > > >>>> Mayuresh
> > > > > > > >> > > > > > > > >>>>
> > > > > > > >> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas
> Wang
> > <
> > > > > > > >> > > > > lucasatucla@gmail.com
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > >>> wrote:
> > > > > > > >> > > > > > > > >>>>
@Dong,
Great example and explanation, thanks!

@All
Regarding the example given by Dong, it seems that even if we use a queue
and a dedicated controller request handling thread, the same result can
still happen, because R1_a will be sent on one connection while R1_b and
R2 will be sent on a different connection, and there is no ordering
between different connections on the broker side.
I was discussing with Mayuresh offline, and it seems the correlation id
within the same NetworkClient object is monotonically increasing and never
reset, hence a broker can leverage that to properly reject obsolete
requests. Thoughts?

Thanks,
Lucas

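To make the rejection idea concrete, here is a minimal broker-side sketch under the assumption described above (the correlation id from a given controller only ever increases and is never reset). The class and method names are hypothetical illustrations, not Kafka's actual implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical guard: tracks the highest controller correlation id seen so
// far and rejects any request whose id is not strictly greater, assuming
// ids are monotonically increasing and survive reconnects.
public class ControllerRequestGuard {
    private final AtomicLong lastCorrelationId = new AtomicLong(-1L);

    /** Returns true if the request should be processed, false if it is obsolete. */
    public boolean accept(long correlationId) {
        while (true) {
            long last = lastCorrelationId.get();
            if (correlationId <= last) {
                return false; // stale or duplicate request from an old connection
            }
            if (lastCorrelationId.compareAndSet(last, correlationId)) {
                return true;
            }
        }
    }

    public static void main(String[] args) {
        ControllerRequestGuard guard = new ControllerRequestGuard();
        System.out.println(guard.accept(1)); // true
        System.out.println(guard.accept(3)); // true
        System.out.println(guard.accept(2)); // false: 2 arrives after 3 was seen
    }
}
```

Note that, per the discussion in this thread, this only helps if the correlation id really does survive controller reconnects.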
On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:

Actually nvm, correlationId is reset in case of connection loss, I think.

Thanks,

Mayuresh

On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:

I agree with Dong that out-of-order processing can happen with two
separate queues as well, and it can even happen today.
Can we use the correlationId in the request from the controller to the
broker to handle ordering?

Thanks,

Mayuresh

On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com> wrote:

Good point, Joel. I agree that a dedicated controller request handling
thread would provide better isolation. It also solves the reordering
issue.

On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:

Good example. I think this scenario can occur in the current code as well,
but with even lower probability given that there are other non-controller
requests interleaved. It is still sketchy though, and I think a safer
approach would be separate queues and pinning controller request handling
to one handler thread.

On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindong28@gmail.com> wrote:

Hey Becket,

I think you are right that there may be out-of-order processing. However,
it seems that out-of-order processing may also happen even if we use a
separate queue.

Here is the example:

- Controller sends R1 and gets disconnected before receiving a response.
  Then it reconnects and sends R2. Both requests now stay in the
  controller request queue in the order they were sent.
- thread1 takes R1_a from the request queue and then thread2 takes R2 from
  the request queue almost at the same time.
- So R1_a and R2 are processed in parallel. There is a chance that R2's
  processing is completed before R1's.

If out-of-order processing can happen for both approaches with very low
probability, it may not be worthwhile to add the extra queue. What do you
think?

Thanks,
Dong

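The scenario above can be reproduced deterministically by stalling R1's handler until R2's handler has finished (a toy simulation in which strings stand in for requests; none of these names come from the broker code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class OutOfOrderDemo {

    // Two handler threads drain one shared FIFO queue; R1's handler is
    // stalled until R2's handler finishes, so the completion order inverts
    // the enqueue order even though the queue itself preserves it.
    public static List<String> completionOrder() throws Exception {
        BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>();
        requestQueue.put("R1");
        requestQueue.put("R2");
        List<String> completed = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch r2Done = new CountDownLatch(1);
        ExecutorService handlers = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 2; i++) {
            handlers.submit(() -> {
                String req = requestQueue.take();
                if (req.equals("R1")) {
                    r2Done.await(); // simulate R1's handler being slow
                }
                completed.add(req);
                if (req.equals("R2")) {
                    r2Done.countDown();
                }
                return null;
            });
        }
        handlers.shutdown();
        handlers.awaitTermination(5, TimeUnit.SECONDS);
        return completed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(completionOrder()); // prints [R2, R1]
    }
}
```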
On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket.qin@gmail.com> wrote:

Hi Mayuresh/Joel,

Using the request channel as a deque was brought up some time ago when we
were initially thinking of prioritizing the requests. The concern was that
the controller requests are supposed to be processed in order. If we can
ensure that there is at most one controller request in the request
channel, the order is not a concern. But in cases where more than one
controller request is inserted into the queue, the controller request
order may change and cause problems. For example, think about the
following sequence:
1. Controller successfully sends a request R1 to the broker.
2. Broker receives R1 and puts the request at the head of the request
queue.
3. The controller-to-broker connection fails and the controller
reconnects to the broker.
4. Controller sends a request R2 to the broker.
5. Broker receives R2 and adds it to the head of the request queue.
Now on the broker side, R2 will be processed before R1, which may cause
problems.

Thanks,

Jiangjie (Becket) Qin

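The five steps above can be reproduced in a few lines with a blocking deque, where both controller requests are inserted at the head (toy strings stand in for the requests):

```java
import java.util.concurrent.LinkedBlockingDeque;

public class HeadInsertReorderDemo {

    // Steps 2 and 5 both insert at the head of the deque, so the broker
    // takes R2 before R1 even though R1 was sent first.
    public static String firstProcessed() throws InterruptedException {
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>();
        requestQueue.putFirst("R1"); // step 2: R1 arrives, goes to the head
        requestQueue.putFirst("R2"); // step 5: R2 arrives after reconnect
        return requestQueue.takeFirst(); // the handler polls from the head
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(firstProcessed()); // prints R2
    }
}
```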
On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkoshy.w@gmail.com> wrote:

@Mayuresh - I like your idea. It appears to be a simpler, less invasive
alternative, and it should work. Jun/Becket/others, do you see any
pitfalls with this approach?

On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

@Mayuresh,
That's a very interesting idea that I hadn't thought of before.
It seems to solve our problem at hand pretty well, and it also avoids the
need to have a new size metric and capacity config for the controller
request queue. In fact, if we were to adopt this design, there is no
public interface change, and we probably don't need a KIP.
Also, implementation-wise, it seems the Java class LinkedBlockingDeque can
readily satisfy the requirement by supporting a capacity and also allowing
insertion at both ends.

My only concern is that this design is tied to the coincidence that we
have two request priorities and there are two ends to a deque.
Hence by using the proposed design, the network layer becomes more tightly
coupled with upper-layer logic; e.g., if we were to add an extra priority
level in the future for some reason, we would probably need to go back to
the design of separate queues, one for each priority level.

In summary, I'm ok with both designs and lean toward your suggested
approach. Let's hear what others think.

@Becket,
In light of Mayuresh's suggested new design, I'm answering your question
only in the context of the current KIP design: I think your suggestion
makes sense, and I'm ok with removing the capacity config and just
relying on the default value of 20 being sufficient.

Thanks,
Lucas

On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:

Hi Lucas,

Seems like the main intent here is to prioritize controller requests over
any other requests.
In that case, we can change the request queue to a deque, where you always
insert the normal requests (produce, consume, etc.) at the tail of the
deque, but if it is a controller request, you insert it at the head of
the queue. This ensures that the controller request will be given higher
priority over other requests.

Also, since we only read one request from the socket and mute it, and
only unmute it after handling the request, this would ensure that we
don't handle controller requests out of order.

With this approach we can avoid the second queue and the additional
config for the size of the queue.

What do you think?

Thanks,

Mayuresh

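This proposal maps naturally onto java.util.concurrent.LinkedBlockingDeque, which is bounded and allows insertion at both ends. The sketch below is illustrative only: the class and method names are made up, and strings stand in for requests:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical request channel: data requests go to the tail, controller
// requests jump to the head, and handler threads always take from the head.
public class PrioritizedRequestChannel {
    private final LinkedBlockingDeque<String> deque;

    public PrioritizedRequestChannel(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity); // bounded, blocks when full
    }

    public void sendDataRequest(String request) throws InterruptedException {
        deque.putLast(request); // produce/fetch requests keep FIFO order
    }

    public void sendControllerRequest(String request) throws InterruptedException {
        deque.putFirst(request); // controller requests get priority
    }

    public String nextRequest() throws InterruptedException {
        return deque.takeFirst(); // handler threads poll the head
    }

    public static void main(String[] args) throws InterruptedException {
        PrioritizedRequestChannel channel = new PrioritizedRequestChannel(8);
        channel.sendDataRequest("Produce");
        channel.sendDataRequest("Fetch");
        channel.sendControllerRequest("LeaderAndIsr");
        System.out.println(channel.nextRequest()); // prints LeaderAndIsr
        System.out.println(channel.nextRequest()); // prints Produce
        System.out.println(channel.nextRequest()); // prints Fetch
    }
}
```

A controller request enqueued with putFirst is served before any earlier data request, while data requests keep their FIFO order among themselves.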
On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com> wrote:

Hey Joel,

Thanks for the detailed explanation. I agree the current design makes
sense. My confusion is about whether the new config for the controller
queue capacity is necessary. I cannot think of a case in which users
would change it.

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:

Hi Lucas,

I guess my question can be rephrased as "do we expect users to ever
change the controller request queue capacity"? If we agree that 20 is
already a very generous default number and we do not expect users to
change it, is it still necessary to expose this as a config?

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:

@Becket
1. Thanks for the comment. You are right that normally there should be
just one controller request because of muting, and I had NOT intended to
say there would be many enqueued controller requests.
I went through the KIP again, and I'm not sure which part conveys that
info. I'd be happy to revise if you point out the section.

2. Though it should not happen in normal conditions, the current design
does not preclude multiple controllers running at the same time. Hence if
we don't have the controller queue capacity config and simply make its
capacity 1, network threads handling requests from different controllers
will be blocked during those troublesome times, which is probably not
what we want. On the other hand, adding the extra config with a default
value, say 20, guards us from issues in those troublesome times, and IMO
there isn't much downside to adding the extra config.

@Mayuresh
Good catch; this sentence is an obsolete statement based on a previous
design. I've revised the wording in the KIP.

Thanks,
Lucas

On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:

Hi Lucas,

Thanks for the KIP.
I am trying to understand why you think "The memory consumption can rise
given the total number of queued requests can go up to 2x" in the impact
section. Normally the requests from the controller to a broker are not
high volume, right?

Thanks,

Mayuresh

> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at
> 5:06
> > AM
> > > > > > Becket
> > > > > > > >> Qin <
> > > > > > > >> > > > > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas.
> > > > > > Separating
> > > > > > > >> the
> > > > > > > >> > > > > > > > >>>> control
> > > > > > > >> > > > > > > > >>>>>>>> plane
> > > > > > > >> > > > > > > > >>>>>>>>>> from
> > > > > > > >> > > > > > > > >>>>>>>>>>>> the
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> data
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> plane
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned
> > that
> > > > the
> > > > > > > >> > > > > > > > >> controller
> > > > > > > >> > > > > > > > >>>>>> request
> > > > > > > >> > > > > > > > >>>>>>>>> queue
> > > > > > > >> > > > > > > > >>>>>>>>>>> may
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>> have
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> many
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this
> > be a
> > > > > > common
> > > > > > > >> case?
> > > > > > > >> > > > > > > > >>> The
> > > > > > > >> > > > > > > > >>>>>>>>> controller
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>> requests
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> still
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> goes through the
> > SocketServer.
> > > > The
> > > > > > > >> > > > > > > > >>> SocketServer
> > > > > > > >> > > > > > > > >>>>>> will
> > > > > > > >> > > > > > > > >>>>>>>>> mute
> > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> channel
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> once
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> a request is read and put
> > into
> > > > the
> > > > > > > >> request
> > > > > > > >> > > > > > > > >>>>> channel.
> > > > > > > >> > > > > > > > >>>>>>>> So
> > > > > > > >> > > > > > > > >>>>>>>>>>>> assuming
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>> there
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> is
> > > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> only one connection
> between
> > > > > > controller
> > > > > > > >> and
> > > > > > > >> > > > > > > > >>> each
> > > > > > > >> > > > > > > > >>>>>>>> broker,
> > > > > > > >> > > > > > > > >>>>>>>>> on
> > > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > > >> > > > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
@Becket,

I've asked for review by Jun and Joel in the vote thread.
Regarding the separate thread and port, I did talk about it in the rejected
alternative design 1.
Please let me know if you'd like more elaboration or moving it to the
motivation, etc.

Thanks,
Lucas

On Wed, Aug 8, 2018 at 3:59 PM, Becket Qin <be...@gmail.com> wrote:

> Hi Lucas,
>
> Yes, a separate Jira is OK.
>
> Since the proposal has changed significantly since the initial vote
> started, we probably should let the others who have already voted know and
> ensure they are happy with the updated proposal.
> Also, it seems the motivation part of the KIP wiki is still just talking
> about the separate queue and not fully cover the changes we make now, e.g.
> separate thread, port, etc. We might want to explain a bit more so that
> people who did not follow the discussion mail thread can also understand the
> whole proposal.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Aug 8, 2018 at 12:44 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Hi Becket,
> >
> > Thanks for the review. The current write up in the KIP won’t change the
> > ordering behavior. Are you ok with addressing that as a separate
> > independent issue (I’ll create a separate ticket for it)?
> > If so, can you please give me a +1 on the vote thread?
> >
> > Thanks,
> > Lucas
> >
> > On Tue, Aug 7, 2018 at 7:34 PM Becket Qin <be...@gmail.com> wrote:
> >
> > > Thanks for the updated KIP wiki, Lucas. Looks good to me overall.
> > >
> > > It might be an implementation detail, but do we still plan to use the
> > > correlation id to ensure the request processing order?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Jul 31, 2018 at 3:39 AM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Thanks for your review, Dong.
> > > > Ack that these configs will have a bigger impact for users.
> > > >
> > > > On the other hand, I would argue that the request queue becoming full
> > > > may or may not be a rare scenario.
> > > > How often the request queue gets full depends on the request incoming
> > > > rate, the request processing rate, and the size of the request queue.
> > > > When that happens, the dedicated endpoints design can better handle
> > > > it than any of the previously discussed options.
> > > >
> > > > Another reason I made the change was that I have the same taste
> > > > as Becket that it's a better separation of the control plane from the
> > > > data plane.
> > > >
> > > > Finally, I want to clarify that this change is NOT motivated by the
> > > > out-of-order processing discussion. The latter problem is orthogonal to
> > > > this KIP, and it can happen in any of the design options we discussed for
> > > > this KIP so far. So I'd like to address out-of-order processing separately
> > > > in another thread, and avoid mentioning it in this KIP.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin <li...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey Lucas,
> > > > >
> > > > > Thanks for the update.
> > > > >
> > > > > The current KIP proposes new broker configs "listeners.for.controller"
> > > > > and "advertised.listeners.for.controller". This is going to be a big
> > > > > change since listeners are among the most important configs that every
> > > > > user needs to change. According to the rejected alternative section, it
> > > > > seems that the reason to add these two configs is to improve performance
> > > > > when the data request queue is full rather than for correctness. It
> > > > > should be a very rare scenario and I am not sure we should add configs
> > > > > for all users just to improve the performance in such a rare scenario.
> > > > >
> > > > > Also, if the new design is based on the issues which were discovered in
> > > > > the recent discussion, e.g. out-of-order processing if we don't use a
> > > > > dedicated thread for controller requests, it may be useful to explain
> > > > > the problem in the motivation section.
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
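For readers following along, the dedicated-endpoint proposal would surface to operators roughly as follows. This is a hypothetical sketch that reuses only the config names quoted in this thread; the listener names, hosts, and ports are illustrative, and the exact syntax is defined in the KIP wiki, not here:

```properties
# Existing data-plane listener (unchanged)
listeners=PLAINTEXT://broker1:9092

# Hypothetical dedicated control-plane endpoint, per the names discussed above
listeners.for.controller=CONTROLLER://broker1:9091
advertised.listeners.for.controller=CONTROLLER://broker1.example.com:9091
```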
> > > > > On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > A kind reminder for review of this KIP.
> > > > > >
> > > > > > Thank you very much!
> > > > > > Lucas
> > > > > >
> > > > > > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I've updated the KIP by adding the dedicated endpoints for
> > > controller
> > > > > > > connections,
> > > > > > > and pinning threads for controller requests.
> > > > > > > Also I've updated the title of this KIP. Please take a look and let
> > > > > > > me know your feedback.
> > > > > > >
> > > > > > > Thanks a lot for your time!
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > > >
> > > > > > >> Hi Lucas,
> > > > > > >> I agree, if we want to go forward with a separate controller plane
> > > > > > >> and data plane and completely isolate them, having a separate port
> > > > > > >> for the controller with a separate Acceptor and a Processor sounds
> > > > > > >> ideal to me.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Mayuresh
> > > > > > >>
> > > > > > >>
> > > > > > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <becket.qin@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi Lucas,
> > > > > > >> >
> > > > > > >> > Yes, I agree that a dedicated end to end control flow would be
> > > > > > >> > ideal.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> >
> > > > > > >> > Jiangjie (Becket) Qin
> > > > > > >> >
> > > > > > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Thanks for the comment, Becket.
> > > > > > >> > > So far, we've been trying to avoid making any request handler
> > > > > > >> > > thread special. But if we were to follow that path in order to
> > > > > > >> > > make the two planes more isolated, what do you think about also
> > > > > > >> > > having a dedicated processor thread, and a dedicated port for
> > > > > > >> > > the controller?
> > > > > > >> > >
> > > > > > >> > > Today one processor thread can handle multiple connections,
> > > > > > >> > > let's say 100 connections represented by connection0, ...
> > > > > > >> > > connection99, among which connection0-98 are from clients, while
> > > > > > >> > > connection99 is from the controller. Further let's say after one
> > > > > > >> > > selector polling, there are incoming requests on all connections.
> > > > > > >> > >
> > > > > > >> > > When the request queue is full (either the data request queue
> > > > > > >> > > being full in the two queue design, or the one single queue
> > > > > > >> > > being full in the deque design), the processor thread will be
> > > > > > >> > > blocked first when trying to enqueue the data request from
> > > > > > >> > > connection0, then possibly blocked for the data request from
> > > > > > >> > > connection1, ... etc. even though the controller request is
> > > > > > >> > > ready to be enqueued.
> > > > > > >> > >
> > > > > > >> > > To solve this problem, it seems we would need to have a separate
> > > > > > >> > > port dedicated to the controller, a dedicated processor thread,
> > > > > > >> > > a dedicated controller request queue, and pinning of one request
> > > > > > >> > > handler thread for controller requests.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Lucas
> > > > > > >> > >
> > > > > > >> > >
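Lucas's blocking scenario above can be reproduced with a plain bounded queue. The class and request names below are illustrative assumptions, not Kafka's actual SocketServer code; a real processor thread uses a blocking put(), while a timed offer() here makes the stall observable without hanging:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: one shared bounded request queue serving all connections.
// Once data requests fill it, a controller request cannot be enqueued
// until a handler thread drains something, which is the starvation
// Lucas describes.
class SharedQueueDemo {
    static boolean tryEnqueue(BlockingQueue<String> requestQueue, String request) {
        try {
            // Real processors would block on put(); a short timeout lets the
            // demo report "stuck" instead of hanging forever.
            return requestQueue.offer(request, 10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a capacity-2 queue, two data requests from connection0 and connection1 fill it, and the controller request from connection99 is stuck behind them exactly as in the example.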
> > > > > > >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com>
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > > > Personally I am not fond of the dequeue approach simply
> > > > > > >> > > > because it is against the basic idea of isolating the
> > > > > > >> > > > controller plane and data plane. With a single dequeue,
> > > > > > >> > > > theoretically speaking the controller requests can starve the
> > > > > > >> > > > clients' requests. I would prefer the approach with a separate
> > > > > > >> > > > controller request queue and a dedicated controller request
> > > > > > >> > > > handler thread.
> > > > > > >> > > >
> > > > > > >> > > > Thanks,
> > > > > > >> > > >
> > > > > > >> > > > Jiangjie (Becket) Qin
> > > > > > >> > > >
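The separation Becket prefers can be sketched as a dedicated bounded queue drained by a single pinned thread, so data requests can never reorder or starve controller requests. Class names and the queue capacity are illustrative assumptions, not Kafka's actual implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a control plane: its own bounded queue plus one dedicated
// handler thread that processes controller requests strictly in FIFO order.
class ControlPlane {
    final BlockingQueue<Runnable> controllerQueue = new ArrayBlockingQueue<>(20);
    final Thread controllerHandler;

    ControlPlane() {
        controllerHandler = new Thread(() -> {
            try {
                while (true) controllerQueue.take().run();  // one consumer => in order
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "controller-request-handler");
        controllerHandler.setDaemon(true);
        controllerHandler.start();
    }
}
```

Because there is exactly one consumer thread, the processing order matches the enqueue order, which is the property a shared handler pool cannot guarantee.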
> > > > > > >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lucasatucla@gmail.com>
> > > > > > >> > > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Sure, I can summarize the usage of correlation id. But
> > > > > > >> > > > > before I do that, it seems the same out-of-order processing
> > > > > > >> > > > > can also happen to Produce requests sent by producers,
> > > > > > >> > > > > following the same example you described earlier.
> > > > > > >> > > > > If that's the case, I think this probably deserves a
> > > > > > >> > > > > separate doc and design independent of this KIP.
> > > > > > >> > > > >
> > > > > > >> > > > > Lucas
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <lindong28@gmail.com>
> > > > > > >> > > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Hey Lucas,
> > > > > > >> > > > > >
> > > > > > >> > > > > > Could you update the KIP if you are confident with the
> > > > > > >> > > > > > approach which uses correlation id? The idea around
> > > > > > >> > > > > > correlation id is kind of scattered across multiple
> > > > > > >> > > > > > emails. It will be useful if other reviewers can read the
> > > > > > >> > > > > > KIP to understand the latest proposal.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Thanks,
> > > > > > >> > > > > > Dong
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > > > > > >> > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > > > I like the idea of the dequeue implementation by Lucas.
> > > > > > >> > > > > > > This will help us avoid an additional queue for the
> > > > > > >> > > > > > > controller and additional configs in Kafka.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Thanks,
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Mayuresh
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <becket.qin@gmail.com>
> > > > > > >> > > > > > > wrote:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > Hi Jun,
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > The usage of correlation ID might still be useful to
> > > > > > >> > > > > > > > address the cases that the controller epoch and leader
> > > > > > >> > > > > > > > epoch check are not sufficient to guarantee correct
> > > > > > >> > > > > > > > behavior. For example, if the controller sends a
> > > > > > >> > > > > > > > LeaderAndIsrRequest followed by a StopReplicaRequest,
> > > > > > >> > > > > > > > and the broker processes them in the reverse order,
> > > > > > >> > > > > > > > the replica may still be wrongly recreated, right?
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Thanks,
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <jun@confluent.io> wrote:
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Hmm, since we already use controller epoch and
> > > > > > >> > > > > > > > > leader epoch for properly caching the latest
> > > > > > >> > > > > > > > > partition state, do we really need correlation id
> > > > > > >> > > > > > > > > for ordering the controller requests?
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Thanks,
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Jun
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <becket.qin@gmail.com>
> > > > > > >> > > > > > > > > wrote:
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >> Lucas and Mayuresh,
> > > > > > >> > > > > > > > >>
> > > > > > >> > > > > > > > >> Good idea. The correlation id should work.
> > > > > > >> > > > > > > > >>
> > > > > > >> > > > > > > > >> In the ControllerChannelManager, a request will be
> > > > > > >> > > > > > > > >> resent until a response is received. So if the
> > > > > > >> > > > > > > > >> controller to broker connection disconnects after
> > > > > > >> > > > > > > > >> the controller sends R1_a, but before the response
> > > > > > >> > > > > > > > >> of R1_a is received, a disconnection may cause the
> > > > > > >> > > > > > > > >> controller to resend R1_b, i.e. until R1 is acked,
> > > > > > >> > > > > > > > >> R2 won't be sent by the controller.
> > > > > > >> > > > > > > > >> This gives two guarantees:
> > > > > > >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > > >> > > > > > > > >> 2. On the broker side, when R2 is seen, R1 must
> > > > > > >> > > > > > > > >> have been processed at least once.
> > > > > > >> > > > > > > > >>
> > > > > > >> > > > > > > > >> So on the broker side, with a single-thread
> > > > > > >> > > > > > > > >> controller request handler, the logic should be:
> > > > > > >> > > > > > > > >> 1. Process whatever request is seen in the
> > > > > > >> > > > > > > > >> controller request queue
> > > > > > >> > > > > > > > >> 2. For the given epoch, drop a request if its
> > > > > > >> > > > > > > > >> correlation id is smaller than that of the last
> > > > > > >> > > > > > > > >> processed request.
> > > > > > >> > > > > > > > >>
> > > > > > >> > > > > > > > >> Thanks,
> > > > > > >> > > > > > > > >>
> > > > > > >> > > > > > > > >> Jiangjie (Becket) Qin
> > > > > > >> > > > > > > > >>
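Becket's two-step broker-side rule (process in queue order; within an epoch, drop requests whose correlation id is smaller than the last one processed) can be sketched as follows. ControllerRequestFilter is a hypothetical class for illustration, not Kafka's actual implementation:

```java
// Sketch of the obsolete-request check described above, keyed on
// (controllerEpoch, correlationId). Requests from an older controller
// epoch, or late-arriving resends within the current epoch, are dropped.
class ControllerRequestFilter {
    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    /** Returns true if the request should be processed, false if it is obsolete. */
    synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch < lastEpoch)
            return false; // request from an older controller generation
        if (controllerEpoch == lastEpoch && correlationId < lastCorrelationId)
            return false; // reordered: an earlier resend arrived late
        lastEpoch = controllerEpoch;
        lastCorrelationId = correlationId;
        return true;
    }
}
```

This relies on the guarantee quoted above that correlation ids within one NetworkClient are monotonically increasing, so R1_a < R1_b < R2 holds across resends.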
> > > > > > >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <jun@confluent.io>
> > > > > > >> > > > > > > > >> wrote:
> > > > > > >> > > > > > > > >>
> > > > > > >> > > > > > > > >>> I agree that there is no strong ordering when
> > > > > > >> > > > > > > > >>> there is more than one socket connection.
> > > > > > >> > > > > > > > >>> Currently, we rely on controllerEpoch and
> > > > > > >> > > > > > > > >>> leaderEpoch to ensure that the receiving broker
> > > > > > >> > > > > > > > >>> picks up the latest state for each partition.
> > > > > > >> > > > > > > > >>>
> > > > > > >> > > > > > > > >>> One potential issue with the dequeue approach is
> > > > > > >> > > > > > > > >>> that if the queue is full, there is no guarantee
> > > > > > >> > > > > > > > >>> that the controller requests will be enqueued
> > > > > > >> > > > > > > > >>> quickly.
> > > > > > >> > > > > > > > >>>
> > > > > > >> > > > > > > > >>> Thanks,
> > > > > > >> > > > > > > > >>>
> > > > > > >> > > > > > > > >>> Jun
> > > > > > >> > > > > > > > >>>
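Jun's concern can be made concrete with a bounded deque: even if controller requests are inserted at the head, a full deque still rejects (or, with the blocking variant, stalls) the insert. DequeDemo and the request names are illustrative assumptions, not Kafka code:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Sketch: inserting a controller request at the head of a bounded deque.
// When the deque is already full of data requests, offerFirst() fails
// immediately; the blocking putFirst() would stall the processor thread
// instead. Either way the controller request is not enqueued quickly.
class DequeDemo {
    static boolean offerControllerFirst(LinkedBlockingDeque<String> deque, String req) {
        return deque.offerFirst(req);
    }
}
```

Head insertion only helps once capacity frees up; it does not bound how long the controller request waits while the deque is full.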
> > > > > > >> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > > > > > >> > > > > > > > >>> gharatmayuresh15@gmail.com> wrote:
> > > > > > >> > > > > > > > >>>
> > > > > > >> > > > > > > > >>>> Yea, the correlationId is only set to 0 in the
> > > > > > >> > > > > > > > >>>> NetworkClient constructor. Since we reuse the
> > > > > > >> > > > > > > > >>>> same NetworkClient between the Controller and the
> > > > > > >> > > > > > > > >>>> broker, a disconnection should not cause it to
> > > > > > >> > > > > > > > >>>> reset to 0, in which case it can be used to
> > > > > > >> > > > > > > > >>>> reject obsolete requests.
> > > > > > >> > > > > > > > >>>>
> > > > > > >> > > > > > > > >>>> Thanks,
> > > > > > >> > > > > > > > >>>>
> > > > > > >> > > > > > > > >>>> Mayuresh
> > > > > > >> > > > > > > > >>>>
> > > > > > >> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatucla@gmail.com>
> > > > > > >> > > > > > > > >>>> wrote:
> > > > > > >> > > > > > > > >>>>
> > > > > > >> > > > > > > > >>>>> @Dong,
> > > > > > >> > > > > > > > >>>>> Great example and explanation, thanks!
> > > > > > >> > > > > > > > >>>>>
> > > > > > >> > > > > > > > >>>>> @All
> > > > > > >> > > > > > > > >>>>> Regarding the example given by Dong, it seems
> > > > > > >> > > > > > > > >>>>> even if we use a queue, and a dedicated
> > > > > > >> > > > > > > > >>>>> controller request handling thread, the same
> > > > > > >> > > > > > > > >>>>> result can still happen because R1_a will be sent
> > > > > > >> > > > > > > > >>>>> on one connection, and R1_b & R2 will be sent on
> > > > > > >> > > > > > > > >>>>> a different connection, and there is no ordering
> > > > > > >> > > > > > > > >>>>> between different connections on the broker side.
> > > > > > >> > > > > > > > >>>>> I was discussing with Mayuresh offline, and it
> > > > > > >> > > > > > > > >>>>> seems the correlation id within the same
> > > > > > >> > > > > > > > >>>>> NetworkClient object is monotonically increasing
> > > > > > >> > > > > > > > >>>>> and never reset, hence a broker can leverage that
> > > > > > >> > > > > > > > >>>>> to properly reject obsolete requests.
> > > > > > >> > > > > > > > >>>>> Thoughts?
> > > > > > >> > > > > > > > >>>>>
> > > > > > >> > > > > > > > >>>>> Thanks,
> > > > > > >> > > > > > > > >>>>> Lucas
> > > > > > >> > > > > > > > >>>>>
> > > > > > >> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > > > > >> > > > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > > >> > > > > > > > >>>>>
> > > > > > >> > > > > > > > >>>>>> Actually nvm, correlationId is reset in case of
> > > > > > >> > > > > > > > >>>>>> connection loss, I think.
> > > > > > >> > > > > > > > >>>>>>
> > > > > > >> > > > > > > > >>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>
> > > > > > >> > > > > > > > >>>>>> Mayuresh
> > > > > > >> > > > > > > > >>>>>>
> > > > > > >> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > > > > >> > > > > > > > >>>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > > >> > > > > > > > >>>>>>
> > > > > > >> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing
> > > > > > >> > > > > > > > >>>>>>> can happen with having 2 separate queues as
> > > > > > >> > > > > > > > >>>>>>> well and it can even happen today.
> > > > > > >> > > > > > > > >>>>>>> Can we use the correlationId in the request
> > > > > > >> > > > > > > > >>>>>>> from the controller to the broker to handle
> > > > > > >> > > > > > > > >>>>>>> ordering?
> > > > > > >> > > > > > > > >>>>>>>
> > > > > > >> > > > > > > > >>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>
> > > > > > >> > > > > > > > >>>>>>> Mayuresh
> > > > > > >> > > > > > > > >>>>>>>
> > > > > > >> > > > > > > > >>>>>>>
> > > > > > >> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com>
> > > > > > >> > > > > > > > >>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>
> > > > > > >> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated
> > > > > > >> > > > > > > > >>>>>>>> controller request handling thread would be a
> > > > > > >> > > > > > > > >>>>>>>> better isolation. It also solves the
> > > > > > >> > > > > > > > >>>>>>>> reordering issue.
> > > > > > >> > > > > > > > >>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jjkoshy.w@gmail.com>
> > > > > > >> > > > > > > > >>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>> Good example. I think this scenario can occur
> > > > > > >> > > > > > > > >>>>>>>>> in the current code as well but with even
> > > > > > >> > > > > > > > >>>>>>>>> lower probability given that there are other
> > > > > > >> > > > > > > > >>>>>>>>> non-controller requests interleaved. It is
> > > > > > >> > > > > > > > >>>>>>>>> still sketchy though and I think a safer
> > > > > > >> > > > > > > > >>>>>>>>> approach would be separate queues and pinning
> > > > > > >> > > > > > > > >>>>>>>>> controller request handling to one handler
> > > > > > >> > > > > > > > >>>>>>>>> thread.
> > > > > > >> > > > > > > > >>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindong28@gmail.com>
> > > > > > >> > > > > > > > >>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>> Hey Becket,
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>> I think you are right that there may be
> > > > > > >> > > > > > > > >>>>>>>>>> out-of-order processing. However, it seems
> > > > > > >> > > > > > > > >>>>>>>>>> that out-of-order processing may also happen
> > > > > > >> > > > > > > > >>>>>>>>>> even if we use a separate queue.
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>> Here is the example:
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected
> > > > > > >> > > > > > > > >>>>>>>>>> before receiving the response. Then it
> > > > > > >> > > > > > > > >>>>>>>>>> reconnects and sends R2. Both requests now
> > > > > > >> > > > > > > > >>>>>>>>>> stay in the controller request queue in the
> > > > > > >> > > > > > > > >>>>>>>>>> order they are sent.
> > > > > > >> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue
> > > > > > >> > > > > > > > >>>>>>>>>> and then thread2 takes R2 from the request
> > > > > > >> > > > > > > > >>>>>>>>>> queue almost at the same time.
> > > > > > >> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel.
> > > > > > >> > > > > > > > >>>>>>>>>> There is a chance that R2's processing is
> > > > > > >> > > > > > > > >>>>>>>>>> completed before R1.
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>> If out-of-order processing can happen for
> > > > > > >> > > > > > > > >>>>>>>>>> both approaches with very low probability,
> > > > > > >> > > > > > > > >>>>>>>>>> it may not be worthwhile to add the extra
> > > > > > >> > > > > > > > >>>>>>>>>> queue. What do you think?
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>>>> Dong
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket.qin@gmail.com>
> > > > > > >> > > > > > > > >>>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>> Using the request channel as a dequeue was
> > > > > > >> > > > > > > > >>>>>>>>>>> brought up some time ago when we were
> > > > > > >> > > > > > > > >>>>>>>>>>> initially thinking of prioritizing the
> > > > > > >> > > > > > > > >>>>>>>>>>> requests. The concern was that the
> > > > > > >> > > > > > > > >>>>>>>>>>> controller requests are supposed to be
> > > > > > >> > > > > > > > >>>>>>>>>>> processed in order. If we can ensure that
> > > > > > >> > > > > > > > >>>>>>>>>>> there is one controller request in the
> > > > > > >> > > > > > > > >>>>>>>>>>> request channel, the order is not a
> > > > > > >> > > > > > > > >>>>>>>>>>> concern. But in cases that there are more
> > > > > > >> > > > > > > > >>>>>>>>>>> than one controller request inserted into
> > > > > > >> > > > > > > > >>>>>>>>>>> the queue, the controller request order may
> > > > > > >> > > > > > > > >>>>>>>>>>> change and cause problems. For example,
> > > > > > >> > > > > > > > >>>>>>>>>>> think about the following sequence:
> > > > > > >> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request
> > > > > > >> > > > > > > > >>>>>>>>>>> R1 to the broker
> > > > > > >> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and put the request
> > > > > > >> > > > > > > > >>>>>>>>>>> at the head of the request queue.
> > > > > > >> > > > > > > > >>>>>>>>>>> 3. Controller to broker connection failed
> > > > > > >> > > > > > > > >>>>>>>>>>> and the controller reconnected to the
> > > > > > >> > > > > > > > >>>>>>>>>>> broker.
> > > > > > >> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to
> > the
> > > > > broker
> > > > > > >> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to
> > the
> > > > head
> > > > > > of
> > > > > > >> the
> > > > > > >> > > > > > > > >> request
> > > > > > >> > > > > > > > >>>>> queue.
> > > > > > >> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be
> > > > processed
> > > > > > >> before
> > > > > > >> > > R1
> > > > > > >> > > > is
> > > > > > >> > > > > > > > >>>>>> processed,
> > > > > > >> > > > > > > > >>>>>>>>>> which
> > > > > > >> > > > > > > > >>>>>>>>>>> may cause problem.
> > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM,
> Joel
> > > > Koshy
> > > > > <
> > > > > > >> > > > > > > > >>>>> jjkoshy.w@gmail.com>
> > > > > > >> > > > > > > > >>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It
> > > appears
> > > > to
> > > > > > be
> > > > > > >> a
> > > > > > >> > > > simpler
> > > > > > >> > > > > > > > >>>> less
> > > > > > >> > > > > > > > >>>>>>>>> invasive
> > > > > > >> > > > > > > > >>>>>>>>>>>> alternative and it should work.
> > > > > > >> Jun/Becket/others,
> > > > > > >> > > do
> > > > > > >> > > > > > > > >> you
> > > > > > >> > > > > > > > >>>> see
> > > > > > >> > > > > > > > >>>>>> any
> > > > > > >> > > > > > > > >>>>>>>>>>> pitfalls
> > > > > > >> > > > > > > > >>>>>>>>>>>> with this approach?
> > > > > > >> > > > > > > > >>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM,
> > Lucas
> > > > > Wang
> > > > > > <
> > > > > > >> > > > > > > > >>>>>>>> lucasatucla@gmail.com>
> > > > > > >> > > > > > > > >>>>>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > > > >> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea
> that
> > I
> > > > > > haven't
> > > > > > >> > > thought
> > > > > > >> > > > > > > > >>>>> before.
> > > > > > >> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at
> > hand
> > > > > pretty
> > > > > > >> > well,
> > > > > > >> > > > and
> > > > > > >> > > > > > > > >>>> also
> > > > > > >> > > > > > > > >>>>>>>>>>>>> avoids the need to have a new size
> > > > metric
> > > > > > and
> > > > > > >> > > > capacity
> > > > > > >> > > > > > > > >>>>> config
> > > > > > >> > > > > > > > >>>>>>>>>>>>> for the controller request queue.
> In
> > > > fact,
> > > > > > if
> > > > > > >> we
> > > > > > >> > > were
> > > > > > >> > > > > > > > >> to
> > > > > > >> > > > > > > > >>>>> adopt
> > > > > > >> > > > > > > > >>>>>>>>>>>>> this design, there is no public
> > > > interface
> > > > > > >> change,
> > > > > > >> > > and
> > > > > > >> > > > > > > > >> we
> > > > > > >> > > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> > > > > > >> > > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems
> > > > > > >> > > > > > > > >>>>>>>>>>>>> the java class LinkedBlockingQueue
> > can
> > > > > > readily
> > > > > > >> > > > satisfy
> > > > > > >> > > > > > > > >>> the
> > > > > > >> > > > > > > > >>>>>>>>>> requirement
> > > > > > >> > > > > > > > >>>>>>>>>>>>> by supporting a capacity, and also
> > > > > allowing
> > > > > > >> > > inserting
> > > > > > >> > > > > > > > >> at
> > > > > > >> > > > > > > > >>>>> both
> > > > > > >> > > > > > > > >>>>>>>> ends.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>> My only concern is that this
> design
> > is
> > > > > tied
> > > > > > to
> > > > > > >> > the
> > > > > > >> > > > > > > > >>>>> coincidence
> > > > > > >> > > > > > > > >>>>>>>> that
> > > > > > >> > > > > > > > >>>>>>>>>>>>> we have two request priorities and
> > > there
> > > > > are
> > > > > > >> two
> > > > > > >> > > ends
> > > > > > >> > > > > > > > >>> to a
> > > > > > >> > > > > > > > >>>>>>>> deque.
> > > > > > >> > > > > > > > >>>>>>>>>>>>> Hence by using the proposed
> design,
> > it
> > > > > seems
> > > > > > >> the
> > > > > > >> > > > > > > > >> network
> > > > > > >> > > > > > > > >>>>> layer
> > > > > > >> > > > > > > > >>>>>>>> is
> > > > > > >> > > > > > > > >>>>>>>>>>>>> more tightly coupled with upper
> > layer
> > > > > logic,
> > > > > > >> e.g.
> > > > > > >> > > if
> > > > > > >> > > > > > > > >> we
> > > > > > >> > > > > > > > >>>> were
> > > > > > >> > > > > > > > >>>>>> to
> > > > > > >> > > > > > > > >>>>>>>> add
> > > > > > >> > > > > > > > >>>>>>>>>>>>> an extra priority level in the
> > future
> > > > for
> > > > > > some
> > > > > > >> > > > reason,
> > > > > > >> > > > > > > > >>> we
> > > > > > >> > > > > > > > >>>>>> would
> > > > > > >> > > > > > > > >>>>>>>>>>> probably
> > > > > > >> > > > > > > > >>>>>>>>>>>>> need to go back to the design of
> > > > separate
> > > > > > >> queues,
> > > > > > >> > > one
> > > > > > >> > > > > > > > >>> for
> > > > > > >> > > > > > > > >>>>> each
> > > > > > >> > > > > > > > >>>>>>>>>> priority
> > > > > > >> > > > > > > > >>>>>>>>>>>>> level.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both
> designs
> > > and
> > > > > > lean
> > > > > > >> > > toward
> > > > > > >> > > > > > > > >>> your
> > > > > > >> > > > > > > > >>>>>>>> suggested
> > > > > > >> > > > > > > > >>>>>>>>>>>>> approach.
> > > > > > >> > > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>> @Becket,
> > > > > > >> > > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested
> new
> > > > > design,
> > > > > > >> I'm
> > > > > > >> > > > > > > > >>> answering
> > > > > > >> > > > > > > > >>>>>> your
> > > > > > >> > > > > > > > >>>>>>>>>>> question
> > > > > > >> > > > > > > > >>>>>>>>>>>>> only in the context
> > > > > > >> > > > > > > > >>>>>>>>>>>>> of the current KIP design: I think
> > > your
> > > > > > >> > suggestion
> > > > > > >> > > > > > > > >> makes
> > > > > > >> > > > > > > > >>>>>> sense,
> > > > > > >> > > > > > > > >>>>>>>> and
> > > > > > >> > > > > > > > >>>>>>>>>> I'm
> > > > > > >> > > > > > > > >>>>>>>>>>>> ok
> > > > > > >> > > > > > > > >>>>>>>>>>>>> with removing the capacity config
> > and
> > > > > > >> > > > > > > > >>>>>>>>>>>>> just relying on the default value
> of
> > > 20
> > > > > > being
> > > > > > >> > > > > > > > >> sufficient
> > > > > > >> > > > > > > > >>>>>> enough.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>>>>>>> Lucas
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM,
> > > > Mayuresh
> > > > > > >> Gharat
> > > > > > >> > <
> > > > > > >> > > > > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> Seems like the main intent here
> is
> > to
> > > > > > >> prioritize
> > > > > > >> > > the
> > > > > > >> > > > > > > > >>>>>>>> controller
> > > > > > >> > > > > > > > >>>>>>>>>>> request
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> over any other requests.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> In that case, we can change the
> > > request
> > > > > > queue
> > > > > > >> > to a
> > > > > > >> > > > > > > > >>>>> dequeue,
> > > > > > >> > > > > > > > >>>>>>>> where
> > > > > > >> > > > > > > > >>>>>>>>>> you
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> always insert the normal requests
> > > > > (produce,
> > > > > > >> > > > > > > > >>>> consume,..etc)
> > > > > > >> > > > > > > > >>>>>> to
> > > > > > >> > > > > > > > >>>>>>>> the
> > > > > > >> > > > > > > > >>>>>>>>>> end
> > > > > > >> > > > > > > > >>>>>>>>>>>> of
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> the dequeue, but if its a
> > controller
> > > > > > request,
> > > > > > >> > you
> > > > > > >> > > > > > > > >>> insert
> > > > > > >> > > > > > > > >>>>> it
> > > > > > >> > > > > > > > >>>>>> to
> > > > > > >> > > > > > > > >>>>>>>>> the
> > > > > > >> > > > > > > > >>>>>>>>>>> head
> > > > > > >> > > > > > > > >>>>>>>>>>>>> of
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> the queue. This ensures that the
> > > > > controller
> > > > > > >> > > request
> > > > > > >> > > > > > > > >>> will
> > > > > > >> > > > > > > > >>>>> be
> > > > > > >> > > > > > > > >>>>>>>> given
> > > > > > >> > > > > > > > >>>>>>>>>>>> higher
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> priority over other requests.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> Also since we only read one
> request
> > > > from
> > > > > > the
> > > > > > >> > > socket
> > > > > > >> > > > > > > > >>> and
> > > > > > >> > > > > > > > >>>>> mute
> > > > > > >> > > > > > > > >>>>>>>> it
> > > > > > >> > > > > > > > >>>>>>>>> and
> > > > > > >> > > > > > > > >>>>>>>>>>>> only
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> unmute it after handling the
> > request,
> > > > > this
> > > > > > >> would
> > > > > > >> > > > > > > > >>> ensure
> > > > > > >> > > > > > > > >>>>> that
> > > > > > >> > > > > > > > >>>>>>>> we
> > > > > > >> > > > > > > > >>>>>>>>>> don't
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> handle controller requests out of
> > > > order.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid
> the
> > > > > second
> > > > > > >> queue
> > > > > > >> > > and
> > > > > > >> > > > > > > > >>> the
> > > > > > >> > > > > > > > >>>>>>>>> additional
> > > > > > >> > > > > > > > >>>>>>>>>>>>> config
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> for the size of the queue.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> What do you think ?
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> Mayuresh
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM
> > > Becket
> > > > > Qin
> > > > > > <
> > > > > > >> > > > > > > > >>>>>>>> becket.qin@gmail.com
> > > > > > >> > > > > > > > >>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> Thank for the detail
> explanation.
> > I
> > > > > agree
> > > > > > >> the
> > > > > > >> > > > > > > > >>> current
> > > > > > >> > > > > > > > >>>>>> design
> > > > > > >> > > > > > > > >>>>>>>>>> makes
> > > > > > >> > > > > > > > >>>>>>>>>>>>> sense.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> My confusion is about whether
> the
> > > new
> > > > > > config
> > > > > > >> > for
> > > > > > >> > > > > > > > >> the
> > > > > > >> > > > > > > > >>>>>>>> controller
> > > > > > >> > > > > > > > >>>>>>>>>>> queue
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot
> > > think
> > > > > of a
> > > > > > >> case
> > > > > > >> > > in
> > > > > > >> > > > > > > > >>>> which
> > > > > > >> > > > > > > > >>>>>>>> users
> > > > > > >> > > > > > > > >>>>>>>>>>> would
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> change
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> it.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM,
> > > > Becket
> > > > > > Qin
> > > > > > >> <
> > > > > > >> > > > > > > > >>>>>>>>>> becket.qin@gmail.com>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> I guess my question can be
> > > rephrased
> > > > to
> > > > > > >> "do we
> > > > > > >> > > > > > > > >>>> expect
> > > > > > >> > > > > > > > >>>>>>>> user to
> > > > > > >> > > > > > > > >>>>>>>>>>> ever
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> change
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> the controller request queue
> > > > capacity"?
> > > > > > If
> > > > > > >> we
> > > > > > >> > > > > > > > >>> agree
> > > > > > >> > > > > > > > >>>>> that
> > > > > > >> > > > > > > > >>>>>>>> 20
> > > > > > >> > > > > > > > >>>>>>>>> is
> > > > > > >> > > > > > > > >>>>>>>>>>>>> already
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> a
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> very generous default number
> and
> > we
> > > > do
> > > > > > not
> > > > > > >> > > > > > > > >> expect
> > > > > > >> > > > > > > > >>>> user
> > > > > > >> > > > > > > > >>>>>> to
> > > > > > >> > > > > > > > >>>>>>>>>> change
> > > > > > >> > > > > > > > >>>>>>>>>>>> it,
> > > > > > >> > > > > > > > >>>>>>>>>>>>> is
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> it
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> still necessary to expose this
> > as a
> > > > > > config?
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29
> AM,
> > > > Lucas
> > > > > > >> Wang <
> > > > > > >> > > > > > > > >>>>>>>>>>> lucasatucla@gmail.com
> > > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You
> > are
> > > > > right
> > > > > > >> that
> > > > > > >> > > > > > > > >>>>> normally
> > > > > > >> > > > > > > > >>>>>>>> there
> > > > > > >> > > > > > > > >>>>>>>>>>>> should
> > > > > > >> > > > > > > > >>>>>>>>>>>>> be
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> just
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> one controller request because
> > of
> > > > > > muting,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say
> > > there
> > > > > > would
> > > > > > >> be
> > > > > > >> > > > > > > > >> many
> > > > > > >> > > > > > > > >>>>>>>> enqueued
> > > > > > >> > > > > > > > >>>>>>>>>>>>> controller
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> requests.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP again,
> > and
> > > > I'm
> > > > > > not
> > > > > > >> > sure
> > > > > > >> > > > > > > > >>>> which
> > > > > > >> > > > > > > > >>>>>> part
> > > > > > >> > > > > > > > >>>>>>>>>>> conveys
> > > > > > >> > > > > > > > >>>>>>>>>>>>> that
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> info.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you
> > > point
> > > > it
> > > > > > out
> > > > > > >> > the
> > > > > > >> > > > > > > > >>>>> section.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen
> > in
> > > > > normal
> > > > > > >> > > > > > > > >>>> conditions,
> > > > > > >> > > > > > > > >>>>>> the
> > > > > > >> > > > > > > > >>>>>>>>>> current
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> design
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> does not preclude multiple
> > > > controllers
> > > > > > >> > running
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we
> > > don't
> > > > > have
> > > > > > >> the
> > > > > > >> > > > > > > > >>>>> controller
> > > > > > >> > > > > > > > >>>>>>>>> queue
> > > > > > >> > > > > > > > >>>>>>>>>>>>> capacity
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config and simply make its
> > > capacity
> > > > to
> > > > > > be
> > > > > > >> 1,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> network threads handling
> > requests
> > > > from
> > > > > > >> > > > > > > > >> different
> > > > > > >> > > > > > > > >>>>>>>> controllers
> > > > > > >> > > > > > > > >>>>>>>>>>> will
> > > > > > >> > > > > > > > >>>>>>>>>>>> be
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> blocked during those
> troublesome
> > > > > times,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> which is probably not what we
> > > want.
> > > > On
> > > > > > the
> > > > > > >> > > > > > > > >> other
> > > > > > >> > > > > > > > >>>>> hand,
> > > > > > >> > > > > > > > >>>>>>>>> adding
> > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> extra
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config with a default value,
> say
> > > 20,
> > > > > > >> guards
> > > > > > >> > us
> > > > > > >> > > > > > > > >>> from
> > > > > > >> > > > > > > > >>>>>>>> issues
> > > > > > >> > > > > > > > >>>>>>>>> in
> > > > > > >> > > > > > > > >>>>>>>>>>>> those
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO
> there
> > > > isn't
> > > > > > >> much
> > > > > > >> > > > > > > > >>>> downside
> > > > > > >> > > > > > > > >>>>> of
> > > > > > >> > > > > > > > >>>>>>>>> adding
> > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> extra
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> config.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is
> an
> > > > > obsolete
> > > > > > >> > > > > > > > >>> statement
> > > > > > >> > > > > > > > >>>>>> based
> > > > > > >> > > > > > > > >>>>>>>> on
> > > > > > >> > > > > > > > >>>>>>>>> a
> > > > > > >> > > > > > > > >>>>>>>>>>>>> previous
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> design. I've revised the
> wording
> > > in
> > > > > the
> > > > > > >> KIP.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33
> > AM,
> > > > > > Mayuresh
> > > > > > >> > > > > > > > >>> Gharat <
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com>
> > > wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why
> > you
> > > > > think
> > > > > > >> "The
> > > > > > >> > > > > > > > >>>> memory
> > > > > > >> > > > > > > > >>>>>>>>>>> consumption
> > > > > > >> > > > > > > > >>>>>>>>>>>>> can
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> rise
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> given the total number of
> > queued
> > > > > > requests
> > > > > > >> > can
> > > > > > >> > > > > > > > >>> go
> > > > > > >> > > > > > > > >>>> up
> > > > > > >> > > > > > > > >>>>>> to
> > > > > > >> > > > > > > > >>>>>>>> 2x"
> > > > > > >> > > > > > > > >>>>>>>>>> in
> > > > > > >> > > > > > > > >>>>>>>>>>>> the
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> impact
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> section. Normally the
> requests
> > > from
> > > > > > >> > > > > > > > >> controller
> > > > > > >> > > > > > > > >>>> to a
> > > > > > >> > > > > > > > >>>>>>>> Broker
> > > > > > >> > > > > > > > >>>>>>>>>> are
> > > > > > >> > > > > > > > >>>>>>>>>>>> not
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> high
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> volume, right ?
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06
> AM
> > > > > Becket
> > > > > > >> Qin <
> > > > > > >> > > > > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas.
> > > > > Separating
> > > > > > >> the
> > > > > > >> > > > > > > > >>>> control
> > > > > > >> > > > > > > > >>>>>>>> plane
> > > > > > >> > > > > > > > >>>>>>>>>> from
> > > > > > >> > > > > > > > >>>>>>>>>>>> the
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> data
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> plane
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned
> that
> > > the
> > > > > > >> > > > > > > > >> controller
> > > > > > >> > > > > > > > >>>>>> request
> > > > > > >> > > > > > > > >>>>>>>>> queue
> > > > > > >> > > > > > > > >>>>>>>>>>> may
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> have
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> many
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this
> be a
> > > > > common
> > > > > > >> case?
> > > > > > >> > > > > > > > >>> The
> > > > > > >> > > > > > > > >>>>>>>>> controller
> > > > > > >> > > > > > > > >>>>>>>>>>>>>> requests
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> still
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> goes through the
> SocketServer.
> > > The
> > > > > > >> > > > > > > > >>> SocketServer
> > > > > > >> > > > > > > > >>>>>> will
> > > > > > >> > > > > > > > >>>>>>>>> mute
> > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> channel
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> once
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> a request is read and put
> into
> > > the
> > > > > > >> request
> > > > > > >> > > > > > > > >>>>> channel.
> > > > > > >> > > > > > > > >>>>>>>> So
> > > > > > >> > > > > > > > >>>>>>>>>>>> assuming
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>> there
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> is
> > > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> only one connection between
> > > > > controller
> > > > > > >> and
> > > > > > >> > > > > > > > >>> each
> > > > > > >> > > > > > > > >>>>>>>> broker,
> > > > > > >> > > > > > > > >>>>>>>>> on
> > > > > > >> > > > > > > > >>>>>>>>>>> the
> > > > > > >> > > > > >
> >
>
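As an aside on the single-deque idea discussed in the quoted thread above: a minimal Java sketch (the class and request names here are hypothetical, not Kafka's actual request channel) shows both the prioritization Mayuresh describes and the reordering hazard Becket points out when two controller requests are both inserted at the head.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of a single bounded deque serving two priorities:
// controller requests jump to the head, data requests queue at the tail.
public class RequestDeque {
    private static final String CONTROLLER_PREFIX = "controller:";
    private final LinkedBlockingDeque<String> deque;

    public RequestDeque(int capacity) {
        deque = new LinkedBlockingDeque<>(capacity);
    }

    // Returns false when the deque is at capacity.
    public boolean send(String request) {
        if (request.startsWith(CONTROLLER_PREFIX)) {
            // Controller requests are inserted at the head, ahead of all
            // queued data requests.
            return deque.offerFirst(request);
        }
        // Data requests (produce, fetch, ...) are appended at the tail.
        return deque.offerLast(request);
    }

    // Request handler threads always take from the head, or null if empty.
    public String receive() {
        return deque.pollFirst();
    }
}
```

Note that if two controller requests R1 and R2 are enqueued back to back, head insertion hands them to the handler in reverse order (R2 before R1), which is exactly the reconnect scenario Becket describes above.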

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Lucas,

Yes, a separate Jira is OK.

Since the proposal has changed significantly since the initial vote
started, we should probably let the others who have already voted know and
make sure they are happy with the updated proposal.
Also, it seems the motivation section of the KIP wiki still only talks
about the separate queue and does not fully cover the changes we are making
now, e.g. the separate thread, port, etc. We might want to explain a bit
more so that people who did not follow the discussion mail thread can also
understand the whole proposal.

Thanks,

Jiangjie (Becket) Qin

On Wed, Aug 8, 2018 at 12:44 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Becket,
>
> Thanks for the review. The current write up in the KIP won’t change the
> ordering behavior. Are you ok with addressing that as a separate
> independent issue (I’ll create a separate ticket for it)?
> If so, can you please give me a +1 on the vote thread?
>
> Thanks,
> Lucas
>
> On Tue, Aug 7, 2018 at 7:34 PM Becket Qin <be...@gmail.com> wrote:
>
> > Thanks for the updated KIP wiki, Lucas. Looks good to me overall.
> >
> > It might be an implementation detail, but do we still plan to use the
> > correlation id to ensure the request processing order?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
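For illustration, the correlation-id ordering Becket asks about above could be enforced with something like the following sketch. This is purely hypothetical broker-side pseudologic, not Kafka's actual implementation: real correlation ids are scoped per connection and this ignores resets on controller reconnect.

```java
// Hypothetical guard that uses a monotonically increasing correlation id
// to detect and drop reordered controller requests on the broker side.
public class ControllerRequestOrderGuard {
    private int lastCorrelationId = -1;

    // Returns true if the request is newer than anything seen so far and
    // should be processed; false if it arrived out of order.
    public synchronized boolean accept(int correlationId) {
        if (correlationId <= lastCorrelationId) {
            return false;
        }
        lastCorrelationId = correlationId;
        return true;
    }
}
```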
> > On Tue, Jul 31, 2018 at 3:39 AM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Thanks for your review, Dong.
> > > Ack that these configs will have a bigger impact for users.
> > >
> > > On the other hand, I would argue that the request queue becoming full
> > > may or may not be a rare scenario.
> > > How often the request queue gets full depends on the request incoming
> > rate,
> > > the request processing rate, and the size of the request queue.
> > > When that happens, the dedicated endpoints design can better handle
> > > it than any of the previously discussed options.
> > >
> > > Another reason I made the change was that I have the same taste
> > > as Becket that it's a better separation of the control plane from the
> > data
> > > plane.
> > >
> > > Finally, I want to clarify that this change is NOT motivated by the
> > > out-of-order
> > > processing discussion. The latter problem is orthogonal to this KIP,
> and
> > it
> > > can happen in any of the design options we discussed for this KIP so
> far.
> > > So I'd like to address out-of-order processing separately in another
> > > thread,
> > > and avoid mentioning it in this KIP.
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin <li...@gmail.com> wrote:
> > >
> > > > Hey Lucas,
> > > >
> > > > Thanks for the update.
> > > >
> > > > The current KIP propose new broker configs "listeners.for.controller"
> > and
> > > > "advertised.listeners.for.controller". This is going to be a big
> change
> > > > since listeners are among the most important configs that every user
> > > needs
> > > > to change. According to the rejected alternative section, it seems
> that
> > > the
> > > > reason to add these two configs is to improve performance when the
> data
> > > > request queue is full rather than for correctness. It should be a
> very
> > > rare
> > > > scenario and I am not sure we should add configs for all users just
> to
> > > > improve the performance in such rare scenario.
> > > >
> > > > Also, if the new design is based on the issues which are discovered
> in
> > > the
> > > > recent discussion, e.g. out of order processing if we don't use a
> > > dedicated
> > > > thread for controller request, it may be useful to explain the
> problem
> > in
> > > > the motivation section.
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > > On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > A kind reminder for review of this KIP.
> > > > >
> > > > > Thank you very much!
> > > > > Lucas
> > > > >
> > > > > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I've updated the KIP by adding the dedicated endpoints for
> > controller
> > > > > > connections,
> > > > > > and pinning threads for controller requests.
> > > > > > Also I've updated the title of this KIP. Please take a look and
> let
> > > me
> > > > > > know your feedback.
> > > > > >
> > > > > > Thanks a lot for your time!
> > > > > > Lucas
> > > > > >
> > > > > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Lucas,
> > > > > >> I agree, if we want to go forward with a separate controller
> plane
> > > and
> > > > > >> data
> > > > > >> plane and completely isolate them, having a separate port for
> > > > controller
> > > > > >> with a separate Acceptor and a Processor sounds ideal to me.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Mayuresh
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <
> becket.qin@gmail.com
> > >
> > > > > wrote:
> > > > > >>
> > > > > >> > Hi Lucas,
> > > > > >> >
> > > > > >> > Yes, I agree that a dedicated end to end control flow would be
> > > > ideal.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > Jiangjie (Becket) Qin
> > > > > >> >
> > > > > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Thanks for the comment, Becket.
> > > > > >> > > So far, we've been trying to avoid making any request
> handler
> > > > thread
> > > > > >> > > special.
> > > > > >> > > But if we were to follow that path in order to make the two
> > > planes
> > > > > >> more
> > > > > >> > > isolated,
> > > > > >> > > what do you think about also having a dedicated processor
> > > thread,
> > > > > >> > > and dedicated port for the controller?
> > > > > >> > >
> > > > > >> > > Today one processor thread can handle multiple connections,
> > > let's
> > > > > say
> > > > > >> 100
> > > > > >> > > connections
> > > > > >> > >
> > > > > >> > > represented by connection0, ... connection99, among which
> > > > > >> connection0-98
> > > > > >> > > are from clients, while connection99 is from
> > > > > >> > >
> > > > > >> > > the controller. Further let's say after one selector
> polling,
> > > > there
> > > > > >> are
> > > > > >> > > incoming requests on all connections.
> > > > > >> > >
> > > > > >> > > When the request queue is full, (either the data request
> being
> > > > full
> > > > > in
> > > > > >> > the
> > > > > >> > > two queue design, or
> > > > > >> > >
> > > > > >> > > the one single queue being full in the deque design), the
> > > > processor
> > > > > >> > thread
> > > > > >> > > will be blocked first
> > > > > >> > >
> > > > > >> > > when trying to enqueue the data request from connection0,
> then
> > > > > >> possibly
> > > > > >> > > blocked for the data request
> > > > > >> > >
> > > > > >> > > from connection1, ... etc even though the controller request
> > is
> > > > > ready
> > > > > >> to
> > > > > >> > be
> > > > > >> > > enqueued.
> > > > > >> > >
> > > > > >> > > To solve this problem, it seems we would need to have a
> > separate
> > > > > port
> > > > > >> > > dedicated to
> > > > > >> > >
> > > > > >> > > the controller, a dedicated processor thread, a dedicated
> > > > controller
> > > > > >> > > request queue,
> > > > > >> > >
> > > > > >> > > and pinning of one request handler thread for controller
> > > requests.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Lucas
> > > > > >> > >
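The fully isolated design sketched above — a dedicated controller port and Processor feeding a dedicated controller request queue, drained by a single pinned handler thread — can be illustrated roughly as follows. All class and method names here are illustrative, not Kafka's actual implementation, and the capacity of 20 simply echoes the default discussed elsewhere in this thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch: a control plane with its own bounded request queue,
// drained in order by one dedicated handler thread. A real broker would have
// a dedicated Acceptor/Processor on a separate controller port calling
// enqueue(); data-plane requests would flow through an entirely separate
// queue and handler pool.
public class ControlPlane {
    private final BlockingQueue<Runnable> controllerQueue = new ArrayBlockingQueue<>(20);
    private final Thread controllerHandler;

    public ControlPlane() {
        controllerHandler = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // single thread => controller requests are handled in FIFO order
                    controllerQueue.take().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "controller-request-handler");
        controllerHandler.setDaemon(true);
        controllerHandler.start();
    }

    // Called by the dedicated controller Processor thread; this blocks only if
    // the controller queue itself is full, never because the data-plane queue is.
    public void enqueue(Runnable request) throws InterruptedException {
        controllerQueue.put(request);
    }
}
```

With its own queue, the control plane can no longer suffer the head-of-line blocking described above, where a processor stalls on full data-request enqueues before it ever reaches the controller's connection.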
> > > > > >> > >
> > > > > >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <
> > > becket.qin@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > > > Personally I am not fond of the dequeue approach simply
> > > because
> > > > it
> > > > > >> is
> > > > > >> > > > against the basic idea of isolating the controller plane
> and
> > > > data
> > > > > >> > plane.
> > > > > >> > > > With a single dequeue, theoretically speaking the
> controller
> > > > > >> requests
> > > > > >> > can
> > > > > >> > > > starve the clients requests. I would prefer the approach
> > with
> > > a
> > > > > >> > separate
> > > > > >> > > > controller request queue and a dedicated controller
> request
> > > > > handler
> > > > > >> > > thread.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > >
> > > > > >> > > > Jiangjie (Becket) Qin
> > > > > >> > > >
> > > > > >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <
> > > > > lucasatucla@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Sure, I can summarize the usage of correlation id. But
> > > before
> > > > I
> > > > > do
> > > > > >> > > that,
> > > > > >> > > > it
> > > > > >> > > > > seems
> > > > > >> > > > > the same out-of-order processing can also happen to
> > Produce
> > > > > >> requests
> > > > > >> > > sent
> > > > > >> > > > > by producers,
> > > > > >> > > > > following the same example you described earlier.
> > > > > >> > > > > If that's the case, I think this probably deserves a
> > > separate
> > > > > doc
> > > > > >> and
> > > > > >> > > > > design independent of this KIP.
> > > > > >> > > > >
> > > > > >> > > > > Lucas
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <
> > > > lindong28@gmail.com
> > > > > >
> > > > > >> > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hey Lucas,
> > > > > >> > > > > >
> > > > > >> > > > > > Could you update the KIP if you are confident with the
> > > > > approach
> > > > > >> > which
> > > > > >> > > > > uses
> > > > > >> > > > > > correlation id? The idea around correlation id is kind
> > of
> > > > > >> scattered
> > > > > >> > > > > across
> > > > > >> > > > > > multiple emails. It will be useful if other reviewers
> can
> > > read
> > > > > the
> > > > > >> > KIP
> > > > > >> > > to
> > > > > >> > > > > > understand the latest proposal.
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks,
> > > > > >> > > > > > Dong
> > > > > >> > > > > >
> > > > > >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > > > > >> > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > I like the idea of the dequeue implementation by
> > Lucas.
> > > > This
> > > > > >> will
> > > > > >> > > > help
> > > > > >> > > > > us
> > > > > >> > > > > > > avoid additional queue for controller and additional
> > > > configs
> > > > > >> in
> > > > > >> > > > Kafka.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Thanks,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Mayuresh
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
> > > > > >> becket.qin@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Hi Jun,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > The usage of correlation ID might still be useful
> to
> > > > > address
> > > > > >> > the
> > > > > >> > > > > cases
> > > > > >> > > > > > > > that the controller epoch and leader epoch check
> are
> > > not
> > > > > >> > > sufficient
> > > > > >> > > > > to
> > > > > >> > > > > > > > guarantee correct behavior. For example, if the
> > > > controller
> > > > > >> > sends
> > > > > >> > > a
> > > > > >> > > > > > > > LeaderAndIsrRequest followed by a
> > StopReplicaRequest,
> > > > and
> > > > > >> the
> > > > > >> > > > broker
> > > > > >> > > > > > > > processes it in the reverse order, the replica may
> > > still
> > > > > be
> > > > > >> > > wrongly
> > > > > >> > > > > > > > recreated, right?
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <
> > > > jun@confluent.io
> > > > > >
> > > > > >> > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Hmm, since we already use controller epoch and
> > > leader
> > > > > >> epoch
> > > > > >> > for
> > > > > >> > > > > > > properly
> > > > > >> > > > > > > > > caching the latest partition state, do we really
> > > need
> > > > > >> > > correlation
> > > > > >> > > > > id
> > > > > >> > > > > > > for
> > > > > >> > > > > > > > > ordering the controller requests?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Jun
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
> > > > > >> > > > becket.qin@gmail.com>
> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >> Lucas and Mayuresh,
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> Good idea. The correlation id should work.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> In the ControllerChannelManager, a request will
> > be
> > > > > resent
> > > > > >> > > until
> > > > > >> > > > a
> > > > > >> > > > > > > > response
> > > > > >> > > > > > > > >> is received. So if the controller to broker
> > > > connection
> > > > > >> > > > disconnects
> > > > > >> > > > > > > after
> > > > > >> > > > > > > > >> controller sends R1_a, but before the response
> of
> > > > R1_a
> > > > > is
> > > > > >> > > > > received,
> > > > > >> > > > > > a
> > > > > >> > > > > > > > >> disconnection may cause the controller to
> resend
> > > > R1_b.
> > > > > >> i.e.
> > > > > >> > > > until
> > > > > >> > > > > R1
> > > > > >> > > > > > > is
> > > > > >> > > > > > > > >> acked, R2 won't be sent by the controller.
> > > > > >> > > > > > > > >> This gives two guarantees:
> > > > > >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > >> > > > > > > > >> 2. On the broker side, when R2 is seen, R1 must
> > > have
> > > > > been
> > > > > >> > > > > processed
> > > > > >> > > > > > at
> > > > > >> > > > > > > > >> least once.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> So on the broker side, with a single thread
> > > > controller
> > > > > >> > request
> > > > > >> > > > > > > handler,
> > > > > >> > > > > > > > the
> > > > > >> > > > > > > > >> logic should be:
> > > > > >> > > > > > > > >> 1. Process whatever request is seen in the
> > controller
> > > > > >> request
> > > > > >> > > > queue
> > > > > >> > > > > > > > >> 2. For the given epoch, drop request if its
> > > > correlation
> > > > > >> id
> > > > > >> > is
> > > > > >> > > > > > smaller
> > > > > >> > > > > > > > than
> > > > > >> > > > > > > > >> that of the last processed request.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> Thanks,
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> Jiangjie (Becket) Qin
> > > > > >> > > > > > > > >>
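The two-step broker-side logic outlined above — process whatever arrives on the controller request queue, but drop any request whose correlation id is not newer than the last one processed for the same controller epoch — can be sketched like this. The class and method names are hypothetical, not Kafka's real code; this sketch also drops exact duplicates, a slight strengthening of the "smaller than" check.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: stale-request rejection for a single-threaded
// controller request handler, relying on the correlation id being
// monotonically increasing within one controller epoch (guaranteed by the
// controller resending a request until it is acked before sending the next).
public class ControllerRequestDeduper {
    // last processed correlation id, keyed by controller epoch
    private final Map<Integer, Integer> lastProcessed = new HashMap<>();

    /** Returns true if the request should be processed, false if dropped as stale. */
    public boolean shouldProcess(int controllerEpoch, int correlationId) {
        Integer last = lastProcessed.get(controllerEpoch);
        if (last != null && correlationId <= last) {
            return false; // a newer request (or this same one) was already processed
        }
        lastProcessed.put(controllerEpoch, correlationId);
        return true;
    }
}
```

In the R1_a / R1_b / R2 example, if R1_b's retry arrives after R2 has been processed, its lower correlation id causes it to be rejected rather than wrongly re-applied.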
> > > > > >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <
> > > > > >> jun@confluent.io>
> > > > > >> > > > > wrote:
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >>> I agree that there is no strong ordering when
> > > there
> > > > > are
> > > > > >> > more
> > > > > >> > > > than
> > > > > >> > > > > > one
> > > > > >> > > > > > > > >>> socket connections. Currently, we rely on
> > > > > >> controllerEpoch
> > > > > >> > and
> > > > > >> > > > > > > > leaderEpoch
> > > > > >> > > > > > > > >>> to ensure that the receiving broker picks up
> the
> > > > > latest
> > > > > >> > state
> > > > > >> > > > for
> > > > > >> > > > > > > each
> > > > > >> > > > > > > > >>> partition.
> > > > > >> > > > > > > > >>>
> > > > > >> > > > > > > > >>> One potential issue with the dequeue approach
> is
> > > > that
> > > > > if
> > > > > >> > the
> > > > > >> > > > > queue
> > > > > >> > > > > > is
> > > > > >> > > > > > > > >> full,
> > > > > >> > > > > > > > >>> there is no guarantee that the controller
> > requests
> > > > > will
> > > > > >> be
> > > > > >> > > > > enqueued
> > > > > >> > > > > > > > >>> quickly.
> > > > > >> > > > > > > > >>>
> > > > > >> > > > > > > > >>> Thanks,
> > > > > >> > > > > > > > >>>
> > > > > >> > > > > > > > >>> Jun
> > > > > >> > > > > > > > >>>
> > > > > >> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh
> > Gharat <
> > > > > >> > > > > > > > >>> gharatmayuresh15@gmail.com
> > > > > >> > > > > > > > >>>> wrote:
> > > > > >> > > > > > > > >>>
> > > > > >> > > > > > > > >>>> Yea, the correlationId is only set to 0 in
> the
> > > > > >> > NetworkClient
> > > > > >> > > > > > > > >> constructor.
> > > > > >> > > > > > > > >>>> Since we reuse the same NetworkClient between
> > > > > >> Controller
> > > > > >> > and
> > > > > >> > > > the
> > > > > >> > > > > > > > >> broker,
> > > > > >> > > > > > > > >>> a
> > > > > >> > > > > > > > >>>> disconnection should not cause it to reset to
> > 0,
> > > in
> > > > > >> which
> > > > > >> > > case
> > > > > >> > > > > it
> > > > > >> > > > > > > can
> > > > > >> > > > > > > > >> be
> > > > > >> > > > > > > > >>>> used to reject obsolete requests.
> > > > > >> > > > > > > > >>>>
> > > > > >> > > > > > > > >>>> Thanks,
> > > > > >> > > > > > > > >>>>
> > > > > >> > > > > > > > >>>> Mayuresh
> > > > > >> > > > > > > > >>>>
> > > > > >> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <
> > > > > >> > > > > lucasatucla@gmail.com
> > > > > >> > > > > > >
> > > > > >> > > > > > > > >>> wrote:
> > > > > >> > > > > > > > >>>>
> > > > > >> > > > > > > > >>>>> @Dong,
> > > > > >> > > > > > > > >>>>> Great example and explanation, thanks!
> > > > > >> > > > > > > > >>>>>
> > > > > >> > > > > > > > >>>>> @All
> > > > > >> > > > > > > > >>>>> Regarding the example given by Dong, it
> seems
> > > even
> > > > > if
> > > > > >> we
> > > > > >> > > use
> > > > > >> > > > a
> > > > > >> > > > > > > queue,
> > > > > >> > > > > > > > >>>> and a
> > > > > >> > > > > > > > >>>>> dedicated controller request handling
> thread,
> > > > > >> > > > > > > > >>>>> the same result can still happen because
> R1_a
> > > will
> > > > > be
> > > > > >> > sent
> > > > > >> > > on
> > > > > >> > > > > one
> > > > > >> > > > > > > > >>>>> connection, and R1_b & R2 will be sent on a
> > > > > different
> > > > > >> > > > > connection,
> > > > > >> > > > > > > > >>>>> and there is no ordering between different
> > > > > >> connections on
> > > > > >> > > the
> > > > > >> > > > > > > broker
> > > > > >> > > > > > > > >>>> side.
> > > > > >> > > > > > > > >>>>> I was discussing with Mayuresh offline, and
> it
> > > > seems
> > > > > >> > > > > correlation
> > > > > >> > > > > > id
> > > > > >> > > > > > > > >>>> within
> > > > > >> > > > > > > > >>>>> the same NetworkClient object is
> monotonically
> > > > > >> increasing
> > > > > >> > > and
> > > > > >> > > > > > never
> > > > > >> > > > > > > > >>>> reset,
> > > > > >> > > > > > > > >>>>> hence a broker can leverage that to properly
> > > > reject
> > > > > >> > > obsolete
> > > > > >> > > > > > > > >> requests.
> > > > > >> > > > > > > > >>>>> Thoughts?
> > > > > >> > > > > > > > >>>>>
> > > > > >> > > > > > > > >>>>> Thanks,
> > > > > >> > > > > > > > >>>>> Lucas
> > > > > >> > > > > > > > >>>>>
> > > > > >> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh
> > > Gharat
> > > > <
> > > > > >> > > > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > >> > > > > > > > >>>>>
> > > > > >> > > > > > > > >>>>>> Actually nvm, correlationId is reset in
> case
> > of
> > > > > >> > connection
> > > > > >> > > > > > loss, I
> > > > > >> > > > > > > > >>>> think.
> > > > > >> > > > > > > > >>>>>>
> > > > > >> > > > > > > > >>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>
> > > > > >> > > > > > > > >>>>>> Mayuresh
> > > > > >> > > > > > > > >>>>>>
> > > > > >> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh
> > > Gharat
> > > > <
> > > > > >> > > > > > > > >>>>>> gharatmayuresh15@gmail.com>
> > > > > >> > > > > > > > >>>>>> wrote:
> > > > > >> > > > > > > > >>>>>>
> > > > > >> > > > > > > > >>>>>>> I agree with Dong that out-of-order
> > processing
> > > > can
> > > > > >> > happen
> > > > > >> > > > > with
> > > > > >> > > > > > > > >>>> having 2
> > > > > >> > > > > > > > >>>>>>> separate queues as well and it can even
> > happen
> > > > > >> today.
> > > > > >> > > > > > > > >>>>>>> Can we use the correlationId in the
> request
> > > from
> > > > > the
> > > > > >> > > > > controller
> > > > > >> > > > > > > > >> to
> > > > > >> > > > > > > > >>>> the
> > > > > >> > > > > > > > >>>>>>> broker to handle ordering ?
> > > > > >> > > > > > > > >>>>>>>
> > > > > >> > > > > > > > >>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>
> > > > > >> > > > > > > > >>>>>>> Mayuresh
> > > > > >> > > > > > > > >>>>>>>
> > > > > >> > > > > > > > >>>>>>>
> > > > > >> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket
> Qin <
> > > > > >> > > > > > becket.qin@gmail.com
> > > > > >> > > > > > > > >>>
> > > > > >> > > > > > > > >>>>> wrote:
> > > > > >> > > > > > > > >>>>>>>
> > > > > >> > > > > > > > >>>>>>>> Good point, Joel. I agree that a
> dedicated
> > > > > >> controller
> > > > > >> > > > > request
> > > > > >> > > > > > > > >>>> handling
> > > > > >> > > > > > > > >>>>>>>> thread would be a better isolation. It
> also
> > > > > solves
> > > > > >> the
> > > > > >> > > > > > > > >> reordering
> > > > > >> > > > > > > > >>>>> issue.
> > > > > >> > > > > > > > >>>>>>>>
> > > > > >> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel
> > Koshy <
> > > > > >> > > > > > > > >> jjkoshy.w@gmail.com>
> > > > > >> > > > > > > > >>>>>> wrote:
> > > > > >> > > > > > > > >>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>> Good example. I think this scenario can
> > > occur
> > > > in
> > > > > >> the
> > > > > >> > > > > current
> > > > > >> > > > > > > > >>> code
> > > > > >> > > > > > > > >>>> as
> > > > > >> > > > > > > > >>>>>>>> well
> > > > > >> > > > > > > > >>>>>>>>> but with even lower probability given
> that
> > > > there
> > > > > >> are
> > > > > >> > > > other
> > > > > >> > > > > > > > >>>>>>>> non-controller
> > > > > >> > > > > > > > >>>>>>>>> requests interleaved. It is still
> sketchy
> > > > though
> > > > > >> and
> > > > > >> > I
> > > > > >> > > > > think
> > > > > >> > > > > > a
> > > > > >> > > > > > > > >>>> safer
> > > > > >> > > > > > > > >>>>>>>>> approach would be separate queues and
> > > pinning
> > > > > >> > > controller
> > > > > >> > > > > > > > >> request
> > > > > >> > > > > > > > >>>>>>>> handling
> > > > > >> > > > > > > > >>>>>>>>> to one handler thread.
> > > > > >> > > > > > > > >>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong
> > Lin <
> > > > > >> > > > > > > > >> lindong28@gmail.com
> > > > > >> > > > > > > > >>>>
> > > > > >> > > > > > > > >>>>>> wrote:
> > > > > >> > > > > > > > >>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>> Hey Becket,
> > > > > >> > > > > > > > >>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>> I think you are right that there may be
> > > > > >> out-of-order
> > > > > >> > > > > > > > >>> processing.
> > > > > >> > > > > > > > >>>>>>>> However,
> > > > > >> > > > > > > > >>>>>>>>>> it seems that out-of-order processing
> may
> > > > also
> > > > > >> > happen
> > > > > >> > > > even
> > > > > >> > > > > > > > >> if
> > > > > >> > > > > > > > >>> we
> > > > > >> > > > > > > > >>>>>> use a
> > > > > >> > > > > > > > >>>>>>>>>> separate queue.
> > > > > >> > > > > > > > >>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>> Here is the example:
> > > > > >> > > > > > > > >>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>> - Controller sends R1 and got
> > disconnected
> > > > > before
> > > > > >> > > > > receiving
> > > > > >> > > > > > > > >>>>>> response.
> > > > > >> > > > > > > > >>>>>>>>> Then
> > > > > >> > > > > > > > >>>>>>>>>> it reconnects and sends R2. Both
> requests
> > > now
> > > > > >> stay
> > > > > >> > in
> > > > > >> > > > the
> > > > > >> > > > > > > > >>>>> controller
> > > > > >> > > > > > > > >>>>>>>>>> request queue in the order they are
> sent.
> > > > > >> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request
> > queue
> > > > and
> > > > > >> then
> > > > > >> > > > > thread2
> > > > > >> > > > > > > > >>>> takes
> > > > > >> > > > > > > > >>>>>> R2
> > > > > >> > > > > > > > >>>>>>>>> from
> > > > > >> > > > > > > > >>>>>>>>>> the request queue almost at the same
> > time.
> > > > > >> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in
> > parallel.
> > > > > >> There is
> > > > > >> > > > > chance
> > > > > >> > > > > > > > >>> that
> > > > > >> > > > > > > > >>>>>> R2's
> > > > > >> > > > > > > > >>>>>>>>>> processing is completed before R1.
> > > > > >> > > > > > > > >>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>> If out-of-order processing can happen
> for
> > > > both
> > > > > >> > > > approaches
> > > > > >> > > > > > > > >> with
> > > > > >> > > > > > > > >>>>> very
> > > > > >> > > > > > > > >>>>>>>> low
> > > > > >> > > > > > > > >>>>>>>>>> probability, it may not be worthwhile
> to
> > > add
> > > > > the
> > > > > >> > extra
> > > > > >> > > > > > > > >> queue.
> > > > > >> > > > > > > > >>>> What
> > > > > >> > > > > > > > >>>>>> do
> > > > > >> > > > > > > > >>>>>>>> you
> > > > > >> > > > > > > > >>>>>>>>>> think?
> > > > > >> > > > > > > > >>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>>>> Dong
> > > > > >> > > > > > > > >>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket
> > > Qin <
> > > > > >> > > > > > > > >>>> becket.qin@gmail.com
> > > > > >> > > > > > > > >>>>>>
> > > > > >> > > > > > > > >>>>>>>>> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>> Using the request channel as a dequeue
> > was
> > > > > >> brought
> > > > > >> > up
> > > > > >> > > > some
> > > > > >> > > > > > > > >>> time
> > > > > >> > > > > > > > >>>>> ago
> > > > > >> > > > > > > > >>>>>>>> when
> > > > > >> > > > > > > > >>>>>>>>>> we
> > > > > >> > > > > > > > >>>>>>>>>>> initially thinking of prioritizing the
> > > > > request.
> > > > > >> The
> > > > > >> > > > > > > > >> concern
> > > > > >> > > > > > > > >>>> was
> > > > > >> > > > > > > > >>>>>> that
> > > > > >> > > > > > > > >>>>>>>>> the
> > > > > >> > > > > > > > >>>>>>>>>>> controller requests are supposed to be
> > > > > >> processed in
> > > > > >> > > > > order.
> > > > > >> > > > > > > > >>> If
> > > > > >> > > > > > > > >>>> we
> > > > > >> > > > > > > > >>>>>> can
> > > > > >> > > > > > > > >>>>>>>>>> ensure
> > > > > >> > > > > > > > >>>>>>>>>>> that there is one controller request
> in
> > > the
> > > > > >> request
> > > > > >> > > > > > > > >> channel,
> > > > > >> > > > > > > > >>>> the
> > > > > >> > > > > > > > >>>>>>>> order
> > > > > >> > > > > > > > >>>>>>>>> is
> > > > > >> > > > > > > > >>>>>>>>>>> not a concern. But in cases that there
> > are
> > > > > more
> > > > > >> > than
> > > > > >> > > > one
> > > > > >> > > > > > > > >>>>>> controller
> > > > > >> > > > > > > > >>>>>>>>>> request
> > > > > >> > > > > > > > >>>>>>>>>>> inserted into the queue, the
> controller
> > > > > request
> > > > > >> > order
> > > > > >> > > > may
> > > > > >> > > > > > > > >>>> change
> > > > > >> > > > > > > > >>>>>> and
> > > > > >> > > > > > > > >>>>>>>>>> cause
> > > > > >> > > > > > > > >>>>>>>>>>> problem. For example, think about the
> > > > > following
> > > > > >> > > > sequence:
> > > > > >> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a
> > request
> > > R1
> > > > > to
> > > > > >> > > broker
> > > > > >> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and put the
> > request
> > > to
> > > > > the
> > > > > >> > head
> > > > > >> > > > of
> > > > > >> > > > > > > > >> the
> > > > > >> > > > > > > > >>>>>> request
> > > > > >> > > > > > > > >>>>>>>>>> queue.
> > > > > >> > > > > > > > >>>>>>>>>>> 3. Controller to broker connection
> > failed
> > > > and
> > > > > >> the
> > > > > >> > > > > > > > >> controller
> > > > > >> > > > > > > > >>>>>>>>> reconnected
> > > > > >> > > > > > > > >>>>>>>>>> to
> > > > > >> > > > > > > > >>>>>>>>>>> the broker.
> > > > > >> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to
> the
> > > > broker
> > > > > >> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to
> the
> > > head
> > > > > of
> > > > > >> the
> > > > > >> > > > > > > > >> request
> > > > > >> > > > > > > > >>>>> queue.
> > > > > >> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be
> > > processed
> > > > > >> before
> > > > > >> > > R1
> > > > > >> > > > is
> > > > > >> > > > > > > > >>>>>> processed,
> > > > > >> > > > > > > > >>>>>>>>>> which
> > > > > >> > > > > > > > >>>>>>>>>>> may cause problem.
> > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel
> > > Koshy
> > > > <
> > > > > >> > > > > > > > >>>>> jjkoshy.w@gmail.com>
> > > > > >> > > > > > > > >>>>>>>>> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It
> > appears
> > > to
> > > > > be
> > > > > >> a
> > > > > >> > > > simpler
> > > > > >> > > > > > > > >>>> less
> > > > > >> > > > > > > > >>>>>>>>> invasive
> > > > > >> > > > > > > > >>>>>>>>>>>> alternative and it should work.
> > > > > >> Jun/Becket/others,
> > > > > >> > > do
> > > > > >> > > > > > > > >> you
> > > > > >> > > > > > > > >>>> see
> > > > > >> > > > > > > > >>>>>> any
> > > > > >> > > > > > > > >>>>>>>>>>> pitfalls
> > > > > >> > > > > > > > >>>>>>>>>>>> with this approach?
> > > > > >> > > > > > > > >>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM,
> Lucas
> > > > Wang
> > > > > <
> > > > > >> > > > > > > > >>>>>>>> lucasatucla@gmail.com>
> > > > > >> > > > > > > > >>>>>>>>>>>> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > > >> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea that
> I
> > > > > haven't
> > > > > >> > > thought
> > > > > >> > > > > > > > >>>>> before.
> > > > > >> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at
> hand
> > > > pretty
> > > > > >> > well,
> > > > > >> > > > and
> > > > > >> > > > > > > > >>>> also
> > > > > >> > > > > > > > >>>>>>>>>>>>> avoids the need to have a new size
> > > metric
> > > > > and
> > > > > >> > > > capacity
> > > > > >> > > > > > > > >>>>> config
> > > > > >> > > > > > > > >>>>>>>>>>>>> for the controller request queue. In
> > > fact,
> > > > > if
> > > > > >> we
> > > > > >> > > were
> > > > > >> > > > > > > > >> to
> > > > > >> > > > > > > > >>>>> adopt
> > > > > >> > > > > > > > >>>>>>>>>>>>> this design, there is no public
> > > interface
> > > > > >> change,
> > > > > >> > > and
> > > > > >> > > > > > > > >> we
> > > > > >> > > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> > > > > >> > > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems
> > > > > >> > > > > > > > >>>>>>>>>>>>> the java class LinkedBlockingQueue
> can
> > > > > readily
> > > > > >> > > > satisfy
> > > > > >> > > > > > > > >>> the
> > > > > >> > > > > > > > >>>>>>>>>> requirement
> > > > > >> > > > > > > > >>>>>>>>>>>>> by supporting a capacity, and also
> > > > allowing
> > > > > >> > > inserting
> > > > > >> > > > > > > > >> at
> > > > > >> > > > > > > > >>>>> both
> > > > > >> > > > > > > > >>>>>>>> ends.
> > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>> My only concern is that this design
> is
> > > > tied
> > > > > to
> > > > > >> > the
> > > > > >> > > > > > > > >>>>> coincidence
> > > > > >> > > > > > > > >>>>>>>> that
> > > > > >> > > > > > > > >>>>>>>>>>>>> we have two request priorities and
> > there
> > > > are
> > > > > >> two
> > > > > >> > > ends
> > > > > >> > > > > > > > >>> to a
> > > > > >> > > > > > > > >>>>>>>> deque.
> > > > > >> > > > > > > > >>>>>>>>>>>>> Hence by using the proposed design,
> it
> > > > seems
> > > > > >> the
> > > > > >> > > > > > > > >> network
> > > > > >> > > > > > > > >>>>> layer
> > > > > >> > > > > > > > >>>>>>>> is
> > > > > >> > > > > > > > >>>>>>>>>>>>> more tightly coupled with upper
> layer
> > > > logic,
> > > > > >> e.g.
> > > > > >> > > if
> > > > > >> > > > > > > > >> we
> > > > > >> > > > > > > > >>>> were
> > > > > >> > > > > > > > >>>>>> to
> > > > > >> > > > > > > > >>>>>>>> add
> > > > > >> > > > > > > > >>>>>>>>>>>>> an extra priority level in the
> future
> > > for
> > > > > some
> > > > > >> > > > reason,
> > > > > >> > > > > > > > >>> we
> > > > > >> > > > > > > > >>>>>> would
> > > > > >> > > > > > > > >>>>>>>>>>> probably
> > > > > >> > > > > > > > >>>>>>>>>>>>> need to go back to the design of
> > > separate
> > > > > >> queues,
> > > > > >> > > one
> > > > > >> > > > > > > > >>> for
> > > > > >> > > > > > > > >>>>> each
> > > > > >> > > > > > > > >>>>>>>>>> priority
> > > > > >> > > > > > > > >>>>>>>>>>>>> level.
> > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs
> > and
> > > > > lean
> > > > > >> > > toward
> > > > > >> > > > > > > > >>> your
> > > > > >> > > > > > > > >>>>>>>> suggested
> > > > > >> > > > > > > > >>>>>>>>>>>>> approach.
> > > > > >> > > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>> @Becket,
> > > > > >> > > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new
> > > > design,
> > > > > >> I'm
> > > > > >> > > > > > > > >>> answering
> > > > > >> > > > > > > > >>>>>> your
> > > > > >> > > > > > > > >>>>>>>>>>> question
> > > > > >> > > > > > > > >>>>>>>>>>>>> only in the context
> > > > > >> > > > > > > > >>>>>>>>>>>>> of the current KIP design: I think
> > your
> > > > > >> > suggestion
> > > > > >> > > > > > > > >> makes
> > > > > >> > > > > > > > >>>>>> sense,
> > > > > >> > > > > > > > >>>>>>>> and
> > > > > >> > > > > > > > >>>>>>>>>> I'm
> > > > > >> > > > > > > > >>>>>>>>>>>> ok
> > > > > >> > > > > > > > >>>>>>>>>>>>> with removing the capacity config
> and
> > > > > >> > > > > > > > >>>>>>>>>>>>> just relying on the default value of
> > 20
> > > > > being
> > > > > >> > > > > > > > >> sufficient
> > > > > >> > > > > > > > >>>>>> enough.
> > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>>>>>>> Lucas
> > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM,
> > > Mayuresh
> > > > > >> Gharat
> > > > > >> > <
> > > > > >> > > > > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > > > > >> > > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is
> to
> > > > > >> prioritize
> > > > > >> > > the
> > > > > >> > > > > > > > >>>>>>>> controller
> > > > > >> > > > > > > > >>>>>>>>>>> request
> > > > > >> > > > > > > > >>>>>>>>>>>>>> over any other requests.
> > > > > >> > > > > > > > >>>>>>>>>>>>>> In that case, we can change the
> > request
> > > > > queue
> > > > > >> > to a
> > > > > >> > > > > > > > >>>>> dequeue,
> > > > > >> > > > > > > > >>>>>>>> where
> > > > > >> > > > > > > > >>>>>>>>>> you
> > > > > >> > > > > > > > >>>>>>>>>>>>>> always insert the normal requests
> > > > (produce,
> > > > > >> > > > > > > > >>>> consume,..etc)
> > > > > >> > > > > > > > >>>>>> to
> > > > > >> > > > > > > > >>>>>>>> the
> > > > > >> > > > > > > > >>>>>>>>>> end
> > > > > >> > > > > > > > >>>>>>>>>>>>>> of the dequeue, but if it's a controller request, you insert it to the head of the queue. This ensures that the controller request will be given higher priority over other requests.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>> Also since we only read one request from the socket and mute it and only unmute it after handling the request, this would ensure that we don't handle controller requests out of order.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue and the additional config for the size of the queue.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>> What do you think?
> > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>> Mayuresh
> > > > > >> > > > > > > > >>>>>>>>>>>>>>
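The head-insertion idea described above can be sketched with a double-ended blocking queue. This is a hypothetical illustration only; the class and field names are mine and do not reflect Kafka's actual RequestChannel code:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of the single-deque idea: controller requests jump to
// the head of the queue, data requests go to the tail, and request handler
// threads always take from the head.
public class RequestDeque {
    public static class Request {
        final String name;
        final boolean fromController;
        public Request(String name, boolean fromController) {
            this.name = name;
            this.fromController = fromController;
        }
    }

    private final LinkedBlockingDeque<Request> deque = new LinkedBlockingDeque<>(500);

    public void send(Request r) throws InterruptedException {
        if (r.fromController)
            deque.putFirst(r);   // controller requests get priority at the head
        else
            deque.putLast(r);    // data requests keep FIFO order at the tail
    }

    public Request receive() throws InterruptedException {
        return deque.takeFirst();
    }

    public static void main(String[] args) throws InterruptedException {
        RequestDeque q = new RequestDeque();
        q.send(new Request("produce-1", false));
        q.send(new Request("produce-2", false));
        q.send(new Request("leaderAndIsr", true));
        System.out.println(q.receive().name); // controller request comes out first
    }
}
```

Because there is a single deque, the existing `queued.max.requests` bound still applies to the combined contents, which is part of the appeal of this variant.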
> > > > > >> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the current design makes sense. My confusion is about whether the new config for the controller queue capacity is necessary. I cannot think of a case in which users would change it.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we expect users to ever change the controller request queue capacity"? If we agree that 20 is already a very generous default number and we do not expect users to change it, is it still necessary to expose this as a config?
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that normally there should be just one controller request because of muting, and I had NOT intended to say there would be many enqueued controller requests. I went through the KIP again, and I'm not sure which part conveys that info. I'd be happy to revise if you point out the section.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal conditions, the current design does not preclude multiple controllers running at the same time, hence if we don't have the controller queue capacity config and simply make its capacity to be 1, network threads handling requests from different controllers will be blocked during those troublesome times, which is probably not what we want. On the other hand, adding the extra config with a default value, say 20, guards us from issues in those troublesome times, and IMO there isn't much downside to adding the extra config.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete statement based on a previous design. I've revised the wording in the KIP.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The memory consumption can rise given the total number of queued requests can go up to 2x" in the impact section. Normally the requests from controller to a Broker are not high volume, right?
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the control plane from the data plane makes a lot of sense.
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the controller request queue may have many requests in it. Will this be a common case? The controller requests still go through the SocketServer. The SocketServer will mute the channel once a request is read and put into the request channel. So assuming there is only one connection between controller and each broker, on the
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Becket,

Thanks for the review. The current write up in the KIP won’t change the
ordering behavior. Are you ok with addressing that as a separate
independent issue (I’ll create a separate ticket for it)?
If so, can you please give me a +1 on the vote thread?

Thanks,
Lucas

On Tue, Aug 7, 2018 at 7:34 PM Becket Qin <be...@gmail.com> wrote:

> Thanks for the updated KIP wiki, Lucas. Looks good to me overall.
>
> It might be an implementation detail, but do we still plan to use the
> correlation id to ensure the request processing order?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jul 31, 2018 at 3:39 AM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for your review, Dong.
> > Ack that these configs will have a bigger impact for users.
> >
> > On the other hand, I would argue that the request queue becoming full
> > may or may not be a rare scenario. How often the request queue gets full
> > depends on the request incoming rate, the request processing rate, and
> > the size of the request queue. When that happens, the dedicated
> > endpoints design can better handle it than any of the previously
> > discussed options.
> >
> > Another reason I made the change was that I have the same taste
> > as Becket that it's a better separation of the control plane from the
> > data plane.
> >
> > Finally, I want to clarify that this change is NOT motivated by the
> > out-of-order processing discussion. The latter problem is orthogonal to
> > this KIP, and it can happen in any of the design options we discussed
> > for this KIP so far. So I'd like to address out-of-order processing
> > separately in another thread, and avoid mentioning it in this KIP.
> >
> > Thanks,
> > Lucas
> >
> > On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin <li...@gmail.com> wrote:
> >
> > > Hey Lucas,
> > >
> > > Thanks for the update.
> > >
> > > The current KIP proposes new broker configs "listeners.for.controller"
> > > and "advertised.listeners.for.controller". This is going to be a big
> > > change since listeners are among the most important configs that every
> > > user needs to change. According to the rejected alternative section, it
> > > seems that the reason to add these two configs is to improve performance
> > > when the data request queue is full rather than for correctness. It
> > > should be a very rare scenario and I am not sure we should add configs
> > > for all users just to improve the performance in such a rare scenario.
> > >
> > > Also, if the new design is based on the issues which are discovered in
> > > the recent discussion, e.g. out of order processing if we don't use a
> > > dedicated thread for controller requests, it may be useful to explain
> > > the problem in the motivation section.
> > >
> > > Thanks,
> > > Dong
> > >
> > > On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > A kind reminder for review of this KIP.
> > > >
> > > > Thank you very much!
> > > > Lucas
> > > >
> > > > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I've updated the KIP by adding the dedicated endpoints for
> controller
> > > > > connections,
> > > > > and pinning threads for controller requests.
> > > > > Also I've updated the title of this KIP. Please take a look and let
> > me
> > > > > know your feedback.
> > > > >
> > > > > Thanks a lot for your time!
> > > > > Lucas
> > > > >
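The dedicated controller endpoint Lucas describes above would be expressed through broker configs. A hypothetical server.properties fragment using the config names discussed in this thread (the host/port values and listener-name syntax are illustrative assumptions, not the final KIP syntax):

```properties
# Hypothetical sketch only: config names are taken from this discussion
# thread; values and exact syntax are assumptions.
listeners=PLAINTEXT://broker1.example.com:9092,CONTROLLER://broker1.example.com:9091
listeners.for.controller=CONTROLLER://broker1.example.com:9091
advertised.listeners.for.controller=CONTROLLER://broker1.example.com:9091
```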
> > > > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > > > > gharatmayuresh15@gmail.com> wrote:
> > > > >
> > > > >> Hi Lucas,
> > > > >> I agree, if we want to go forward with a separate controller plane
> > and
> > > > >> data
> > > > >> plane and completely isolate them, having a separate port for
> > > controller
> > > > >> with a separate Acceptor and a Processor sounds ideal to me.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Mayuresh
> > > > >>
> > > > >>
> > > > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <becket.qin@gmail.com> wrote:
> > > > >>
> > > > >> > Hi Lucas,
> > > > >> >
> > > > >> > Yes, I agree that a dedicated end to end control flow would be
> > > ideal.
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Jiangjie (Becket) Qin
> > > > >> >
> > > > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <
> > lucasatucla@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > > Thanks for the comment, Becket.
> > > > >> > > So far, we've been trying to avoid making any request handler thread special. But if we were to follow that path in order to make the two planes more isolated, what do you think about also having a dedicated processor thread, and a dedicated port for the controller?
> > > > >> > >
> > > > >> > > Today one processor thread can handle multiple connections, let's say 100 connections represented by connection0, ... connection99, among which connection0-98 are from clients, while connection99 is from the controller. Further let's say after one selector polling, there are incoming requests on all connections.
> > > > >> > >
> > > > >> > > When the request queue is full (either the data request queue being full in the two queue design, or the one single queue being full in the deque design), the processor thread will be blocked first when trying to enqueue the data request from connection0, then possibly blocked for the data request from connection1, ... etc even though the controller request is ready to be enqueued.
> > > > >> > >
> > > > >> > > To solve this problem, it seems we would need to have a separate port dedicated to the controller, a dedicated processor thread, a dedicated controller request queue, and pinning of one request handler thread for controller requests.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Lucas
> > > > >> > >
> > > > >> > >
> > > > >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Personally I am not fond of the dequeue approach simply because it is against the basic idea of isolating the controller plane and data plane. With a single dequeue, theoretically speaking the controller requests can starve the clients' requests. I would prefer the approach with a separate controller request queue and a dedicated controller request handler thread.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > >
> > > > >> > > > Jiangjie (Becket) Qin
> > > > >> > > >
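The separation Becket prefers, one bounded queue and one pinned handler thread for controller requests, independent of the data plane, can be sketched as follows. This is an illustration only; Kafka's actual RequestChannel/KafkaRequestHandler code differs, and the class, method names, and capacities here are assumptions:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: controller requests get their own bounded queue and a
// single dedicated handler thread, so a full data queue cannot starve them.
public class TwoPlaneRequestChannel {
    private final BlockingQueue<Runnable> dataQueue = new LinkedBlockingQueue<>(500);
    private final BlockingQueue<Runnable> controllerQueue = new LinkedBlockingQueue<>(20);

    public void sendDataRequest(Runnable r) throws InterruptedException {
        dataQueue.put(r);          // may block when the data plane is saturated
    }

    public void sendControllerRequest(Runnable r) throws InterruptedException {
        controllerQueue.put(r);    // independent capacity, default 20 as discussed
    }

    public void startHandlers(int dataHandlerThreads) {
        for (int i = 0; i < dataHandlerThreads; i++)
            newHandler(dataQueue, "data-handler-" + i);
        // exactly one thread drains the controller queue, preserving order
        newHandler(controllerQueue, "controller-handler");
    }

    private void newHandler(BlockingQueue<Runnable> q, String name) {
        Thread t = new Thread(() -> loop(q), name);
        t.setDaemon(true);   // handler threads should not block JVM shutdown in this sketch
        t.start();
    }

    private void loop(BlockingQueue<Runnable> q) {
        try {
            while (true) q.take().run();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Pinning a single thread to the controller queue is what keeps controller requests processed in arrival order; adding more controller handler threads would reintroduce the reordering problem discussed below in the thread history.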
> > > > >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > Sure, I can summarize the usage of correlation id. But before I do that, it seems the same out-of-order processing can also happen to Produce requests sent by producers, following the same example you described earlier.
> > > > >> > > > > If that's the case, I think this probably deserves a separate doc and design independent of this KIP.
> > > > >> > > > >
> > > > >> > > > > Lucas
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Hey Lucas,
> > > > >> > > > > >
> > > > >> > > > > > Could you update the KIP if you are confident with the approach which uses correlation id? The idea around correlation id is kind of scattered across multiple emails. It will be useful if other reviewers can read the KIP to understand the latest proposal.
> > > > >> > > > > >
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Dong
> > > > >> > > > > >
> > > > >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > I like the idea of the dequeue implementation by Lucas. This will help us avoid additional queue for controller and additional configs in Kafka.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > >
> > > > >> > > > > > > Mayuresh
> > > > >> > > > > > >
> > > > >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hi Jun,
> > > > >> > > > > > > >
> > > > >> > > > > > > > The usage of correlation ID might still be useful to address the cases that the controller epoch and leader epoch check are not sufficient to guarantee correct behavior. For example, if the controller sends a LeaderAndIsrRequest followed by a StopReplicaRequest, and the broker processes it in the reverse order, the replica may still be wrongly recreated, right?
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thanks,
> > > > >> > > > > > > >
> > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > >> > > > > > > >
> > > > >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <jun@confluent.io> wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Hmm, since we already use controller epoch and leader epoch for properly caching the latest partition state, do we really need correlation id for ordering the controller requests?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thanks,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Jun
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >> Lucas and Mayuresh,
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> Good idea. The correlation id should work.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> In the ControllerChannelManager, a request will be resent until a response is received. So if the controller-to-broker connection disconnects after the controller sends R1_a, but before the response of R1_a is received, the disconnection may cause the controller to resend it as R1_b. I.e. until R1 is acked, R2 won't be sent by the controller.
> > > > >> > > > > > > > >> This gives two guarantees:
> > > > >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > >> > > > > > > > >> 2. On the broker side, when R2 is seen, R1 must have been processed at least once.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> So on the broker side, with a single-threaded controller request handler, the logic should be:
> > > > >> > > > > > > > >> 1. Process whatever request is seen in the controller request queue.
> > > > >> > > > > > > > >> 2. For the given epoch, drop a request if its correlation id is smaller than that of the last processed request.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> Thanks,
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> Jiangjie (Becket) Qin
> > > > >> > > > > > > > >>
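The two-step broker-side logic Becket outlines can be sketched roughly as follows. This is a hypothetical illustration; the class and field names are mine, not Kafka's actual code:

```java
// Hypothetical sketch of the single-threaded controller request handler:
// for a given controller epoch, drop any request whose correlation id is
// smaller than the last one processed, since a resent copy must follow.
public class ControllerRequestHandler {
    public static class ControllerRequest {
        final int controllerEpoch;
        final int correlationId;
        final String payload;
        public ControllerRequest(int controllerEpoch, int correlationId, String payload) {
            this.controllerEpoch = controllerEpoch;
            this.correlationId = correlationId;
            this.payload = payload;
        }
    }

    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    /** Returns true if the request was processed, false if dropped as obsolete. */
    public boolean handle(ControllerRequest req) {
        if (req.controllerEpoch == lastEpoch && req.correlationId < lastCorrelationId)
            return false;                        // stale resend arrived late, drop it
        lastEpoch = req.controllerEpoch;
        lastCorrelationId = req.correlationId;
        process(req);
        return true;
    }

    private void process(ControllerRequest req) {
        // apply the leader/ISR or stop-replica state change here
    }
}
```

Under Becket's guarantee that R2 is only sent after R1 is acked, any request that reaches this check with a smaller correlation id in the same epoch must be an obsolete resend, so dropping it is safe.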
> > > > >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <
> > > > >> jun@confluent.io>
> > > > >> > > > > wrote:
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >>> I agree that there is no strong ordering when there are more than one socket connection. Currently, we rely on controllerEpoch and leaderEpoch to ensure that the receiving broker picks up the latest state for each partition.
> > > > >> > > > > > > > >>>
> > > > >> > > > > > > > >>> One potential issue with the dequeue approach is that if the queue is full, there is no guarantee that the controller requests will be enqueued quickly.
> > > > >> > > > > > > > >>>
> > > > >> > > > > > > > >>> Thanks,
> > > > >> > > > > > > > >>>
> > > > >> > > > > > > > >>> Jun
> > > > >> > > > > > > > >>>
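The controllerEpoch/leaderEpoch check that Jun refers to can be sketched as a simple partition-state cache. This is a hypothetical simplification; the real broker logic and names differ:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of epoch-based staleness checks: a broker only applies
// a partition-state update if it carries a newer epoch than what it cached.
public class PartitionStateCache {
    public static class PartitionState {
        final int controllerEpoch;
        final int leaderEpoch;
        final int leader;
        public PartitionState(int controllerEpoch, int leaderEpoch, int leader) {
            this.controllerEpoch = controllerEpoch;
            this.leaderEpoch = leaderEpoch;
            this.leader = leader;
        }
    }

    private final Map<String, PartitionState> cache = new HashMap<>();

    /** Returns true if applied, false if rejected as stale. */
    public boolean maybeApply(String topicPartition, PartitionState incoming) {
        PartitionState current = cache.get(topicPartition);
        if (current != null
                && (incoming.controllerEpoch < current.controllerEpoch
                    || incoming.leaderEpoch <= current.leaderEpoch))
            return false;   // stale: an equal or newer state was already applied
        cache.put(topicPartition, incoming);
        return true;
    }
}
```

As Becket's LeaderAndIsr/StopReplica example shows, this per-partition epoch check catches stale state updates but does not by itself order two different request types, which is why the correlation-id idea keeps coming up in this thread.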
> > > > >> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > >> > > > > > > > >>>
> > > > >> > > > > > > > >>>> Yea, the correlationId is only set to 0 in the NetworkClient constructor. Since we reuse the same NetworkClient between Controller and the broker, a disconnection should not cause it to reset to 0, in which case it can be used to reject obsolete requests.
> > > > >> > > > > > > > >>>>
> > > > >> > > > > > > > >>>> Thanks,
> > > > >> > > > > > > > >>>>
> > > > >> > > > > > > > >>>> Mayuresh
> > > > >> > > > > > > > >>>>
> > > > >> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > > > > > > >>>>
> > > > >> > > > > > > > >>>>> @Dong,
> > > > >> > > > > > > > >>>>> Great example and explanation, thanks!
> > > > >> > > > > > > > >>>>>
> > > > >> > > > > > > > >>>>> @All
> > > > >> > > > > > > > >>>>> Regarding the example given by Dong, it seems even if we use a queue, and a dedicated controller request handling thread, the same result can still happen because R1_a will be sent on one connection, and R1_b & R2 will be sent on a different connection, and there is no ordering between different connections on the broker side.
> > > > >> > > > > > > > >>>>> I was discussing with Mayuresh offline, and it seems correlation id within the same NetworkClient object is monotonically increasing and never reset, hence a broker can leverage that to properly reject obsolete requests.
> > > > >> > > > > > > > >>>>> Thoughts?
> > > > >> > > > > > > > >>>>>
> > > > >> > > > > > > > >>>>> Thanks,
> > > > >> > > > > > > > >>>>> Lucas
> > > > >> > > > > > > > >>>>>
> > > > >> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>
> > > > >> > > > > > > > >>>>>> Actually nvm, correlationId is reset in case of connection loss, I think.
> > > > >> > > > > > > > >>>>>>
> > > > >> > > > > > > > >>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>
> > > > >> > > > > > > > >>>>>> Mayuresh
> > > > >> > > > > > > > >>>>>>
> > > > >> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>
> > > > >> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing can happen with having 2 separate queues as well and it can even happen today.
> > > > >> > > > > > > > >>>>>>> Can we use the correlationId in the request from the controller to the broker to handle ordering?
> > > > >> > > > > > > > >>>>>>>
> > > > >> > > > > > > > >>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>
> > > > >> > > > > > > > >>>>>>> Mayuresh
> > > > >> > > > > > > > >>>>>>>
> > > > >> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated controller request handling thread would be a better isolation. It also solves the reordering issue.
> > > > >> > > > > > > > >>>>>>>>
> > > > >> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel
> Koshy <
> > > > >> > > > > > > > >> jjkoshy.w@gmail.com>
> > > > >> > > > > > > > >>>>>> wrote:
> > > > >> > > > > > > > >>>>>>>>
> > > > >> > > > > > > > >>>>>>>>> Good example. I think this scenario can occur in the current code as
> > > > >> > > > > > > > >>>>>>>>> well, but with even lower probability given that there are other
> > > > >> > > > > > > > >>>>>>>>> non-controller requests interleaved. It is still sketchy though, and I
> > > > >> > > > > > > > >>>>>>>>> think a safer approach would be separate queues and pinning controller
> > > > >> > > > > > > > >>>>>>>>> request handling to one handler thread.
> > > > >> > > > > > > > >>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>> Hey Becket,
> > > > >> > > > > > > > >>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>> I think you are right that there may be out-of-order processing.
> > > > >> > > > > > > > >>>>>>>>>> However, it seems that out-of-order processing may also happen even
> > > > >> > > > > > > > >>>>>>>>>> if we use a separate queue.
> > > > >> > > > > > > > >>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>> Here is the example:
> > > > >> > > > > > > > >>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected before receiving response.
> > > > >> > > > > > > > >>>>>>>>>> Then it reconnects and sends R2. Both requests now stay in the
> > > > >> > > > > > > > >>>>>>>>>> controller request queue in the order they are sent.
> > > > >> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then thread2 takes R2
> > > > >> > > > > > > > >>>>>>>>>> from the request queue almost at the same time.
> > > > >> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is chance that R2's
> > > > >> > > > > > > > >>>>>>>>>> processing is completed before R1.
> > > > >> > > > > > > > >>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>> If out-of-order processing can happen for both approaches with very
> > > > >> > > > > > > > >>>>>>>>>> low probability, it may not be worthwhile to add the extra queue.
> > > > >> > > > > > > > >>>>>>>>>> What do you think?
> > > > >> > > > > > > > >>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>>>> Dong
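Dong's race above can be reproduced deterministically. The following sketch is illustrative only (not Kafka code): two handler threads drain a single queue, and the handler that picked up R1 is forced to wait until R2 has completed, so completion order differs from queue order.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: two request handler threads drain one request queue.
// The handler that takes R1 stalls until R2 completes, so R2 finishes first
// even though R1 was enqueued (and dequeued) first.
public class OutOfOrderDemo {
    public static List<String> run() throws InterruptedException {
        BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>(List.of("R1", "R2"));
        List<String> completionOrder = new CopyOnWriteArrayList<>();
        CountDownLatch r2Done = new CountDownLatch(1);
        Runnable handler = () -> {
            try {
                String request = requestQueue.take();
                if (request.equals("R1")) r2Done.await(); // simulate R1's handler being slower
                completionOrder.add(request);
                if (request.equals("R2")) r2Done.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(handler);
        Thread t2 = new Thread(handler);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return completionOrder; // [R2, R1]
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints [R2, R1]
    }
}
```

The latch only makes the race deterministic for demonstration; in a real broker the same interleaving can occur by timing alone, which is the point of the example.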
> > > > >> > > > > > > > >>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > >> > > > > > > > >>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>> Using the request channel as a deque was brought up some time ago when
> > > > >> > > > > > > > >>>>>>>>>>> we were initially thinking of prioritizing the requests. The concern
> > > > >> > > > > > > > >>>>>>>>>>> was that the controller requests are supposed to be processed in order.
> > > > >> > > > > > > > >>>>>>>>>>> If we can ensure that there is at most one controller request in the
> > > > >> > > > > > > > >>>>>>>>>>> request channel, the order is not a concern. But in cases where more
> > > > >> > > > > > > > >>>>>>>>>>> than one controller request is inserted into the queue, the controller
> > > > >> > > > > > > > >>>>>>>>>>> request order may change and cause problems. For example, think about
> > > > >> > > > > > > > >>>>>>>>>>> the following sequence:
> > > > >> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to the broker.
> > > > >> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and puts the request at the head of the request
> > > > >> > > > > > > > >>>>>>>>>>> queue.
> > > > >> > > > > > > > >>>>>>>>>>> 3. The controller-to-broker connection failed and the controller
> > > > >> > > > > > > > >>>>>>>>>>> reconnected to the broker.
> > > > >> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker.
> > > > >> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and adds it to the head of the request queue.
> > > > >> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is processed,
> > > > >> > > > > > > > >>>>>>>>>>> which may cause problems.
> > > > >> > > > > > > > >>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > >> > > > > > > > >>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler, less
> > > > >> > > > > > > > >>>>>>>>>>>> invasive alternative and it should work. Jun/Becket/others, do you
> > > > >> > > > > > > > >>>>>>>>>>>> see any pitfalls with this approach?
> > > > >> > > > > > > > >>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > >> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I haven't thought of before.
> > > > >> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and also
> > > > >> > > > > > > > >>>>>>>>>>>>> avoids the need to have a new size metric and capacity config
> > > > >> > > > > > > > >>>>>>>>>>>>> for the controller request queue. In fact, if we were to adopt
> > > > >> > > > > > > > >>>>>>>>>>>>> this design, there is no public interface change, and we
> > > > >> > > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> > > > >> > > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems the java class
> > > > >> > > > > > > > >>>>>>>>>>>>> LinkedBlockingDeque can readily satisfy the requirement
> > > > >> > > > > > > > >>>>>>>>>>>>> by supporting a capacity, and also allowing inserting at both ends.
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>> My only concern is that this design is tied to the coincidence that
> > > > >> > > > > > > > >>>>>>>>>>>>> we have two request priorities and there are two ends to a deque.
> > > > >> > > > > > > > >>>>>>>>>>>>> Hence by using the proposed design, it seems the network layer is
> > > > >> > > > > > > > >>>>>>>>>>>>> more tightly coupled with upper layer logic, e.g. if we were to add
> > > > >> > > > > > > > >>>>>>>>>>>>> an extra priority level in the future for some reason, we would
> > > > >> > > > > > > > >>>>>>>>>>>>> probably need to go back to the design of separate queues, one for
> > > > >> > > > > > > > >>>>>>>>>>>>> each priority level.
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward your suggested
> > > > >> > > > > > > > >>>>>>>>>>>>> approach.
> > > > >> > > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>> @Becket,
> > > > >> > > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm answering your
> > > > >> > > > > > > > >>>>>>>>>>>>> question only in the context of the current KIP design: I think your
> > > > >> > > > > > > > >>>>>>>>>>>>> suggestion makes sense, and I'm ok with removing the capacity config
> > > > >> > > > > > > > >>>>>>>>>>>>> and just relying on the default value of 20 being sufficient.
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>>>>>>> Lucas
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize the controller
> > > > >> > > > > > > > >>>>>>>>>>>>>> request over any other requests.
> > > > >> > > > > > > > >>>>>>>>>>>>>> In that case, we can change the request queue to a deque, where you
> > > > >> > > > > > > > >>>>>>>>>>>>>> always insert the normal requests (produce, consume, ...etc) at the
> > > > >> > > > > > > > >>>>>>>>>>>>>> end of the deque, but if it's a controller request, you insert it at
> > > > >> > > > > > > > >>>>>>>>>>>>>> the head of the queue. This ensures that the controller request will
> > > > >> > > > > > > > >>>>>>>>>>>>>> be given higher priority over other requests.
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>> Also since we only read one request from the socket and mute it and
> > > > >> > > > > > > > >>>>>>>>>>>>>> only unmute it after handling the request, this would ensure that we
> > > > >> > > > > > > > >>>>>>>>>>>>>> don't handle controller requests out of order.
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue and the additional
> > > > >> > > > > > > > >>>>>>>>>>>>>> config for the size of the queue.
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>> What do you think?
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>> Mayuresh
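Mayuresh's single-deque idea above maps directly onto java.util.concurrent.LinkedBlockingDeque, which is capacity-bounded and supports blocking insertion at both ends. A minimal sketch, not Kafka code (String stands in for a request object, and the class name is made up):

```java
import java.util.concurrent.LinkedBlockingDeque;

// Sketch of the single-deque prioritization: normal requests go to the tail,
// controller requests jump to the head, so the next poll sees them first.
public class PrioritizedRequestQueue {
    private final LinkedBlockingDeque<String> deque;

    public PrioritizedRequestQueue(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    public void send(String request, boolean fromController) throws InterruptedException {
        if (fromController) {
            deque.putFirst(request); // head of the deque: highest priority
        } else {
            deque.putLast(request);  // tail of the deque: normal priority
        }
    }

    public String receive() throws InterruptedException {
        return deque.takeFirst();    // handler threads always take from the head
    }
}
```

For example, after enqueuing a produce request, a fetch request, and then a controller request, receive() hands out the controller request first; because channels are muted until a request is handled, at most one controller request is in flight, preserving order.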
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the current design
> > > > >> > > > > > > > >>>>>>>>>>>>>>> makes sense. My confusion is about whether the new config for the
> > > > >> > > > > > > > >>>>>>>>>>>>>>> controller queue capacity is necessary. I cannot think of a case in
> > > > >> > > > > > > > >>>>>>>>>>>>>>> which users would change it.
> > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we expect users to ever
> > > > >> > > > > > > > >>>>>>>>>>>>>>>> change the controller request queue capacity"? If we agree that 20
> > > > >> > > > > > > > >>>>>>>>>>>>>>>> is already a very generous default number and we do not expect users
> > > > >> > > > > > > > >>>>>>>>>>>>>>>> to change it, is it still necessary to expose this as a config?
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that normally there should
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> be just one controller request because of muting,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there would be many enqueued
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> controller requests.
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not sure which part conveys
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> that info. I'd be happy to revise if you point out the section.
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal conditions, the current
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> design does not preclude multiple controllers running
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have the controller queue
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> capacity config and simply make its capacity 1,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> network threads handling requests from different controllers will
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> be blocked during those troublesome times,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On the other hand, adding the
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> extra config with a default value, say 20, guards us from issues in
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> those troublesome times, and IMO there isn't much downside of
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> adding the extra config.
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete statement based on a
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> previous design. I've revised the wording in the KIP.
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The memory consumption can
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> rise given the total number of queued requests can go up to 2x" in
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> the impact section. Normally the requests from controller to a
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Broker are not high volume, right?
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the control plane from the
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> data plane makes a lot of sense.
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the controller request queue may have
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> many requests in it. Will this be a common case? The controller
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> requests still go through the SocketServer. The SocketServer will
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> mute the channel once a request is read and put into the request
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> channel. So assuming there is only one connection between
> > > > >> > > > > > > > >>>>>>>>>>>>>>>>>>> controller and each broker, on the

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Thanks for the updated KIP wiki, Lucas. Looks good to me overall.

It might be an implementation detail, but do we still plan to use the
correlation id to ensure the request processing order?

Thanks,

Jiangjie (Becket) Qin
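One way the correlation-id-based ordering raised in this thread could look on the broker side is sketched below. This is illustrative only, not Kafka code: the class name is made up, and it assumes the controller's correlation ids increase monotonically within one controller generation, so a request arriving with an id at or below the last processed one is stale and can be dropped.

```java
// Sketch: track the highest controller correlation id seen so far and refuse
// anything older, so a stale R1 arriving after R2 (e.g. after a reconnect
// within the same controller generation) is rejected instead of reordered.
public class ControllerRequestGate {
    private int lastCorrelationId = -1;

    // Returns true if the request is in order and should be processed;
    // false means it is stale and should be dropped.
    public synchronized boolean accept(int correlationId) {
        if (correlationId <= lastCorrelationId) {
            return false;
        }
        lastCorrelationId = correlationId;
        return true;
    }
}
```

A real implementation would also need to reset the watermark when a new controller generation (a higher controller epoch) is observed; that bookkeeping is omitted here.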

On Tue, Jul 31, 2018 at 3:39 AM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks for your review, Dong.
> Ack that these configs will have a bigger impact for users.
>
> On the other hand, I would argue that the request queue becoming full
> may or may not be a rare scenario.
> How often the request queue gets full depends on the request incoming rate,
> the request processing rate, and the size of the request queue.
> When that happens, the dedicated endpoints design can better handle
> it than any of the previously discussed options.
>
> Another reason I made the change was that I have the same taste
> as Becket that it's a better separation of the control plane from the data
> plane.
>
> Finally, I want to clarify that this change is NOT motivated by the
> out-of-order
> processing discussion. The latter problem is orthogonal to this KIP, and it
> can happen in any of the design options we discussed for this KIP so far.
> So I'd like to address out-of-order processing separately in another
> thread,
> and avoid mentioning it in this KIP.
>
> Thanks,
> Lucas
>
> On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Hey Lucas,
> >
> > Thanks for the update.
> >
> > The current KIP proposes new broker configs "listeners.for.controller" and
> > "advertised.listeners.for.controller". This is going to be a big change
> > since listeners are among the most important configs that every user
> needs
> > to change. According to the rejected alternative section, it seems that
> the
> > reason to add these two configs is to improve performance when the data
> > request queue is full rather than for correctness. It should be a very
> rare
> > scenario and I am not sure we should add configs for all users just to
> > improve the performance in such a rare scenario.
> >
> > Also, if the new design is based on the issues which are discovered in
> the
> > recent discussion, e.g. out of order processing if we don't use a
> dedicated
> > thread for controller request, it may be useful to explain the problem in
> > the motivation section.
> >
> > Thanks,
> > Dong
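For concreteness, the two configs Dong refers to would presumably sit alongside the existing listener settings in the broker properties file, along these lines. The property names come from the KIP discussion above; the host, ports, and the CONTROLLER listener name are illustrative assumptions, not the KIP's final syntax:

```properties
# Hypothetical sketch of the proposed controller-plane listener configs,
# next to the existing data-plane listeners.
listeners=PLAINTEXT://broker1:9092
advertised.listeners=PLAINTEXT://broker1:9092
listeners.for.controller=CONTROLLER://broker1:9091
advertised.listeners.for.controller=CONTROLLER://broker1:9091
```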
> >
> > On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <lu...@gmail.com> wrote:
> >
> > > A kind reminder for review of this KIP.
> > >
> > > Thank you very much!
> > > Lucas
> > >
> > > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <lu...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I've updated the KIP by adding the dedicated endpoints for controller
> > > > connections,
> > > > and pinning threads for controller requests.
> > > > Also I've updated the title of this KIP. Please take a look and let
> me
> > > > know your feedback.
> > > >
> > > > Thanks a lot for your time!
> > > > Lucas
> > > >
> > > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > >
> > > >> Hi Lucas,
> > > >> I agree, if we want to go forward with a separate controller plane
> and
> > > >> data
> > > >> plane and completely isolate them, having a separate port for
> > controller
> > > >> with a separate Acceptor and a Processor sounds ideal to me.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Mayuresh
> > > >>
> > > >>
> > > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <be...@gmail.com> wrote:
> > > >>
> > > >> > Hi Lucas,
> > > >> >
> > > >> > Yes, I agree that a dedicated end to end control flow would be
> > ideal.
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > >> >
> > > >> > > Thanks for the comment, Becket.
> > > >> > > So far, we've been trying to avoid making any request handler
> > thread
> > > >> > > special.
> > > >> > > But if we were to follow that path in order to make the two
> planes
> > > >> more
> > > >> > > isolated,
> > > >> > > what do you think about also having a dedicated processor
> thread,
> > > >> > > and dedicated port for the controller?
> > > >> > >
> > > >> > > Today one processor thread can handle multiple connections,
> let's
> > > say
> > > >> 100
> > > >> > > connections
> > > >> > >
> > > >> > > represented by connection0, ... connection99, among which
> > > >> connection0-98
> > > >> > > are from clients, while connection99 is from
> > > >> > >
> > > >> > > the controller. Further let's say after one selector polling,
> > there
> > > >> are
> > > >> > > incoming requests on all connections.
> > > >> > >
> > > >> > > When the request queue is full (either the data request queue
> > > >> > > being full in the two-queue design, or
> > > >> > >
> > > >> > > the one single queue being full in the deque design), the
> > processor
> > > >> > thread
> > > >> > > will be blocked first
> > > >> > >
> > > >> > > when trying to enqueue the data request from connection0, then
> > > >> possibly
> > > >> > > blocked for the data request
> > > >> > >
> > > >> > > from connection1, ... etc even though the controller request is
> > > ready
> > > >> to
> > > >> > be
> > > >> > > enqueued.
> > > >> > >
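[Editorial note: the head-of-line blocking described above can be made concrete with a minimal, self-contained sketch. The queue capacity and request names below are illustrative only, standing in for a bounded request queue such as the one sized by queued.max.requests; this is not broker code.]

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class HeadOfLineBlockingDemo {
    public static void main(String[] args) throws InterruptedException {
        // A bounded request queue that is already full.
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(2);
        requestQueue.put("data-req-from-connection0");
        requestQueue.put("data-req-from-connection1");

        // The processor thread enqueues its polled requests in connection
        // order; the controller request is stuck behind the data requests.
        Thread processor = new Thread(() -> {
            try {
                requestQueue.put("data-req-from-connection2"); // blocks: queue full
                requestQueue.put("controller-req");            // waits its turn
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        processor.start();

        Thread.sleep(200);
        // The controller request still has not been enqueued.
        System.out.println(requestQueue.contains("controller-req")); // false

        // Only after request handlers drain the queue can it get in.
        requestQueue.take();
        requestQueue.take();
        processor.join();
        System.out.println(requestQueue.contains("controller-req")); // true
    }
}
```
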
> > > >> > > To solve this problem, it seems we would need to have a separate
> > > port
> > > >> > > dedicated to
> > > >> > >
> > > >> > > the controller, a dedicated processor thread, a dedicated
> > controller
> > > >> > > request queue,
> > > >> > >
> > > >> > > and pinning of one request handler thread for controller
> requests.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Lucas
> > > >> > >
> > > >> > >
> > > >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <
> becket.qin@gmail.com
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > > Personally I am not fond of the deque approach, simply because
> > > >> > > > it is against the basic idea of isolating the controller plane
> > > >> > > > and data plane. With a single deque, theoretically speaking the
> > > >> > > > controller requests can starve the client requests. I would
> > > >> > > > prefer the approach with
> a
> > > >> > separate
> > > >> > > > controller request queue and a dedicated controller request
> > > handler
> > > >> > > thread.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > >
> > > >> > > > Jiangjie (Becket) Qin
> > > >> > > >
> > > >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > Sure, I can summarize the usage of correlation id. But
> before
> > I
> > > do
> > > >> > > that,
> > > >> > > > it
> > > >> > > > > seems
> > > >> > > > > the same out-of-order processing can also happen to Produce
> > > >> requests
> > > >> > > sent
> > > >> > > > > by producers,
> > > >> > > > > following the same example you described earlier.
> > > >> > > > > If that's the case, I think this probably deserves a
> separate
> > > doc
> > > >> and
> > > >> > > > > design independent of this KIP.
> > > >> > > > >
> > > >> > > > > Lucas
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <
> > lindong28@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > > >
> > > >> > > > > > Hey Lucas,
> > > >> > > > > >
> > > >> > > > > > Could you update the KIP if you are confident with the
> > > approach
> > > >> > which
> > > >> > > > > uses
> > > >> > > > > > correlation id? The idea around correlation id is kind of
> > > >> scattered
> > > >> > > > > across
> > > >> > > > > > multiple emails. It will be useful if other reviews can
> read
> > > the
> > > >> > KIP
> > > >> > > to
> > > >> > > > > > understand the latest proposal.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Dong
> > > >> > > > > >
> > > >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > > >> > > > > > gharatmayuresh15@gmail.com> wrote:
> > > >> > > > > >
> > > >> > > > > > > I like the idea of the deque implementation by Lucas. This
> > > >> > > > > > > will help us avoid an additional queue for the controller
> > > >> > > > > > > and additional configs in Kafka.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > >
> > > >> > > > > > > Mayuresh
> > > >> > > > > > >
> > > >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
> > > >> becket.qin@gmail.com
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hi Jun,
> > > >> > > > > > > >
> > > >> > > > > > > > The usage of correlation ID might still be useful to
> > > address
> > > >> > the
> > > >> > > > > cases
> > > >> > > > > > > > that the controller epoch and leader epoch check are
> not
> > > >> > > sufficient
> > > >> > > > > to
> > > >> > > > > > > > guarantee correct behavior. For example, if the
> > controller
> > > >> > sends
> > > >> > > a
> > > >> > > > > > > > LeaderAndIsrRequest followed by a StopReplicaRequest,
> > and
> > > >> the
> > > >> > > > broker
> > > >> > > > > > > > processes it in the reverse order, the replica may
> still
> > > be
> > > >> > > wrongly
> > > >> > > > > > > > recreated, right?
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > >
> > > >> > > > > > > > Jiangjie (Becket) Qin
> > > >> > > > > > > >
> > > >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <
> > jun@confluent.io
> > > >
> > > >> > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > Hmm, since we already use controller epoch and
> leader
> > > >> epoch
> > > >> > for
> > > >> > > > > > > properly
> > > >> > > > > > > > > caching the latest partition state, do we really
> need
> > > >> > > correlation
> > > >> > > > > id
> > > >> > > > > > > for
> > > >> > > > > > > > > ordering the controller requests?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks,
> > > >> > > > > > > > >
> > > >> > > > > > > > > Jun
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
> > > >> > > > becket.qin@gmail.com>
> > > >> > > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > >> Lucas and Mayuresh,
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> Good idea. The correlation id should work.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> In the ControllerChannelManager, a request will be
> > > resent
> > > >> > > until
> > > >> > > > a
> > > >> > > > > > > > response
> > > >> > > > > > > > >> is received. So if the controller to broker
> > connection
> > > >> > > > disconnects
> > > >> > > > > > > after
> > > >> > > > > > > > >> controller sends R1_a, but before the response of
> > R1_a
> > > is
> > > >> > > > > received,
> > > >> > > > > > a
> > > >> > > > > > > > >> disconnection may cause the controller to resend
> > R1_b.
> > > >> i.e.
> > > >> > > > until
> > > >> > > > > R1
> > > >> > > > > > > is
> > > >> > > > > > > > >> acked, R2 won't be sent by the controller.
> > > >> > > > > > > > >> This gives two guarantees:
> > > >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > >> > > > > > > > >> 2. On the broker side, when R2 is seen, R1 must
> have
> > > been
> > > >> > > > > processed
> > > >> > > > > > at
> > > >> > > > > > > > >> least once.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> So on the broker side, with a single-threaded
> > > >> > > > > > > > >> controller request handler, the logic should be:
> > > >> > > > > > > > >> 1. Process whatever request is seen in the controller
> > > >> > > > > > > > >> request queue
> > > >> > > > > > > > >> 2. For the given epoch, drop request if its
> > correlation
> > > >> id
> > > >> > is
> > > >> > > > > > smaller
> > > >> > > > > > > > than
> > > >> > > > > > > > >> that of the last processed request.
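[Editorial note: the two-step logic above can be sketched as follows. The class and method names are ours for illustration, not from the KIP or the Kafka codebase; within a controller epoch, a request is dropped if its correlation id is not larger than that of the last processed request, and an older controller epoch is rejected outright.]

```java
public class ControllerRequestFilter {
    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    public synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch < lastEpoch) {
            return false; // request from an older controller generation
        }
        if (controllerEpoch == lastEpoch && correlationId <= lastCorrelationId) {
            return false; // stale resend (e.g. R1_b arriving after R2): drop it
        }
        lastEpoch = controllerEpoch;
        lastCorrelationId = correlationId;
        return true;
    }
}
```

With the example above: if R2 (say correlation id 7) is processed first, a late R1_b (correlation id 6) from the same epoch is rejected, while a request from a newer controller epoch always passes.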
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> Thanks,
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> Jiangjie (Becket) Qin
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <
> > > >> jun@confluent.io>
> > > >> > > > > wrote:
> > > >> > > > > > > > >>
> > > >> > > > > > > > >>> I agree that there is no strong ordering when
> there
> > > are
> > > >> > more
> > > >> > > > than
> > > >> > > > > > one
> > > >> > > > > > > > >>> socket connections. Currently, we rely on
> > > >> controllerEpoch
> > > >> > and
> > > >> > > > > > > > leaderEpoch
> > > >> > > > > > > > >>> to ensure that the receiving broker picks up the
> > > latest
> > > >> > state
> > > >> > > > for
> > > >> > > > > > > each
> > > >> > > > > > > > >>> partition.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> One potential issue with the dequeue approach is
> > that
> > > if
> > > >> > the
> > > >> > > > > queue
> > > >> > > > > > is
> > > >> > > > > > > > >> full,
> > > >> > > > > > > > >>> there is no guarantee that the controller requests
> > > will
> > > >> be
> > > >> > > > > enqueued
> > > >> > > > > > > > >>> quickly.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Thanks,
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Jun
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > > >> > > > > > > > >>> gharatmayuresh15@gmail.com
> > > >> > > > > > > > >>>> wrote:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>> Yea, the correlationId is only set to 0 in the
> > > >> > NetworkClient
> > > >> > > > > > > > >> constructor.
> > > >> > > > > > > > >>>> Since we reuse the same NetworkClient between
> > > >> Controller
> > > >> > and
> > > >> > > > the
> > > >> > > > > > > > >> broker,
> > > >> > > > > > > > >>> a
> > > >> > > > > > > > >>>> disconnection should not cause it to reset to 0,
> in
> > > >> which
> > > >> > > case
> > > >> > > > > it
> > > >> > > > > > > can
> > > >> > > > > > > > >> be
> > > >> > > > > > > > >>>> used to reject obsolete requests.
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>> Thanks,
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>> Mayuresh
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <
> > > >> > > > > lucasatucla@gmail.com
> > > >> > > > > > >
> > > >> > > > > > > > >>> wrote:
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>>> @Dong,
> > > >> > > > > > > > >>>>> Great example and explanation, thanks!
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> @All
> > > >> > > > > > > > >>>>> Regarding the example given by Dong, it seems
> even
> > > if
> > > >> we
> > > >> > > use
> > > >> > > > a
> > > >> > > > > > > queue,
> > > >> > > > > > > > >>>> and a
> > > >> > > > > > > > >>>>> dedicated controller request handling thread,
> > > >> > > > > > > > >>>>> the same result can still happen because R1_a
> will
> > > be
> > > >> > sent
> > > >> > > on
> > > >> > > > > one
> > > >> > > > > > > > >>>>> connection, and R1_b & R2 will be sent on a
> > > different
> > > >> > > > > connection,
> > > >> > > > > > > > >>>>> and there is no ordering between different
> > > >> connections on
> > > >> > > the
> > > >> > > > > > > broker
> > > >> > > > > > > > >>>> side.
> > > >> > > > > > > > >>>>> I was discussing with Mayuresh offline, and it
> > seems
> > > >> > > > > correlation
> > > >> > > > > > id
> > > >> > > > > > > > >>>> within
> > > >> > > > > > > > >>>>> the same NetworkClient object is monotonically
> > > >> increasing
> > > >> > > and
> > > >> > > > > > never
> > > >> > > > > > > > >>>> reset,
> > > >> > > > > > > > >>>>> hence a broker can leverage that to properly
> > reject
> > > >> > > obsolete
> > > >> > > > > > > > >> requests.
> > > >> > > > > > > > >>>>> Thoughts?
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> Thanks,
> > > >> > > > > > > > >>>>> Lucas
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh
> Gharat
> > <
> > > >> > > > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>>> Actually nvm, correlationId is reset in case of
> > > >> > connection
> > > >> > > > > > loss, I
> > > >> > > > > > > > >>>> think.
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>> Thanks,
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>> Mayuresh
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh
> Gharat
> > <
> > > >> > > > > > > > >>>>>> gharatmayuresh15@gmail.com>
> > > >> > > > > > > > >>>>>> wrote:
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing
> > can
> > > >> > happen
> > > >> > > > > with
> > > >> > > > > > > > >>>> having 2
> > > >> > > > > > > > >>>>>>> separate queues as well and it can even happen
> > > >> today.
> > > >> > > > > > > > >>>>>>> Can we use the correlationId in the request
> from
> > > the
> > > >> > > > > controller
> > > >> > > > > > > > >> to
> > > >> > > > > > > > >>>> the
> > > >> > > > > > > > >>>>>>> broker to handle ordering ?
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> Thanks,
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> Mayuresh
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
> > > >> > > > > > becket.qin@gmail.com
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>>> wrote:
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated
> > > >> controller
> > > >> > > > > request
> > > >> > > > > > > > >>>> handling
> > > >> > > > > > > > >>>>>>>> thread would be a better isolation. It also
> > > solves
> > > >> the
> > > >> > > > > > > > >> reordering
> > > >> > > > > > > > >>>>> issue.
> > > >> > > > > > > > >>>>>>>>
> > > >> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> > > >> > > > > > > > >> jjkoshy.w@gmail.com>
> > > >> > > > > > > > >>>>>> wrote:
> > > >> > > > > > > > >>>>>>>>
> > > >> > > > > > > > >>>>>>>>> Good example. I think this scenario can
> occur
> > in
> > > >> the
> > > >> > > > > current
> > > >> > > > > > > > >>> code
> > > >> > > > > > > > >>>> as
> > > >> > > > > > > > >>>>>>>> well
> > > >> > > > > > > > >>>>>>>>> but with even lower probability given that
> > there
> > > >> are
> > > >> > > > other
> > > >> > > > > > > > >>>>>>>> non-controller
> > > >> > > > > > > > >>>>>>>>> requests interleaved. It is still sketchy
> > though
> > > >> and
> > > >> > I
> > > >> > > > > think
> > > >> > > > > > a
> > > >> > > > > > > > >>>> safer
> > > >> > > > > > > > >>>>>>>>> approach would be separate queues and
> pinning
> > > >> > > controller
> > > >> > > > > > > > >> request
> > > >> > > > > > > > >>>>>>>> handling
> > > >> > > > > > > > >>>>>>>>> to one handler thread.
> > > >> > > > > > > > >>>>>>>>>
> > > >> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> > > >> > > > > > > > >> lindong28@gmail.com
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>>>> wrote:
> > > >> > > > > > > > >>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>> Hey Becket,
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>> I think you are right that there may be
> > > >> out-of-order
> > > >> > > > > > > > >>> processing.
> > > >> > > > > > > > >>>>>>>> However,
> > > >> > > > > > > > >>>>>>>>>> it seems that out-of-order processing may
> > also
> > > >> > happen
> > > >> > > > even
> > > >> > > > > > > > >> if
> > > >> > > > > > > > >>> we
> > > >> > > > > > > > >>>>>> use a
> > > >> > > > > > > > >>>>>>>>>> separate queue.
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>> Here is the example:
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected
> > > before
> > > >> > > > > receiving
> > > >> > > > > > > > >>>>>> response.
> > > >> > > > > > > > >>>>>>>>> Then
> > > >> > > > > > > > >>>>>>>>>> it reconnects and sends R2. Both requests
> now
> > > >> stay
> > > >> > in
> > > >> > > > the
> > > >> > > > > > > > >>>>> controller
> > > >> > > > > > > > >>>>>>>>>> request queue in the order they are sent.
> > > >> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue
> > and
> > > >> then
> > > >> > > > > thread2
> > > >> > > > > > > > >>>> takes
> > > >> > > > > > > > >>>>>> R2
> > > >> > > > > > > > >>>>>>>>> from
> > > >> > > > > > > > >>>>>>>>>> the request queue almost at the same time.
> > > >> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel.
> > > >> There is
> > > >> > > > > chance
> > > >> > > > > > > > >>> that
> > > >> > > > > > > > >>>>>> R2's
> > > >> > > > > > > > >>>>>>>>>> processing is completed before R1.
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>> If out-of-order processing can happen for
> > both
> > > >> > > > approaches
> > > >> > > > > > > > >> with
> > > >> > > > > > > > >>>>> very
> > > >> > > > > > > > >>>>>>>> low
> > > >> > > > > > > > >>>>>>>>>> probability, it may not be worthwhile to
> add
> > > the
> > > >> > extra
> > > >> > > > > > > > >> queue.
> > > >> > > > > > > > >>>> What
> > > >> > > > > > > > >>>>>> do
> > > >> > > > > > > > >>>>>>>> you
> > > >> > > > > > > > >>>>>>>>>> think?
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>> Thanks,
> > > >> > > > > > > > >>>>>>>>>> Dong
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket
> Qin <
> > > >> > > > > > > > >>>> becket.qin@gmail.com
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>>>>> wrote:
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > >> > > > > > > > >>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>> Using the request channel as a deque was
> > > >> > > > > > > > >>>>>>>>>>> brought up some time ago when we were
> > > >> > > > > > > > >>>>>>>>>>> initially thinking of prioritizing
> > > >> > > > > > > > >>>>>>>>>>> requests. The
> > > >> The
> > > >> > > > > > > > >> concern
> > > >> > > > > > > > >>>> was
> > > >> > > > > > > > >>>>>> that
> > > >> > > > > > > > >>>>>>>>> the
> > > >> > > > > > > > >>>>>>>>>>> controller requests are supposed to be
> > > >> processed in
> > > >> > > > > order.
> > > >> > > > > > > > >>> If
> > > >> > > > > > > > >>>> we
> > > >> > > > > > > > >>>>>> can
> > > >> > > > > > > > >>>>>>>>>> ensure
> > > >> > > > > > > > >>>>>>>>>>> that there is one controller request in
> the
> > > >> request
> > > >> > > > > > > > >> channel,
> > > >> > > > > > > > >>>> the
> > > >> > > > > > > > >>>>>>>> order
> > > >> > > > > > > > >>>>>>>>> is
> > > >> > > > > > > > >>>>>>>>>>> not a concern. But in cases that there are
> > > more
> > > >> > than
> > > >> > > > one
> > > >> > > > > > > > >>>>>> controller
> > > >> > > > > > > > >>>>>>>>>> request
> > > >> > > > > > > > >>>>>>>>>>> inserted into the queue, the controller
> > > request
> > > >> > order
> > > >> > > > may
> > > >> > > > > > > > >>>> change
> > > >> > > > > > > > >>>>>> and
> > > >> > > > > > > > >>>>>>>>>> cause
> > > >> > > > > > > > >>>>>>>>>>> problem. For example, think about the
> > > following
> > > >> > > > sequence:
> > > >> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request
> R1
> > > to
> > > >> > > broker
> > > >> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and put the request
> to
> > > the
> > > >> > head
> > > >> > > > of
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >>>>>> request
> > > >> > > > > > > > >>>>>>>>>> queue.
> > > >> > > > > > > > >>>>>>>>>>> 3. Controller to broker connection failed
> > and
> > > >> the
> > > >> > > > > > > > >> controller
> > > >> > > > > > > > >>>>>>>>> reconnected
> > > >> > > > > > > > >>>>>>>>>> to
> > > >> > > > > > > > >>>>>>>>>>> the broker.
> > > >> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the
> > broker
> > > >> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to the
> head
> > > of
> > > >> the
> > > >> > > > > > > > >> request
> > > >> > > > > > > > >>>>> queue.
> > > >> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be
> processed
> > > >> before
> > > >> > > R1
> > > >> > > > is
> > > >> > > > > > > > >>>>>> processed,
> > > >> > > > > > > > >>>>>>>>>> which
> > > >> > > > > > > > >>>>>>>>>>> may cause problem.
> > > >> > > > > > > > >>>>>>>>>>>
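[Editorial note: the reordering in steps 1-5 above can be reproduced with a plain deque. This is a toy model of head insertion, not broker code.]

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HeadInsertReorderDemo {
    public static void main(String[] args) {
        Deque<String> requestQueue = new ArrayDeque<>();
        requestQueue.addFirst("R1"); // step 2: R1 goes to the head
        // steps 3-4: the connection drops and the controller reconnects and sends R2
        requestQueue.addFirst("R2"); // step 5: R2 also goes to the head
        // The broker now processes R2 before R1.
        System.out.println(requestQueue.pollFirst()); // R2
        System.out.println(requestQueue.pollFirst()); // R1
    }
}
```
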
> > > >> > > > > > > > >>>>>>>>>>> Thanks,
> > > >> > > > > > > > >>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > >> > > > > > > > >>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel
> Koshy
> > <
> > > >> > > > > > > > >>>>> jjkoshy.w@gmail.com>
> > > >> > > > > > > > >>>>>>>>> wrote:
> > > >> > > > > > > > >>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears
> to
> > > be
> > > >> a
> > > >> > > > simpler
> > > >> > > > > > > > >>>> less
> > > >> > > > > > > > >>>>>>>>> invasive
> > > >> > > > > > > > >>>>>>>>>>>> alternative and it should work.
> > > >> Jun/Becket/others,
> > > >> > > do
> > > >> > > > > > > > >> you
> > > >> > > > > > > > >>>> see
> > > >> > > > > > > > >>>>>> any
> > > >> > > > > > > > >>>>>>>>>>> pitfalls
> > > >> > > > > > > > >>>>>>>>>>>> with this approach?
> > > >> > > > > > > > >>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas
> > Wang
> > > <
> > > >> > > > > > > > >>>>>>>> lucasatucla@gmail.com>
> > > >> > > > > > > > >>>>>>>>>>>> wrote:
> > > >> > > > > > > > >>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > >> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I
> > > haven't
> > > >> > > thought
> > > >> > > > > > > > >>>>> before.
> > > >> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand
> > pretty
> > > >> > well,
> > > >> > > > and
> > > >> > > > > > > > >>>> also
> > > >> > > > > > > > >>>>>>>>>>>>> avoids the need to have a new size
> metric
> > > and
> > > >> > > > capacity
> > > >> > > > > > > > >>>>> config
> > > >> > > > > > > > >>>>>>>>>>>>> for the controller request queue. In
> fact,
> > > if
> > > >> we
> > > >> > > were
> > > >> > > > > > > > >> to
> > > >> > > > > > > > >>>>> adopt
> > > >> > > > > > > > >>>>>>>>>>>>> this design, there is no public
> interface
> > > >> change,
> > > >> > > and
> > > >> > > > > > > > >> we
> > > >> > > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> > > >> > > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems
> > > >> > > > >>>>>>>>>>>>> the java class LinkedBlockingDeque can readily
> > > >> > > > >>>>>>>>>>>>> satisfy the requirement by supporting a capacity,
> > > >> > > > >>>>>>>>>>>>> and also allowing inserting at both ends.
> > > >> > > > > > > > >>>>>>>>>>>>>
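[Editorial note: the combination of a capacity bound plus insertion at both ends is what java.util.concurrent.LinkedBlockingDeque (the deque variant of LinkedBlockingQueue) provides. The wrapper below is our own sketch of the idea, not KIP code; the capacity of 500 matches the default of queued.max.requests.]

```java
import java.util.concurrent.LinkedBlockingDeque;

public class PrioritizedRequestQueue {
    // Bounded like queued.max.requests (default 500).
    private final LinkedBlockingDeque<String> deque = new LinkedBlockingDeque<>(500);

    // Controller requests are inserted at the head of the deque.
    public void putControllerRequest(String request) throws InterruptedException {
        deque.putFirst(request);
    }

    // Data requests (produce, fetch, ...) are appended at the tail.
    public void putDataRequest(String request) throws InterruptedException {
        deque.putLast(request);
    }

    // Request handler threads always take from the head, so a queued
    // controller request is picked up before any queued data request.
    public String take() throws InterruptedException {
        return deque.takeFirst();
    }
}
```
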
> > > >> > > > > > > > >>>>>>>>>>>>> My only concern is that this design is
> > tied
> > > to
> > > >> > the
> > > >> > > > > > > > >>>>> coincidence
> > > >> > > > > > > > >>>>>>>> that
> > > >> > > > > > > > >>>>>>>>>>>>> we have two request priorities and there
> > are
> > > >> two
> > > >> > > ends
> > > >> > > > > > > > >>> to a
> > > >> > > > > > > > >>>>>>>> deque.
> > > >> > > > > > > > >>>>>>>>>>>>> Hence by using the proposed design, it
> > seems
> > > >> the
> > > >> > > > > > > > >> network
> > > >> > > > > > > > >>>>> layer
> > > >> > > > > > > > >>>>>>>> is
> > > >> > > > > > > > >>>>>>>>>>>>> more tightly coupled with upper layer
> > logic,
> > > >> e.g.
> > > >> > > if
> > > >> > > > > > > > >> we
> > > >> > > > > > > > >>>> were
> > > >> > > > > > > > >>>>>> to
> > > >> > > > > > > > >>>>>>>> add
> > > >> > > > > > > > >>>>>>>>>>>>> an extra priority level in the future
> for
> > > some
> > > >> > > > reason,
> > > >> > > > > > > > >>> we
> > > >> > > > > > > > >>>>>> would
> > > >> > > > > > > > >>>>>>>>>>> probably
> > > >> > > > > > > > >>>>>>>>>>>>> need to go back to the design of
> separate
> > > >> queues,
> > > >> > > one
> > > >> > > > > > > > >>> for
> > > >> > > > > > > > >>>>> each
> > > >> > > > > > > > >>>>>>>>>> priority
> > > >> > > > > > > > >>>>>>>>>>>>> level.
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and
> > > lean
> > > >> > > toward
> > > >> > > > > > > > >>> your
> > > >> > > > > > > > >>>>>>>> suggested
> > > >> > > > > > > > >>>>>>>>>>>>> approach.
> > > >> > > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>> @Becket,
> > > >> > > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new
> > design,
> > > >> I'm
> > > >> > > > > > > > >>> answering
> > > >> > > > > > > > >>>>>> your
> > > >> > > > > > > > >>>>>>>>>>> question
> > > >> > > > > > > > >>>>>>>>>>>>> only in the context
> > > >> > > > > > > > >>>>>>>>>>>>> of the current KIP design: I think your
> > > >> > suggestion
> > > >> > > > > > > > >> makes
> > > >> > > > > > > > >>>>>> sense,
> > > >> > > > > > > > >>>>>>>> and
> > > >> > > > > > > > >>>>>>>>>> I'm
> > > >> > > > > > > > >>>>>>>>>>>> ok
> > > >> > > > > > > > >>>>>>>>>>>>> with removing the capacity config and
> > > >> > > > > > > > >>>>>>>>>>>>> just relying on the default value of 20
> > > being
> > > >> > > > > > > > >> sufficient
> > > >> > > > > > > > >>>>>> enough.
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>> Thanks,
> > > >> > > > > > > > >>>>>>>>>>>>> Lucas
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM,
> Mayuresh
> > > >> Gharat
> > > >> > <
> > > >> > > > > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > > >> > > > > > > > >>>>>>>>>>>>>> wrote:
> > > >> > > > > > > > >>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to
> > > >> prioritize
> > > >> > > the
> > > >> > > > > > > > >>>>>>>> controller
> > > >> > > > > > > > >>>>>>>>>>> request
> > > >> > > > > > > > >>>>>>>>>>>>>> over any other requests.
> > > >> > > > >>>>>>>>>>>>>> In that case, we can change the request queue to
> > > >> > > > >>>>>>>>>>>>>> a deque, where you always insert the normal
> > > >> > > > >>>>>>>>>>>>>> requests (produce, consume, ...etc) at the end of
> > > >> > > > >>>>>>>>>>>>>> the deque, but if it's a controller request, you
> > > >> > > > >>>>>>>>>>>>>> insert it at the head of the deque. This ensures
> > > >> > > > >>>>>>>>>>>>>> that the controller request will be given higher
> > > >> > > > >>>>>>>>>>>>>> priority over other requests.
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>> Also since we only read one request
> from
> > > the
> > > >> > > socket
> > > >> > > > > > > > >>> and
> > > >> > > > > > > > >>>>> mute
> > > >> > > > > > > > >>>>>>>> it
> > > >> > > > > > > > >>>>>>>>> and
> > > >> > > > > > > > >>>>>>>>>>>> only
> > > >> > > > > > > > >>>>>>>>>>>>>> unmute it after handling the request,
> > this
> > > >> would
> > > >> > > > > > > > >>> ensure
> > > >> > > > > > > > >>>>> that
> > > >> > > > > > > > >>>>>>>> we
> > > >> > > > > > > > >>>>>>>>>> don't
> > > >> > > > > > > > >>>>>>>>>>>>>> handle controller requests out of
> order.
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the
> > second
> > > >> queue
> > > >> > > and
> > > >> > > > > > > > >>> the
> > > >> > > > > > > > >>>>>>>>> additional
> > > >> > > > > > > > >>>>>>>>>>>>> config
> > > >> > > > > > > > >>>>>>>>>>>>>> for the size of the queue.
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>> What do you think ?
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>> Thanks,
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>> Mayuresh
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket
> > Qin
> > > <
> > > >> > > > > > > > >>>>>>>> becket.qin@gmail.com
> > > >> > > > > > > > >>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>> wrote:
> > > >> > > > > > > > >>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>>> Thank for the detail explanation. I
> > agree
> > > >> the
> > > >> > > > > > > > >>> current
> > > >> > > > > > > > >>>>>> design
> > > >> > > > > > > > >>>>>>>>>> makes
> > > >> > > > > > > > >>>>>>>>>>>>> sense.
> > > >> > > > > > > > >>>>>>>>>>>>>>> My confusion is about whether the new
> > > config
> > > >> > for
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >>>>>>>> controller
> > > >> > > > > > > > >>>>>>>>>>> queue
> > > >> > > > > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think
> > of a
> > > >> case
> > > >> > > in
> > > >> > > > > > > > >>>> which
> > > >> > > > > > > > >>>>>>>> users
> > > >> > > > > > > > >>>>>>>>>>> would
> > > >> > > > > > > > >>>>>>>>>>>>>> change
> > > >> > > > > > > > >>>>>>>>>>>>>>> it.
> > > >> > > > > > > > >>>>>>>>>>>>>>>
> > > >> > > > > > > > >>>>>>>>>>>>>>> Thanks,
> > > >> > > > > > > > >>>>>>>>>>>>>>>
> > Jiangjie (Becket) Qin
> >
> > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:
> >
> > > Hi Lucas,
> > >
> > > I guess my question can be rephrased to "do we expect user to ever
> > > change the controller request queue capacity"? If we agree that 20 is
> > > already a very generous default number and we do not expect user to
> > > change it, is it still necessary to expose this as a config?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > >
> > > > @Becket
> > > > 1. Thanks for the comment. You are right that normally there should
> > > > be just one controller request because of muting, and I had NOT
> > > > intended to say there would be many enqueued controller requests.
> > > > I went through the KIP again, and I'm not sure which part conveys
> > > > that info. I'd be happy to revise if you point out the section.
> > > >
> > > > 2. Though it should not happen in normal conditions, the current
> > > > design does not preclude multiple controllers running at the same
> > > > time, hence if we don't have the controller queue capacity config and
> > > > simply make its capacity to be 1, network threads handling requests
> > > > from different controllers will be blocked during those troublesome
> > > > times, which is probably not what we want. On the other hand, adding
> > > > the extra config with a default value, say 20, guards us from issues
> > > > in those troublesome times, and IMO there isn't much downside of
> > > > adding the extra config.
> > > >
> > > > @Mayuresh
> > > > Good catch, this sentence is an obsolete statement based on a
> > > > previous design. I've revised the wording in the KIP.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > >
> > > > > Hi Lucas,
> > > > >
> > > > > Thanks for the KIP.
> > > > > I am trying to understand why you think "The memory consumption can
> > > > > rise given the total number of queued requests can go up to 2x" in
> > > > > the impact section. Normally the requests from the controller to a
> > > > > broker are not high volume, right?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > >
> > > > > > Thanks for the KIP, Lucas. Separating the control plane from the
> > > > > > data plane makes a lot of sense.
> > > > > >
> > > > > > In the KIP you mentioned that the controller request queue may
> > > > > > have many requests in it. Will this be a common case? The
> > > > > > controller requests still go through the SocketServer. The
> > > > > > SocketServer will mute the channel once a request is read and put
> > > > > > into the request channel. So assuming there is only one connection
> > > > > > between the controller and each broker, on the broker side, there
> > > > > > should be only one controller request in the controller request
> > > > > > queue at any given time. If that is the case, do we need a
> > > > > > separate controller request queue capacity config? The default
> > > > > > value 20 means that we expect 20 controller switches to happen in
> > > > > > a short period of time. I am not sure whether someone should
> > > > > > increase the controller request queue capacity to handle such a
> > > > > > case, as it seems to indicate something very wrong has happened.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks for the update Lucas.
> > > > > > >
> > > > > > > I think the motivation section is intuitive. It will be good to
> > > > > > > learn more about the comments from other reviewers.
> > > > > > >
> > > > > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Dong,
> > > > > > > >
> > > > > > > > I've updated the motivation section of the KIP by explaining
> > > > > > > > the cases that would have user impacts.
> > > > > > > > Please take a look and let me know your comments.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Dong,
> > > > > > > > >
> > > > > > > > > The simulation of disk being slow is merely for me to easily
> > > > > > > > > construct a testing scenario with a backlog of produce
> > > > > > > > > requests. In production, other than the disk being slow, a
> > > > > > > > > backlog of produce requests may also be caused by high
> > > > > > > > > produce QPS. In that case, we may not want to kill the
> > > > > > > > > broker, and that's when this KIP can be useful, both for
> > > > > > > > > JBOD and non-JBOD setups.
> > > > > > > > >
> > > > > > > > > Going back to your previous question about each
> > > > > > > > > ProduceRequest covering 20 partitions that are randomly
> > > > > > > > > distributed: let's say a LeaderAndIsr request is enqueued
> > > > > > > > > that tries to switch the current broker, say broker0, from
> > > > > > > > > leader to follower *for one of the partitions*, say
> > > > > > > > > *test-0*. For the sake of argument, let's also assume the
> > > > > > > > > other brokers, say broker1, have *stopped* fetching from the
> > > > > > > > > current broker, i.e. broker0.
> > > > > > > > >
> > > > > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> > > > > > > > >       LeaderAndISR will be put into the purgatory, and since
> > > > > > > > >       they'll never be replicated to other brokers (because
> > > > > > > > >       of the assumption made above), they will be completed
> > > > > > > > >       either when the LeaderAndISR request is processed or
> > > > > > > > >       when the timeout happens.
> > > > > > > > >   1.2 With this KIP, broker0 will immediately transition the
> > > > > > > > >       partition test-0 to become a follower; after the
> > > > > > > > >       current broker sees the replication of the remaining
> > > > > > > > >       19 partitions, it can send a response indicating that
> > > > > > > > >       it's no longer the leader for "test-0".
> > > > > > > > >   To see the latency difference between 1.1 and 1.2, let's
> > > > > > > > >   say there are 24K produce requests ahead of the
> > > > > > > > >   LeaderAndISR, and there are 8 io threads, so each io
> > > > > > > > >   thread will process approximately 3000 produce requests.
> > > > > > > > >   Now let's investigate the io thread that finally processed
> > > > > > > > >   the LeaderAndISR.
> > > > > > > > >   For the 3000 produce requests, if we model the times when
> > > > > > > > >   their remaining 19 partitions catch up as t0, t1, ...
> > > > > > > > >   t2999, and the LeaderAndISR request is processed at time
> > > > > > > > >   t3000, then without this KIP the 1st produce request would
> > > > > > > > >   have waited an extra t3000 - t0 in the purgatory, the 2nd
> > > > > > > > >   an extra t3000 - t1, etc.
> > > > > > > > >   Roughly speaking, the latency difference is bigger for the
> > > > > > > > >   earlier produce requests than for the later ones. For the
> > > > > > > > >   same reason, the more ProduceRequests queued before the
> > > > > > > > >   LeaderAndISR, the bigger the benefit we get (capped by the
> > > > > > > > >   produce timeout).
> > > > > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > > > > > > >   There will be no latency differences in this case, but
> > > > > > > > >   2.1 Without this KIP, the records of partition test-0 in
> > > > > > > > >       the ProduceRequests ahead of the LeaderAndISR will be
> > > > > > > > >       appended to the local log, and eventually be truncated
> > > > > > > > >       after processing the LeaderAndISR. This is what's
> > > > > > > > >       referred to as "some unofficial definition of data
> > > > > > > > >       loss in terms of messages beyond the high watermark".
> > > > > > > > >   2.2 With this KIP, we can mitigate the effect, since if
> > > > > > > > >       the LeaderAndISR is immediately processed, the
> > > > > > > > >       response to producers will have the
> > > > > > > > >       NotLeaderForPartition error, causing producers to
> > > > > > > > >       retry.
> > > > > > > > >
> > > > > > > > > This explanation above is the benefit for reducing
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks for your review, Dong.
Ack that these configs will have a bigger impact for users.

On the other hand, I would argue that the request queue becoming full
may or may not be a rare scenario.
How often the request queue gets full depends on the request incoming rate,
the request processing rate, and the size of the request queue.
When that happens, the dedicated endpoints design can better handle
it than any of the previously discussed options.
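For intuition about why a queue for control requests helps under backlog, here is a minimal sketch of the separate-controller-queue option discussed earlier in this thread (illustrative Python, not Kafka code; all names are made up):

```python
import queue

# Two request queues: a bounded one for data-plane requests (analogous to
# queued.max.requests) and a dedicated one for controller requests.
controller_q = queue.Queue()
data_q = queue.Queue(maxsize=500)

def next_request():
    """A request handler drains the controller queue first, so control
    requests never wait behind a backlog of data requests."""
    try:
        return controller_q.get_nowait()   # control plane has priority
    except queue.Empty:
        return data_q.get()                # otherwise serve the data plane

# A backlog of produce requests, then a controller request arrives.
for i in range(3):
    data_q.put(("produce", i))
controller_q.put(("leader_and_isr", 0))

assert next_request() == ("leader_and_isr", 0)   # jumps the data backlog
assert next_request() == ("produce", 0)
```

The dedicated-endpoints design goes further by also isolating the network path, so even the enqueue step cannot be blocked by data traffic.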

Another reason I made the change was that I have the same taste
as Becket that it's a better separation of the control plane from the data
plane.

Finally, I want to clarify that this change is NOT motivated by the
out-of-order
processing discussion. The latter problem is orthogonal to this KIP, and it
can happen in any of the design options we discussed for this KIP so far.
So I'd like to address out-of-order processing separately in another thread,
and avoid mentioning it in this KIP.

Thanks,
Lucas

On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Lucas,
>
> Thanks for the update.
>
> The current KIP propose new broker configs "listeners.for.controller" and
> "advertised.listeners.for.controller". This is going to be a big change
> since listeners are among the most important configs that every user needs
> to change. According to the rejected alternative section, it seems that the
> reason to add these two configs is to improve performance when the data
> request queue is full rather than for correctness. It should be a very rare
> scenario and I am not sure we should add configs for all users just to
> improve the performance in such a rare scenario.
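For illustration, assuming the two proposed configs follow the same listener-list syntax as the existing listeners/advertised.listeners settings (the exact value format is an assumption here, not confirmed by the KIP text), a broker config might look like:

```properties
# Existing client/replication listeners
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://broker1.example.com:9092

# Hypothetical values for the proposed dedicated controller endpoint
listeners.for.controller=CONTROLLER://:9091
advertised.listeners.for.controller=CONTROLLER://broker1.example.com:9091
```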
>
> Also, if the new design is based on the issues which are discovered in the
> recent discussion, e.g. out of order processing if we don't use a dedicated
> thread for controller request, it may be useful to explain the problem in
> the motivation section.
>
> Thanks,
> Dong
>
> On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > A kind reminder for review of this KIP.
> >
> > Thank you very much!
> > Lucas
> >
> > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I've updated the KIP by adding the dedicated endpoints for controller
> > > connections,
> > > and pinning threads for controller requests.
> > > Also I've updated the title of this KIP. Please take a look and let me
> > > know your feedback.
> > >
> > > Thanks a lot for your time!
> > > Lucas
> > >
> > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > > gharatmayuresh15@gmail.com> wrote:
> > >
> > >> Hi Lucas,
> > >> I agree, if we want to go forward with a separate controller plane and
> > >> data plane and completely isolate them, having a separate port for
> > >> controller with a separate Acceptor and a Processor sounds ideal to me.
> > >>
> > >> Thanks,
> > >>
> > >> Mayuresh
> > >>
> > >>
> > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <be...@gmail.com> wrote:
> > >>
> > >> > Hi Lucas,
> > >> >
> > >> > Yes, I agree that a dedicated end to end control flow would be ideal.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lu...@gmail.com> wrote:
> > >> >
> > >> > > Thanks for the comment, Becket.
> > >> > > So far, we've been trying to avoid making any request handler thread
> > >> > > special. But if we were to follow that path in order to make the two
> > >> > > planes more isolated, what do you think about also having a dedicated
> > >> > > processor thread, and dedicated port for the controller?
> > >> > >
> > >> > > Today one processor thread can handle multiple connections, let's say
> > >> > > 100 connections represented by connection0, ... connection99, among
> > >> > > which connection0-98 are from clients, while connection99 is from the
> > >> > > controller. Further let's say after one selector polling, there are
> > >> > > incoming requests on all connections.
> > >> > >
> > >> > > When the request queue is full (either the data request queue being
> > >> > > full in the two-queue design, or the one single queue being full in
> > >> > > the deque design), the processor thread will be blocked first when
> > >> > > trying to enqueue the data request from connection0, then possibly
> > >> > > blocked for the data request from connection1, ... etc. even though
> > >> > > the controller request is ready to be enqueued.
> > >> > >
> > >> > > To solve this problem, it seems we would need to have a separate port
> > >> > > dedicated to the controller, a dedicated processor thread, a dedicated
> > >> > > controller request queue, and pinning of one request handler thread
> > >> > > for controller requests.
> > >> > >
> > >> > > Thanks,
> > >> > > Lucas
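The head-of-line blocking Lucas describes above can be sketched deterministically as follows (illustrative Python, not Kafka code; the queue capacity of 10 is an arbitrary assumption):

```python
from collections import deque

# One processor thread enqueues ready requests connection by connection.
# With a single bounded shared queue, the controller request arriving on
# connection 99 cannot be enqueued until handlers drain earlier data
# requests, i.e. it suffers head-of-line blocking.

QUEUE_CAPACITY = 10
ready = [f"data-{i}" for i in range(99)] + ["controller-request"]

shared, enqueue_order = deque(), []
drained = 0
for req in ready:
    while len(shared) >= QUEUE_CAPACITY:   # the processor "blocks" here
        shared.popleft()                   # until a handler frees one slot
        drained += 1
    shared.append(req)
    enqueue_order.append(req)

# The controller request is enqueued last, only after handlers have
# drained 90 data requests ahead of it.
assert enqueue_order[-1] == "controller-request"
assert drained == 90
```

A dedicated port, processor, and queue for the controller removes this dependency on data-plane drain rate entirely.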
> > >> > >
> > >> > >
> > >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com
> >
> > >> > wrote:
> > >> > >
> > >> > > > Personally I am not fond of the dequeue approach simply because
> it
> > >> is
> > >> > > > against the basic idea of isolating the controller plane and
> data
> > >> > plane.
> > >> > > > With a single dequeue, theoretically speaking the controller
> > >> requests
> > >> > can
> > >> > > > starve the clients requests. I would prefer the approach with a
> > >> > separate
> > >> > > > controller request queue and a dedicated controller request
> > handler
> > >> > > thread.
> > >> > > >
> > >> > > > Thanks,
> > >> > > >
> > >> > > > Jiangjie (Becket) Qin
> > >> > > >
> > >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <
> > lucasatucla@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > > Sure, I can summarize the usage of correlation id. But before
> I
> > do
> > >> > > that,
> > >> > > > it
> > >> > > > > seems
> > >> > > > > the same out-of-order processing can also happen to Produce
> > >> requests
> > >> > > sent
> > >> > > > > by producers,
> > >> > > > > following the same example you described earlier.
> > >> > > > > If that's the case, I think this probably deserves a separate
> > doc
> > >> and
> > >> > > > > design independent of this KIP.
> > >> > > > >
> > >> > > > > Lucas
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <
> lindong28@gmail.com
> > >
> > >> > > wrote:
> > >> > > > >
> > >> > > > > > Hey Lucas,
> > >> > > > > >
> > >> > > > > > Could you update the KIP if you are confident with the
> > approach
> > >> > which
> > >> > > > > uses
> > >> > > > > > correlation id? The idea around correlation id is kind of
> > >> scattered
> > >> > > > > across
> > >> > > > > > multiple emails. It will be useful if other reviewers can read
> > the
> > >> > KIP
> > >> > > to
> > >> > > > > > understand the latest proposal.
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > > Dong
> > >> > > > > >
> > >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > >> > > > > > gharatmayuresh15@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > I like the idea of the dequeue implementation by Lucas.
> This
> > >> will
> > >> > > > help
> > >> > > > > us
> > >> > > > > > > avoid additional queue for controller and additional
> configs
> > >> in
> > >> > > > Kafka.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > >
> > >> > > > > > > Mayuresh
> > >> > > > > > >
> > >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
> > >> becket.qin@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi Jun,
> > >> > > > > > > >
> > >> > > > > > > > The usage of correlation ID might still be useful to
> > address
> > >> > the
> > >> > > > > cases
> > >> > > > > > > > that the controller epoch and leader epoch check are not
> > >> > > sufficient
> > >> > > > > to
> > >> > > > > > > > guarantee correct behavior. For example, if the
> controller
> > >> > sends
> > >> > > a
> > >> > > > > > > > LeaderAndIsrRequest followed by a StopReplicaRequest,
> and
> > >> the
> > >> > > > broker
> > >> > > > > > > > processes them in the reverse order, the replica may still
> > be
> > >> > > wrongly
> > >> > > > > > > > recreated, right?
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > >
> > >> > > > > > > > Jiangjie (Becket) Qin
> > >> > > > > > > >
> > >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <
> jun@confluent.io
> > >
> > >> > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > Hmm, since we already use controller epoch and leader
> > >> epoch
> > >> > for
> > >> > > > > > > properly
> > >> > > > > > > > > caching the latest partition state, do we really need
> > >> > > correlation
> > >> > > > > id
> > >> > > > > > > for
> > >> > > > > > > > > ordering the controller requests?
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks,
> > >> > > > > > > > >
> > >> > > > > > > > > Jun
> > >> > > > > > > > >
> > >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
> > >> > > > becket.qin@gmail.com>
> > >> > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > >> Lucas and Mayuresh,
> > >> > > > > > > > >>
> > >> > > > > > > > >> Good idea. The correlation id should work.
> > >> > > > > > > > >>
> > >> > > > > > > > >> In the ControllerChannelManager, a request will be
> > resent
> > >> > > until
> > >> > > > a
> > >> > > > > > > > response
> > >> > > > > > > > >> is received. So if the controller to broker
> connection
> > >> > > > disconnects
> > >> > > > > > > after
> > >> > > > > > > > >> controller sends R1_a, but before the response of
> R1_a
> > is
> > >> > > > > received,
> > >> > > > > > a
> > >> > > > > > > > >> disconnection may cause the controller to resend
> R1_b.
> > >> i.e.
> > >> > > > until
> > >> > > > > R1
> > >> > > > > > > is
> > >> > > > > > > > >> acked, R2 won't be sent by the controller.
> > >> > > > > > > > >> This gives two guarantees:
> > >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > >> > > > > > > > >> 2. On the broker side, when R2 is seen, R1 must have
> > been
> > >> > > > > processed
> > >> > > > > > at
> > >> > > > > > > > >> least once.
> > >> > > > > > > > >>
> > >> > > > > > > > >> So on the broker side, with a single thread
> controller
> > >> > request
> > >> > > > > > > handler,
> > >> > > > > > > > the
> > >> > > > > > > > >> logic should be:
> > >> > > > > > > > >> 1. Process whatever request is seen in the controller
> > >> request
> > >> > > > queue
> > >> > > > > > > > >> 2. For the given epoch, drop request if its
> correlation
> > >> id
> > >> > is
> > >> > > > > > smaller
> > >> > > > > > > > than
> > >> > > > > > > > >> that of the last processed request.
> > >> > > > > > > > >>
> > >> > > > > > > > >> Thanks,
> > >> > > > > > > > >>
> > >> > > > > > > > >> Jiangjie (Becket) Qin
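
[Editor's sketch] The two-step drop rule Becket lays out could look something like the following. This is a hypothetical illustration (the class and method names are invented, and real controller requests carry more state), but it captures the epoch-plus-correlation-id comparison:

```java
// Hypothetical sketch of the broker-side drop rule: within a controller
// epoch, correlation ids from the controller's NetworkClient only grow, so
// any request whose id is below the last processed one must be a stale
// resend (e.g. R1_a arriving after R1_b) and can be dropped.
public class ControllerRequestGate {
    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    // Returns true if the request should be processed, false if it is stale.
    public synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch < lastEpoch) {
            return false; // request from an older controller generation
        }
        if (controllerEpoch > lastEpoch) {
            lastEpoch = controllerEpoch;       // new controller generation:
            lastCorrelationId = correlationId; // reset the id watermark
            return true;
        }
        if (correlationId > lastCorrelationId) {
            lastCorrelationId = correlationId;
            return true;
        }
        return false; // a late copy of an already-superseded request
    }
}
```

In Becket's R1_a/R1_b/R2 example, a late-arriving R1_a is rejected because R1_b (a resend with a higher correlation id) has already moved the watermark past it.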
> > >> > > > > > > > >>
> > >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <
> > >> jun@confluent.io>
> > >> > > > > wrote:
> > >> > > > > > > > >>
> > >> > > > > > > > >>> I agree that there is no strong ordering when there
> > are
> > >> > more
> > >> > > > than
> > >> > > > > > one
> > >> > > > > > > > >>> socket connections. Currently, we rely on
> > >> controllerEpoch
> > >> > and
> > >> > > > > > > > leaderEpoch
> > >> > > > > > > > >>> to ensure that the receiving broker picks up the
> > latest
> > >> > state
> > >> > > > for
> > >> > > > > > > each
> > >> > > > > > > > >>> partition.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> One potential issue with the dequeue approach is
> that
> > if
> > >> > the
> > >> > > > > queue
> > >> > > > > > is
> > >> > > > > > > > >> full,
> > >> > > > > > > > >>> there is no guarantee that the controller requests
> > will
> > >> be
> > >> > > > > enqueued
> > >> > > > > > > > >>> quickly.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> Thanks,
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> Jun
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > >> > > > > > > > >>> gharatmayuresh15@gmail.com
> > >> > > > > > > > >>>> wrote:
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>> Yea, the correlationId is only set to 0 in the
> > >> > NetworkClient
> > >> > > > > > > > >> constructor.
> > >> > > > > > > > >>>> Since we reuse the same NetworkClient between
> > >> Controller
> > >> > and
> > >> > > > the
> > >> > > > > > > > >> broker,
> > >> > > > > > > > >>> a
> > >> > > > > > > > >>>> disconnection should not cause it to reset to 0, in
> > >> which
> > >> > > case
> > >> > > > > it
> > >> > > > > > > can
> > >> > > > > > > > >> be
> > >> > > > > > > > >>>> used to reject obsolete requests.
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>> Thanks,
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>> Mayuresh
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <
> > >> > > > > lucasatucla@gmail.com
> > >> > > > > > >
> > >> > > > > > > > >>> wrote:
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>>> @Dong,
> > >> > > > > > > > >>>>> Great example and explanation, thanks!
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> @All
> > >> > > > > > > > >>>>> Regarding the example given by Dong, it seems even
> > if
> > >> we
> > >> > > use
> > >> > > > a
> > >> > > > > > > queue,
> > >> > > > > > > > >>>> and a
> > >> > > > > > > > >>>>> dedicated controller request handling thread,
> > >> > > > > > > > >>>>> the same result can still happen because R1_a will
> > be
> > >> > sent
> > >> > > on
> > >> > > > > one
> > >> > > > > > > > >>>>> connection, and R1_b & R2 will be sent on a
> > different
> > >> > > > > connection,
> > >> > > > > > > > >>>>> and there is no ordering between different
> > >> connections on
> > >> > > the
> > >> > > > > > > broker
> > >> > > > > > > > >>>> side.
> > >> > > > > > > > >>>>> I was discussing with Mayuresh offline, and it
> seems
> > >> > > > > correlation
> > >> > > > > > id
> > >> > > > > > > > >>>> within
> > >> > > > > > > > >>>>> the same NetworkClient object is monotonically
> > >> increasing
> > >> > > and
> > >> > > > > > never
> > >> > > > > > > > >>>> reset,
> > >> > > > > > > > >>>>> hence a broker can leverage that to properly
> reject
> > >> > > obsolete
> > >> > > > > > > > >> requests.
> > >> > > > > > > > >>>>> Thoughts?
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> Thanks,
> > >> > > > > > > > >>>>> Lucas
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat
> <
> > >> > > > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>>> Actually nvm, correlationId is reset in case of
> > >> > connection
> > >> > > > > > loss, I
> > >> > > > > > > > >>>> think.
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>> Thanks,
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>> Mayuresh
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat
> <
> > >> > > > > > > > >>>>>> gharatmayuresh15@gmail.com>
> > >> > > > > > > > >>>>>> wrote:
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing
> can
> > >> > happen
> > >> > > > > with
> > >> > > > > > > > >>>> having 2
> > >> > > > > > > > >>>>>>> separate queues as well and it can even happen
> > >> today.
> > >> > > > > > > > >>>>>>> Can we use the correlationId in the request from
> > the
> > >> > > > > controller
> > >> > > > > > > > >> to
> > >> > > > > > > > >>>> the
> > >> > > > > > > > >>>>>>> broker to handle ordering ?
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> Thanks,
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> Mayuresh
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
> > >> > > > > > becket.qin@gmail.com
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>>> wrote:
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated
> > >> controller
> > >> > > > > request
> > >> > > > > > > > >>>> handling
> > >> > > > > > > > >>>>>>>> thread would be a better isolation. It also
> > solves
> > >> the
> > >> > > > > > > > >> reordering
> > >> > > > > > > > >>>>> issue.
> > >> > > > > > > > >>>>>>>>
> > >> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> > >> > > > > > > > >> jjkoshy.w@gmail.com>
> > >> > > > > > > > >>>>>> wrote:
> > >> > > > > > > > >>>>>>>>
> > >> > > > > > > > >>>>>>>>> Good example. I think this scenario can occur
> in
> > >> the
> > >> > > > > current
> > >> > > > > > > > >>> code
> > >> > > > > > > > >>>> as
> > >> > > > > > > > >>>>>>>> well
> > >> > > > > > > > >>>>>>>>> but with even lower probability given that
> there
> > >> are
> > >> > > > other
> > >> > > > > > > > >>>>>>>> non-controller
> > >> > > > > > > > >>>>>>>>> requests interleaved. It is still sketchy
> though
> > >> and
> > >> > I
> > >> > > > > think
> > >> > > > > > a
> > >> > > > > > > > >>>> safer
> > >> > > > > > > > >>>>>>>>> approach would be separate queues and pinning
> > >> > > controller
> > >> > > > > > > > >> request
> > >> > > > > > > > >>>>>>>> handling
> > >> > > > > > > > >>>>>>>>> to one handler thread.
> > >> > > > > > > > >>>>>>>>>
> > >> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> > >> > > > > > > > >> lindong28@gmail.com
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>>>> wrote:
> > >> > > > > > > > >>>>>>>>>
> > >> > > > > > > > >>>>>>>>>> Hey Becket,
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>> I think you are right that there may be
> > >> out-of-order
> > >> > > > > > > > >>> processing.
> > >> > > > > > > > >>>>>>>> However,
> > >> > > > > > > > >>>>>>>>>> it seems that out-of-order processing may
> also
> > >> > happen
> > >> > > > even
> > >> > > > > > > > >> if
> > >> > > > > > > > >>> we
> > >> > > > > > > > >>>>>> use a
> > >> > > > > > > > >>>>>>>>>> separate queue.
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>> Here is the example:
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected
> > before
> > >> > > > > receiving
> > >> > > > > > > > >>>>>> response.
> > >> > > > > > > > >>>>>>>>> Then
> > >> > > > > > > > >>>>>>>>>> it reconnects and sends R2. Both requests now
> > >> stay
> > >> > in
> > >> > > > the
> > >> > > > > > > > >>>>> controller
> > >> > > > > > > > >>>>>>>>>> request queue in the order they are sent.
> > >> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue
> and
> > >> then
> > >> > > > > thread2
> > >> > > > > > > > >>>> takes
> > >> > > > > > > > >>>>>> R2
> > >> > > > > > > > >>>>>>>>> from
> > >> > > > > > > > >>>>>>>>>> the request queue almost at the same time.
> > >> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel.
> > >> There is
> > >> > > > > chance
> > >> > > > > > > > >>> that
> > >> > > > > > > > >>>>>> R2's
> > >> > > > > > > > >>>>>>>>>> processing is completed before R1.
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>> If out-of-order processing can happen for
> both
> > >> > > > approaches
> > >> > > > > > > > >> with
> > >> > > > > > > > >>>>> very
> > >> > > > > > > > >>>>>>>> low
> > >> > > > > > > > >>>>>>>>>> probability, it may not be worthwhile to add
> > the
> > >> > extra
> > >> > > > > > > > >> queue.
> > >> > > > > > > > >>>> What
> > >> > > > > > > > >>>>>> do
> > >> > > > > > > > >>>>>>>> you
> > >> > > > > > > > >>>>>>>>>> think?
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>> Thanks,
> > >> > > > > > > > >>>>>>>>>> Dong
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > >> > > > > > > > >>>> becket.qin@gmail.com
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>>>>> wrote:
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > >> > > > > > > > >>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>> Using the request channel as a dequeue was
> > >> brought
> > >> > up
> > >> > > > some
> > >> > > > > > > > >>> time
> > >> > > > > > > > >>>>> ago
> > >> > > > > > > > >>>>>>>> when
> > >> > > > > > > > >>>>>>>>>> we
> > >> > > > > > > > >>>>>>>>>>> initially thinking of prioritizing the
> > request.
> > >> The
> > >> > > > > > > > >> concern
> > >> > > > > > > > >>>> was
> > >> > > > > > > > >>>>>> that
> > >> > > > > > > > >>>>>>>>> the
> > >> > > > > > > > >>>>>>>>>>> controller requests are supposed to be
> > >> processed in
> > >> > > > > order.
> > >> > > > > > > > >>> If
> > >> > > > > > > > >>>> we
> > >> > > > > > > > >>>>>> can
> > >> > > > > > > > >>>>>>>>>> ensure
> > >> > > > > > > > >>>>>>>>>>> that there is one controller request in the
> > >> request
> > >> > > > > > > > >> channel,
> > >> > > > > > > > >>>> the
> > >> > > > > > > > >>>>>>>> order
> > >> > > > > > > > >>>>>>>>> is
> > >> > > > > > > > >>>>>>>>>>> not a concern. But in cases that there are
> > more
> > >> > than
> > >> > > > one
> > >> > > > > > > > >>>>>> controller
> > >> > > > > > > > >>>>>>>>>> request
> > >> > > > > > > > >>>>>>>>>>> inserted into the queue, the controller
> > request
> > >> > order
> > >> > > > may
> > >> > > > > > > > >>>> change
> > >> > > > > > > > >>>>>> and
> > >> > > > > > > > >>>>>>>>>> cause
> > >> > > > > > > > >>>>>>>>>>> problem. For example, think about the
> > following
> > >> > > > sequence:
> > >> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1
> > to
> > >> > > broker
> > >> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and put the request to
> > the
> > >> > head
> > >> > > > of
> > >> > > > > > > > >> the
> > >> > > > > > > > >>>>>> request
> > >> > > > > > > > >>>>>>>>>> queue.
> > >> > > > > > > > >>>>>>>>>>> 3. Controller to broker connection failed
> and
> > >> the
> > >> > > > > > > > >> controller
> > >> > > > > > > > >>>>>>>>> reconnected
> > >> > > > > > > > >>>>>>>>>> to
> > >> > > > > > > > >>>>>>>>>>> the broker.
> > >> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the
> broker
> > >> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to the head
> > of
> > >> the
> > >> > > > > > > > >> request
> > >> > > > > > > > >>>>> queue.
> > >> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed
> > >> before
> > >> > > R1
> > >> > > > is
> > >> > > > > > > > >>>>>> processed,
> > >> > > > > > > > >>>>>>>>>> which
> > >> > > > > > > > >>>>>>>>>>> may cause problem.
> > >> > > > > > > > >>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>> Thanks,
> > >> > > > > > > > >>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > >> > > > > > > > >>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy
> <
> > >> > > > > > > > >>>>> jjkoshy.w@gmail.com>
> > >> > > > > > > > >>>>>>>>> wrote:
> > >> > > > > > > > >>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to
> > be
> > >> a
> > >> > > > simpler
> > >> > > > > > > > >>>> less
> > >> > > > > > > > >>>>>>>>> invasive
> > >> > > > > > > > >>>>>>>>>>>> alternative and it should work.
> > >> Jun/Becket/others,
> > >> > > do
> > >> > > > > > > > >> you
> > >> > > > > > > > >>>> see
> > >> > > > > > > > >>>>>> any
> > >> > > > > > > > >>>>>>>>>>> pitfalls
> > >> > > > > > > > >>>>>>>>>>>> with this approach?
> > >> > > > > > > > >>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas
> Wang
> > <
> > >> > > > > > > > >>>>>>>> lucasatucla@gmail.com>
> > >> > > > > > > > >>>>>>>>>>>> wrote:
> > >> > > > > > > > >>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > >> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I
> > haven't
> > >> > > thought
> > >> > > > > > > > >>>>> before.
> > >> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand
> pretty
> > >> > well,
> > >> > > > and
> > >> > > > > > > > >>>> also
> > >> > > > > > > > >>>>>>>>>>>>> avoids the need to have a new size metric
> > and
> > >> > > > capacity
> > >> > > > > > > > >>>>> config
> > >> > > > > > > > >>>>>>>>>>>>> for the controller request queue. In fact,
> > if
> > >> we
> > >> > > were
> > >> > > > > > > > >> to
> > >> > > > > > > > >>>>> adopt
> > >> > > > > > > > >>>>>>>>>>>>> this design, there is no public interface
> > >> change,
> > >> > > and
> > >> > > > > > > > >> we
> > >> > > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> > >> > > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems
> > >> > > > > > > > >>>>>>>>>>>>> the Java class LinkedBlockingDeque can
> > readily
> > >> > > > satisfy
> > >> > > > > > > > >>> the
> > >> > > > > > > > >>>>>>>>>> requirement
> > >> > > > > > > > >>>>>>>>>>>>> by supporting a capacity, and also
> allowing
> > >> > > inserting
> > >> > > > > > > > >> at
> > >> > > > > > > > >>>>> both
> > >> > > > > > > > >>>>>>>> ends.
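
[Editor's sketch] As a rough illustration of that deque idea (assuming java.util.concurrent.LinkedBlockingDeque, the bounded double-ended variant; plain LinkedBlockingQueue only supports insertion at one end), the insertion policy might look like:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Illustrative sketch, not Kafka's actual RequestChannel: a single bounded
// deque where controller requests jump to the head and data-plane requests
// queue at the tail, so handlers always see controller requests first.
public class DequeRequestChannel {
    private final LinkedBlockingDeque<String> deque;

    public DequeRequestChannel(int capacity) {
        deque = new LinkedBlockingDeque<>(capacity);
    }

    public void send(String request, boolean fromController) throws InterruptedException {
        if (fromController) {
            deque.putFirst(request); // head: dequeued before everything else
        } else {
            deque.putLast(request);  // tail: normal FIFO for produce/fetch etc.
        }
    }

    public String receive() throws InterruptedException {
        return deque.takeFirst();
    }
}
```

One caveat worth noting: putFirst also blocks when the deque is full, which matches the concern raised elsewhere in the thread that a full queue delays controller requests too.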
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>> My only concern is that this design is
> tied
> > to
> > >> > the
> > >> > > > > > > > >>>>> coincidence
> > >> > > > > > > > >>>>>>>> that
> > >> > > > > > > > >>>>>>>>>>>>> we have two request priorities and there
> are
> > >> two
> > >> > > ends
> > >> > > > > > > > >>> to a
> > >> > > > > > > > >>>>>>>> deque.
> > >> > > > > > > > >>>>>>>>>>>>> Hence by using the proposed design, it
> seems
> > >> the
> > >> > > > > > > > >> network
> > >> > > > > > > > >>>>> layer
> > >> > > > > > > > >>>>>>>> is
> > >> > > > > > > > >>>>>>>>>>>>> more tightly coupled with upper layer
> logic,
> > >> e.g.
> > >> > > if
> > >> > > > > > > > >> we
> > >> > > > > > > > >>>> were
> > >> > > > > > > > >>>>>> to
> > >> > > > > > > > >>>>>>>> add
> > >> > > > > > > > >>>>>>>>>>>>> an extra priority level in the future for
> > some
> > >> > > > reason,
> > >> > > > > > > > >>> we
> > >> > > > > > > > >>>>>> would
> > >> > > > > > > > >>>>>>>>>>> probably
> > >> > > > > > > > >>>>>>>>>>>>> need to go back to the design of separate
> > >> queues,
> > >> > > one
> > >> > > > > > > > >>> for
> > >> > > > > > > > >>>>> each
> > >> > > > > > > > >>>>>>>>>> priority
> > >> > > > > > > > >>>>>>>>>>>>> level.
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and
> > lean
> > >> > > toward
> > >> > > > > > > > >>> your
> > >> > > > > > > > >>>>>>>> suggested
> > >> > > > > > > > >>>>>>>>>>>>> approach.
> > >> > > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>> @Becket,
> > >> > > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new
> design,
> > >> I'm
> > >> > > > > > > > >>> answering
> > >> > > > > > > > >>>>>> your
> > >> > > > > > > > >>>>>>>>>>> question
> > >> > > > > > > > >>>>>>>>>>>>> only in the context
> > >> > > > > > > > >>>>>>>>>>>>> of the current KIP design: I think your
> > >> > suggestion
> > >> > > > > > > > >> makes
> > >> > > > > > > > >>>>>> sense,
> > >> > > > > > > > >>>>>>>> and
> > >> > > > > > > > >>>>>>>>>> I'm
> > >> > > > > > > > >>>>>>>>>>>> ok
> > >> > > > > > > > >>>>>>>>>>>>> with removing the capacity config and
> > >> > > > > > > > >>>>>>>>>>>>> just relying on the default value of 20
> > being
> > >> > > > > > > > >> sufficient
> > >> > > > > > > > >>>>>> enough.
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>> Thanks,
> > >> > > > > > > > >>>>>>>>>>>>> Lucas
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh
> > >> Gharat
> > >> > <
> > >> > > > > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > >> > > > > > > > >>>>>>>>>>>>>> wrote:
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to
> > >> prioritize
> > >> > > the
> > >> > > > > > > > >>>>>>>> controller
> > >> > > > > > > > >>>>>>>>>>> request
> > >> > > > > > > > >>>>>>>>>>>>>> over any other requests.
> > >> > > > > > > > >>>>>>>>>>>>>> In that case, we can change the request
> > queue
> > >> > to a
> > >> > > > > > > > >>>>> dequeue,
> > >> > > > > > > > >>>>>>>> where
> > >> > > > > > > > >>>>>>>>>> you
> > >> > > > > > > > >>>>>>>>>>>>>> always insert the normal requests
> (produce,
> > >> > > > > > > > >>>> consume,..etc)
> > >> > > > > > > > >>>>>> to
> > >> > > > > > > > >>>>>>>> the
> > >> > > > > > > > >>>>>>>>>> end
> > >> > > > > > > > >>>>>>>>>>>> of
> > >> > > > > > > > >>>>>>>>>>>>>> the dequeue, but if its a controller
> > request,
> > >> > you
> > >> > > > > > > > >>> insert
> > >> > > > > > > > >>>>> it
> > >> > > > > > > > >>>>>> to
> > >> > > > > > > > >>>>>>>>> the
> > >> > > > > > > > >>>>>>>>>>> head
> > >> > > > > > > > >>>>>>>>>>>>> of
> > >> > > > > > > > >>>>>>>>>>>>>> the queue. This ensures that the
> controller
> > >> > > request
> > >> > > > > > > > >>> will
> > >> > > > > > > > >>>>> be
> > >> > > > > > > > >>>>>>>> given
> > >> > > > > > > > >>>>>>>>>>>> higher
> > >> > > > > > > > >>>>>>>>>>>>>> priority over other requests.
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>> Also since we only read one request from
> > the
> > >> > > socket
> > >> > > > > > > > >>> and
> > >> > > > > > > > >>>>> mute
> > >> > > > > > > > >>>>>>>> it
> > >> > > > > > > > >>>>>>>>> and
> > >> > > > > > > > >>>>>>>>>>>> only
> > >> > > > > > > > >>>>>>>>>>>>>> unmute it after handling the request,
> this
> > >> would
> > >> > > > > > > > >>> ensure
> > >> > > > > > > > >>>>> that
> > >> > > > > > > > >>>>>>>> we
> > >> > > > > > > > >>>>>>>>>> don't
> > >> > > > > > > > >>>>>>>>>>>>>> handle controller requests out of order.
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the
> second
> > >> queue
> > >> > > and
> > >> > > > > > > > >>> the
> > >> > > > > > > > >>>>>>>>> additional
> > >> > > > > > > > >>>>>>>>>>>>> config
> > >> > > > > > > > >>>>>>>>>>>>>> for the size of the queue.
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>> What do you think ?
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>> Thanks,
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>> Mayuresh
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket
> Qin
> > <
> > >> > > > > > > > >>>>>>>> becket.qin@gmail.com
> > >> > > > > > > > >>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>> wrote:
> > >> > > > > > > > >>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > >> > > > > > > > >>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>> Thank for the detail explanation. I
> agree
> > >> the
> > >> > > > > > > > >>> current
> > >> > > > > > > > >>>>>> design
> > >> > > > > > > > >>>>>>>>>> makes
> > >> > > > > > > > >>>>>>>>>>>>> sense.
> > >> > > > > > > > >>>>>>>>>>>>>>> My confusion is about whether the new
> > config
> > >> > for
> > >> > > > > > > > >> the
> > >> > > > > > > > >>>>>>>> controller
> > >> > > > > > > > >>>>>>>>>>> queue
> > >> > > > > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think
> of a
> > >> case
> > >> > > in
> > >> > > > > > > > >>>> which
> > >> > > > > > > > >>>>>>>> users
> > >> > > > > > > > >>>>>>>>>>> would
> > >> > > > > > > > >>>>>>>>>>>>>> change
> > >> > > > > > > > >>>>>>>>>>>>>>> it.
> > >> > > > > > > > >>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>> Thanks,
> > >> > > > > > > > >>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > >> > > > > > > > >>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket
> > Qin
> > >> <
> > >> > > > > > > > >>>>>>>>>> becket.qin@gmail.com>
> > >> > > > > > > > >>>>>>>>>>>>>> wrote:
> > >> > > > > > > > >>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > >> > > > > > > > >>>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to
> > >> "do we
> > >> > > > > > > > >>>> expect
> > >> > > > > > > > >>>>>>>> user to
> > >> > > > > > > > >>>>>>>>>>> ever
> > >> > > > > > > > >>>>>>>>>>>>>> change
> > >> > > > > > > > >>>>>>>>>>>>>>>> the controller request queue capacity"?
> > If
> > >> we
> > >> > > > > > > > >>> agree
> > >> > > > > > > > >>>>> that
> > >> > > > > > > > >>>>>>>> 20
> > >> > > > > > > > >>>>>>>>> is
> > >> > > > > > > > >>>>>>>>>>>>> already
> > >> > > > > > > > >>>>>>>>>>>>>> a
> > >> > > > > > > > >>>>>>>>>>>>>>>> very generous default number and we do
> > not
> > >> > > > > > > > >> expect
> > >> > > > > > > > >>>> user
> > >> > > > > > > > >>>>>> to
> > >> > > > > > > > >>>>>>>>>> change
> > >> > > > > > > > >>>>>>>>>>>> it,
> > >> > > > > > > > >>>>>>>>>>>>> is
> > >> > > > > > > > >>>>>>>>>>>>>>> it
> > >> > > > > > > > >>>>>>>>>>>>>>>> still necessary to expose this as a
> > config?
> > >> > > > > > > > >>>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>>> Thanks,
> > >> > > > > > > > >>>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > >> > > > > > > > >>>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas
> > >> Wang <
> > >> > > > > > > > >>>>>>>>>>> lucasatucla@gmail.com
> > >> > > > > > > > >>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>> wrote:
> > >> > > > > > > > >>>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>>>> @Becket
> > >> > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are
> right
> > >> that
> > >> > > > > > > > >>>>> normally
> > >> > > > > > > > >>>>>>>> there
> > >> > > > > > > > >>>>>>>>>>>> should
> > >> > > > > > > > >>>>>>>>>>>>> be
> > >> > > > > > > > >>>>>>>>>>>>>>>>> just
> > >> > > > > > > > >>>>>>>>>>>>>>>>> one controller request because of
> > muting,
> > >> > > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there
> > would
> > >> be
> > >> > > > > > > > >> many
> > >> > > > > > > > >>>>>>>> enqueued
> > >> > > > > > > > >>>>>>>>>>>>> controller
> > >> > > > > > > > >>>>>>>>>>>>>>>>> requests.
> > >> > > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm
> > not
> > >> > sure
> > >> > > > > > > > >>>> which
> > >> > > > > > > > >>>>>> part
> > >> > > > > > > > >>>>>>>>>>> conveys
> > >> > > > > > > > >>>>>>>>>>>>> that
> > >> > > > > > > > >>>>>>>>>>>>>>>>> info.
> > >> > > > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point it
> > out
> > >> > the
> > >> > > > > > > > >>>>> section.
> > >> > > > > > > > >>>>>>>>>>>>>>>>>
> > >> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in
> normal
> > >> > > > > > > > >>>> conditions,
> > >> > > > > > > > >>>>>> the
> > >> > > > > > > > >>>>>>>>>> current
> > >> > > > > > > > >>>>>>>>>>>>>> design
> > >> > > > > > > > >>>>>>>>>>>>>>>>> does not preclude multiple controllers
> > >> > running
> > >> > > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't
> have
> > >> the
> > >> > > > > > > > >>>>> controller
> > >> > > > > > > > >>>>>>>>> queue
> > >> > > > > > > > >>>>>>>>>>>>> capacity
> > >> > > > > > > > >>>>>>>>>>>>>>>>> config and simply make its capacity to
> > be
> > >> 1,
> > >> > > > > > > > >>>>>>>>>>>>>>>>> network threads handling requests from
> > >> > > > > > > > >> different
> > >> > > > > > > > >>>>>>>> controllers
> > >> > > > > > > > >>>>>>>>>>> will
> > >> > > > > > > > >>>>>>>>>>>> be
> > >> > > > > > > > >>>>>>>>>>>>>>>>> blocked during those troublesome
> times,
> > >> > > > > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On
> > the
> > >> > > > > > > > >> other
> > >> > > > > > > > >>>>> hand,
> > >> > > > > > > > >>>>>>>>> adding
> > >> > > > > > > > >>>>>>>>>>> the
> > >> > > > > > > > >>>>>>>>>>>>>> extra
> > >> > > > > > > > >>>>>>>>>>>>>>>>> config with a default value, say 20,
> > >> guards
> > >> > us
> > >> > > > > > > > >>> from
> > >> > > > > > > > >>>>>>>> issues
> > >> > > > > > > > >>>>>>>>> in
> those troublesome times, and IMO there isn't much downside of adding the
> extra config.
>
> @Mayuresh
> Good catch, this sentence is an obsolete statement based on a previous
> design. I've revised the wording in the KIP.
>
> Thanks,
> Lucas
>
> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com> wrote:
>
> > Hi Lucas,
> >
> > Thanks for the KIP.
> > I am trying to understand why you think "The memory consumption can rise
> > given the total number of queued requests can go up to 2x" in the impact
> > section. Normally the requests from controller to a broker are not high
> > volume, right?
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com> wrote:
> >
> > > Thanks for the KIP, Lucas. Separating the control plane from the data
> > > plane makes a lot of sense.
> > >
> > > In the KIP you mentioned that the controller request queue may have
> > > many requests in it. Will this be a common case? The controller
> > > requests still go through the SocketServer. The SocketServer will mute
> > > the channel once a request is read and put into the request channel.
> > > So assuming there is only one connection between controller and each
> > > broker, on the broker side, there should be only one controller
> > > request in the controller request queue at any given time. If that is
> > > the case, do we need a separate controller request queue capacity
> > > config? The default value 20 means that we expect there are 20
> > > controller switches to happen in a short period of time. I am not sure
> > > whether someone should increase the controller request queue capacity
> > > to handle such a case, as it seems to indicate something very wrong
> > > has happened.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com> wrote:
> > >
> > > > Thanks for the update Lucas.
> > > >
> > > > I think the motivation section is intuitive. It will be good to
> > > > learn more about the comments from other reviewers.
> > > >
> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Dong,
> > > > >
> > > > > I've updated the motivation section of the KIP by explaining the
> > > > > cases that would have user impacts.
> > > > > Please take a look and let me know your comments.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Dong,
> > > > > >
> > > > > > The simulation of disk being slow is merely for me to easily
> > > > > > construct a testing scenario with a backlog of produce requests.
> > > > > > In production, other than the disk being slow, a backlog of
> > > > > > produce requests may also be caused by high produce QPS.
> > > > > > In that case, we may not want to kill the broker, and that's
> > > > > > when this KIP can be useful, both for JBOD and non-JBOD setup.
> > > > > >
> > > > > > Going back to your previous question about each ProduceRequest
> > > > > > covering 20 partitions that are randomly distributed, let's say
> > > > > > a LeaderAndIsr request is enqueued that tries to switch the
> > > > > > current broker, say broker0, from leader to follower *for one of
> > > > > > the partitions*, say *test-0*. For the sake of argument, let's
> > > > > > also assume the other brokers, say broker1, have *stopped*
> > > > > > fetching from the current broker, i.e. broker0.
> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> > > > > > LeaderAndISR will be put into the purgatory, and since they'll
> > > > > > never be replicated to other brokers (because of the assumption
> > > > > > made above), they will be completed either when the LeaderAndISR
> > > > > > request is processed or when the timeout happens.
> > > > > >   1.2 With this KIP, broker0 will immediately transition the
> > > > > > partition test-0 to become a follower; after the current broker
> > > > > > sees the replication of the remaining 19 partitions, it can send
> > > > > > a response indicating that it's no longer the leader for
> > > > > > "test-0".
> > > > > >   To see the latency difference between 1.1 and 1.2, let's say
> > > > > > there are 24K produce requests ahead of the LeaderAndISR, and
> > > > > > there are 8 io threads, so each io thread will process
> > > > > > approximately 3000 produce requests. Now let's investigate the
> > > > > > io thread that finally processed the LeaderAndISR.
> > > > > >   For the 3000 produce requests, if we model the times when
> > > > > > their remaining 19 partitions catch up as t0, t1, ... t2999, the
> > > > > > LeaderAndISR request is processed at time t3000.
> > > > > >   Without this KIP, the 1st produce request would have waited an
> > > > > > extra t3000 - t0 time in the purgatory, the 2nd an extra time of
> > > > > > t3000 - t1, etc.
> > > > > >   Roughly speaking, the latency difference is bigger for the
> > > > > > earlier produce requests than for the later ones. For the same
> > > > > > reason, the more ProduceRequests queued before the LeaderAndISR,
> > > > > > the bigger the benefit we get (capped by the produce timeout).
> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > > > >   There will be no latency differences in this case, but
> > > > > >   2.1 Without this KIP, the records of partition test-0 in the
> > > > > > ProduceRequests ahead of the LeaderAndISR will be appended to
> > > > > > the local log, and eventually be truncated after processing the
> > > > > > LeaderAndISR. This is what's referred to as "some unofficial
> > > > > > definition of data loss in terms of messages beyond the high
> > > > > > watermark".
> > > > > >   2.2 With this KIP, we can mitigate the effect, since if the
> > > > > > LeaderAndISR is immediately processed, the response to producers
> > > > > > will have the NotLeaderForPartition error, causing producers to
> > > > > > retry.
> > > > > >
> > > > > > This explanation above is the benefit for reducing
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Lucas,

Thanks for the update.

The current KIP proposes new broker configs "listeners.for.controller" and
"advertised.listeners.for.controller". This is going to be a big change
since listeners are among the most important configs that every user needs
to change. According to the rejected alternative section, it seems that the
reason to add these two configs is to improve performance when the data
request queue is full rather than for correctness. It should be a very rare
scenario and I am not sure we should add configs for all users just to
improve the performance in such a rare scenario.

Also, if the new design is based on the issues which are discovered in the
recent discussion, e.g. out of order processing if we don't use a dedicated
thread for controller requests, it may be useful to explain the problem in
the motivation section.

Thanks,
Dong

On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang <lu...@gmail.com> wrote:

> A kind reminder for review of this KIP.
>
> Thank you very much!
> Lucas
>
> On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <lu...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I've updated the KIP by adding the dedicated endpoints for controller
> > connections,
> > and pinning threads for controller requests.
> > Also I've updated the title of this KIP. Please take a look and let me
> > know your feedback.
> >
> > Thanks a lot for your time!
> > Lucas
> >
> > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > gharatmayuresh15@gmail.com> wrote:
> >
> >> Hi Lucas,
> >> I agree, if we want to go forward with a separate controller plane and data
> >> plane and completely isolate them, having a separate port for controller
> >> with a separate Acceptor and a Processor sounds ideal to me.
> >>
> >> Thanks,
> >>
> >> Mayuresh
> >>
> >>
> >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <be...@gmail.com>
> wrote:
> >>
> >> > Hi Lucas,
> >> >
> >> > Yes, I agree that a dedicated end to end control flow would be ideal.
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lu...@gmail.com>
> >> wrote:
> >> >
> >> > > Thanks for the comment, Becket.
> >> > > So far, we've been trying to avoid making any request handler thread
> >> > > special.
> >> > > But if we were to follow that path in order to make the two planes more
> >> > > isolated,
> >> > > what do you think about also having a dedicated processor thread,
> >> > > and a dedicated port for the controller?
> >> > >
> >> > > Today one processor thread can handle multiple connections, let's say
> >> > > 100 connections, represented by connection0, ... connection99, among
> >> > > which connection0-98 are from clients, while connection99 is from the
> >> > > controller. Further let's say after one selector poll, there are
> >> > > incoming requests on all connections.
> >> > >
> >> > > When the request queue is full (either the data request queue being
> >> > > full in the two-queue design, or the one single queue being full in the
> >> > > deque design), the processor thread will be blocked first when trying
> >> > > to enqueue the data request from connection0, then possibly blocked for
> >> > > the data request from connection1, etc., even though the controller
> >> > > request is ready to be enqueued.
> >> > >
> >> > > To solve this problem, it seems we would need to have a separate port
> >> > > dedicated to the controller, a dedicated processor thread, a dedicated
> >> > > controller request queue, and pinning of one request handler thread for
> >> > > controller requests.
> >> > >
> >> > > Thanks,
> >> > > Lucas
> >> > >
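[Editor's note: Lucas's head-of-line blocking scenario can be sketched in a few lines. This is an illustrative toy, not Kafka code — the class and request names are invented, the queue capacity is tiny to force the condition, and `offer()` stands in for the real processor's blocking `put()` so the demo terminates.]

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A processor drains its ready connections in order into one bounded request
// queue; once the queue fills on a data request, the controller request on the
// last connection never gets enqueued, even though it is ready.
public class HeadOfLineBlocking {
    static final BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(2);

    public static void main(String[] args) {
        // connection0..2 carry client data requests; connection3 is the controller's
        List<String> readyRequests =
            List.of("data-conn0", "data-conn1", "data-conn2", "controller-conn3");

        for (String req : readyRequests) {
            // The real processor would block on put(); offer() exposes the same
            // ordering problem without hanging this demo.
            if (!requestQueue.offer(req)) {
                System.out.println("queue full, processor stuck before: " + req);
                break; // wedged here; the controller request is still waiting
            }
        }
        System.out.println("controllerEnqueued=" + requestQueue.contains("controller-conn3"));
    }
}
```

With capacity 2, the processor wedges on the third data request, so the controller request is never admitted — which is why the message above concludes that a dedicated port, processor, and queue are needed for full isolation.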
> >> > >
> >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <be...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Personally I am not fond of the dequeue approach simply because it is
> >> > > > against the basic idea of isolating the controller plane and data
> >> > > > plane. With a single dequeue, theoretically speaking the controller
> >> > > > requests can starve the client requests. I would prefer the approach
> >> > > > with a separate controller request queue and a dedicated controller
> >> > > > request handler thread.
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Jiangjie (Becket) Qin
> >> > > >
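[Editor's note: a minimal sketch of the separate-queue design Becket prefers. All names, queue sizes, and the request-name dispatch are hypothetical stand-ins for Kafka's real RequestChannel machinery; the point is only that a full data queue can never delay a controller request.]

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Two bounded queues: data-plane requests and control-plane requests never
// share capacity, and a dedicated thread drains only the controller queue.
public class TwoPlaneChannel {
    final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);    // e.g. queued.max.requests
    final BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(20);
    final AtomicInteger controllerHandled = new AtomicInteger();

    void sendRequest(String request) throws InterruptedException {
        if (request.startsWith("LeaderAndIsr") || request.startsWith("StopReplica")
                || request.startsWith("UpdateMetadata")) {
            controllerQueue.put(request);   // control plane
        } else {
            dataQueue.put(request);         // data plane
        }
    }

    Thread startControllerHandler() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    controllerQueue.take();        // only ever sees controller requests
                    controllerHandled.incrementAndGet();
                }
            } catch (InterruptedException e) { /* shutdown */ }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Even if `dataQueue` is full of Produce requests, `controllerQueue.put()` succeeds immediately, which is the isolation property being argued for.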
> >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <
> lucasatucla@gmail.com>
> >> > > wrote:
> >> > > >
> >> > > > > Sure, I can summarize the usage of correlation id. But before I do
> >> > > > > that, it seems the same out-of-order processing can also happen to
> >> > > > > Produce requests sent by producers, following the same example you
> >> > > > > described earlier. If that's the case, I think this probably
> >> > > > > deserves a separate doc and design independent of this KIP.
> >> > > > >
> >> > > > > Lucas
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <lindong28@gmail.com
> >
> >> > > wrote:
> >> > > > >
> >> > > > > > Hey Lucas,
> >> > > > > >
> >> > > > > > Could you update the KIP if you are confident with the approach
> >> > > > > > which uses correlation id? The idea around correlation id is kind
> >> > > > > > of scattered across multiple emails. It will be useful if other
> >> > > > > > reviewers can read the KIP to understand the latest proposal.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Dong
> >> > > > > >
> >> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> >> > > > > > gharatmayuresh15@gmail.com> wrote:
> >> > > > > >
> >> > > > > > > I like the idea of the dequeue implementation by Lucas. This
> >> > > > > > > will help us avoid an additional queue for the controller and
> >> > > > > > > additional configs in Kafka.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > >
> >> > > > > > > Mayuresh
> >> > > > > > >
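[Editor's note: the single-deque alternative can be sketched with `java.util.concurrent.LinkedBlockingDeque` — controller requests jump the line via `putFirst` while data requests append with `putLast`. Names and capacity are illustrative; note that, unlike the two-queue design, both planes still share one capacity bound.]

```java
import java.util.concurrent.LinkedBlockingDeque;

// One deque serves both planes: handlers always take from the front, so a
// controller request pushed to the front is processed ahead of any backlog.
public class RequestDeque {
    final LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>(500);

    void enqueue(String request, boolean fromController) throws InterruptedException {
        if (fromController) {
            queue.putFirst(request);  // bypass the queued data requests
        } else {
            queue.putLast(request);
        }
    }

    String nextRequest() throws InterruptedException {
        return queue.takeFirst();
    }
}
```

A controller request enqueued behind hundreds of Produce requests still comes out first; the trade-off, raised elsewhere in this thread, is that a completely full deque can still delay the controller's enqueue.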
> >> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
> >> becket.qin@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hi Jun,
> >> > > > > > > >
> >> > > > > > > > The usage of correlation ID might still be useful to address
> >> > > > > > > > the cases where the controller epoch and leader epoch checks
> >> > > > > > > > are not sufficient to guarantee correct behavior. For
> >> > > > > > > > example, if the controller sends a LeaderAndIsrRequest
> >> > > > > > > > followed by a StopReplicaRequest, and the broker processes
> >> > > > > > > > them in the reverse order, the replica may still be wrongly
> >> > > > > > > > recreated, right?
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > >
> >> > > > > > > > Jiangjie (Becket) Qin
> >> > > > > > > >
> >> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <jun@confluent.io
> >
> >> > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > Hmm, since we already use controller epoch and leader epoch
> >> > > > > > > > > for properly caching the latest partition state, do we
> >> > > > > > > > > really need correlation id for ordering the controller
> >> > > > > > > > > requests?
> >> > > > > > > > >
> >> > > > > > > > > Thanks,
> >> > > > > > > > >
> >> > > > > > > > > Jun
> >> > > > > > > > >
> >> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
> >> > > > becket.qin@gmail.com>
> >> > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > >> Lucas and Mayuresh,
> >> > > > > > > > >>
> >> > > > > > > > >> Good idea. The correlation id should work.
> >> > > > > > > > >>
> >> > > > > > > > >> In the ControllerChannelManager, a request will be resent
> >> > > > > > > > >> until a response is received. So if the controller to
> >> > > > > > > > >> broker connection disconnects after the controller sends
> >> > > > > > > > >> R1_a, but before the response of R1_a is received, the
> >> > > > > > > > >> disconnection may cause the controller to resend it as
> >> > > > > > > > >> R1_b, i.e. until R1 is acked, R2 won't be sent by the
> >> > > > > > > > >> controller.
> >> > > > > > > > >> This gives two guarantees:
> >> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> >> > > > > > > > >> 2. On the broker side, when R2 is seen, R1 must have been
> >> > > > > > > > >> processed at least once.
> >> > > > > > > > >>
> >> > > > > > > > >> So on the broker side, with a single-threaded controller
> >> > > > > > > > >> request handler, the logic should be:
> >> > > > > > > > >> 1. Process whatever request is seen in the controller
> >> > > > > > > > >> request queue
> >> > > > > > > > >> 2. For the given epoch, drop a request if its correlation
> >> > > > > > > > >> id is smaller than that of the last processed request.
> >> > > > > > > > >>
> >> > > > > > > > >> Thanks,
> >> > > > > > > > >>
> >> > > > > > > > >> Jiangjie (Becket) Qin
> >> > > > > > > > >>
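[Editor's note: Becket's two-step broker-side check can be sketched as a small guard. This is a hypothetical helper, assuming correlation ids within a controller epoch only grow and restart when a new controller (new epoch, hence new NetworkClient) takes over.]

```java
// Admits controller requests in order: within an epoch, anything at or below
// the last processed correlation id is treated as an obsolete duplicate.
public class ControllerRequestGate {
    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    /** Returns true if the request should be processed, false if obsolete. */
    synchronized boolean admit(int controllerEpoch, int correlationId) {
        if (controllerEpoch < lastEpoch) {
            return false; // request from a deposed controller
        }
        if (controllerEpoch == lastEpoch && correlationId <= lastCorrelationId) {
            return false; // stale or duplicate within the current epoch
        }
        lastEpoch = controllerEpoch;       // a new epoch resets the watermark
        lastCorrelationId = correlationId;
        return true;
    }
}
```

Under the resend guarantee described above, R1_b carries a larger correlation id than R1_a, so a late-arriving R1_a is dropped while R1_b and R2 are admitted in order.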
> >> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <
> >> jun@confluent.io>
> >> > > > > wrote:
> >> > > > > > > > >>
> >> > > > > > > > >>> I agree that there is no strong ordering when there is
> >> > > > > > > > >>> more than one socket connection. Currently, we rely on
> >> > > > > > > > >>> controllerEpoch and leaderEpoch to ensure that the
> >> > > > > > > > >>> receiving broker picks up the latest state for each
> >> > > > > > > > >>> partition.
> >> > > > > > > > >>>
> >> > > > > > > > >>> One potential issue with the dequeue approach is that if
> >> > > > > > > > >>> the queue is full, there is no guarantee that the
> >> > > > > > > > >>> controller requests will be enqueued quickly.
> >> > > > > > > > >>>
> >> > > > > > > > >>> Thanks,
> >> > > > > > > > >>>
> >> > > > > > > > >>> Jun
> >> > > > > > > > >>>
> >> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> >> > > > > > > > >>> gharatmayuresh15@gmail.com
> >> > > > > > > > >>>> wrote:
> >> > > > > > > > >>>
> >> > > > > > > > >>>> Yea, the correlationId is only set to 0 in the
> >> > > > > > > > >>>> NetworkClient constructor. Since we reuse the same
> >> > > > > > > > >>>> NetworkClient between Controller and the broker, a
> >> > > > > > > > >>>> disconnection should not cause it to reset to 0, in
> >> > > > > > > > >>>> which case it can be used to reject obsolete requests.
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> Thanks,
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> Mayuresh
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <
> >> > > > > lucasatucla@gmail.com
> >> > > > > > >
> >> > > > > > > > >>> wrote:
> >> > > > > > > > >>>>
> >> > > > > > > > >>>>> @Dong,
> >> > > > > > > > >>>>> Great example and explanation, thanks!
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> @All
> >> > > > > > > > >>>>> Regarding the example given by Dong, it seems even
> if
> >> we
> >> > > use
> >> > > > a
> >> > > > > > > queue,
> >> > > > > > > > >>>> and a
> >> > > > > > > > >>>>> dedicated controller request handling thread,
> >> > > > > > > > >>>>> the same result can still happen because R1_a will
> be
> >> > sent
> >> > > on
> >> > > > > one
> >> > > > > > > > >>>>> connection, and R1_b & R2 will be sent on a
> different
> >> > > > > connection,
> >> > > > > > > > >>>>> and there is no ordering between different
> >> connections on
> >> > > the
> >> > > > > > > broker
> >> > > > > > > > >>>> side.
> >> > > > > > > > >>>>> I was discussing with Mayuresh offline, and it seems
> >> > > > > correlation
> >> > > > > > id
> >> > > > > > > > >>>> within
> >> > > > > > > > >>>>> the same NetworkClient object is monotonically
> >> increasing
> >> > > and
> >> > > > > > never
> >> > > > > > > > >>>> reset,
> >> > > > > > > > >>>>> hence a broker can leverage that to properly reject
> >> > > obsolete
> >> > > > > > > > >> requests.
> >> > > > > > > > >>>>> Thoughts?
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> Thanks,
> >> > > > > > > > >>>>> Lucas
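[Editor's illustration, not part of the original thread: a minimal sketch of the rejection logic described above. The class and method names are hypothetical, not Kafka's actual broker code, and it assumes the correlation id from a given controller is monotonically increasing and never reset across reconnects.]

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gate a broker could keep for the controller connection.
// Admits a controller request only if its correlation id is higher than
// anything seen so far; lower ids are treated as obsolete and rejected.
public class ControllerRequestGate {
    private final AtomicInteger maxSeenCorrelationId = new AtomicInteger(-1);

    /** Returns true if the request should be processed, false if obsolete. */
    public boolean admit(int correlationId) {
        while (true) {
            int max = maxSeenCorrelationId.get();
            if (correlationId <= max) {
                return false; // older than a request already seen: reject
            }
            if (maxSeenCorrelationId.compareAndSet(max, correlationId)) {
                return true; // highest id so far: process it
            }
        }
    }
}
```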
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> >> > > > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>>> Actually nvm, correlationId is reset in case of
> >> > connection
> >> > > > > > loss, I
> >> > > > > > > > >>>> think.
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>> Thanks,
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>> Mayuresh
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> >> > > > > > > > >>>>>> gharatmayuresh15@gmail.com>
> >> > > > > > > > >>>>>> wrote:
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing can
> >> > happen
> >> > > > > with
> >> > > > > > > > >>>> having 2
> >> > > > > > > > >>>>>>> separate queues as well and it can even happen
> >> today.
> >> > > > > > > > >>>>>>> Can we use the correlationId in the request from
> the
> >> > > > > controller
> >> > > > > > > > >> to
> >> > > > > > > > >>>> the
> >> > > > > > > > >>>>>>> broker to handle ordering ?
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> Thanks,
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> Mayuresh
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
> >> > > > > > becket.qin@gmail.com
> >> > > > > > > > >>>
> >> > > > > > > > >>>>> wrote:
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated
> >> controller
> >> > > > > request
> >> > > > > > > > >>>> handling
> >> > > > > > > > >>>>>>>> thread would be a better isolation. It also
> solves
> >> the
> >> > > > > > > > >> reordering
> >> > > > > > > > >>>>> issue.
> >> > > > > > > > >>>>>>>>
> >> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> >> > > > > > > > >> jjkoshy.w@gmail.com>
> >> > > > > > > > >>>>>> wrote:
> >> > > > > > > > >>>>>>>>
> >> > > > > > > > >>>>>>>>> Good example. I think this scenario can occur in
> >> the
> >> > > > > current
> >> > > > > > > > >>> code
> >> > > > > > > > >>>> as
> >> > > > > > > > >>>>>>>> well
> >> > > > > > > > >>>>>>>>> but with even lower probability given that there
> >> are
> >> > > > other
> >> > > > > > > > >>>>>>>> non-controller
> >> > > > > > > > >>>>>>>>> requests interleaved. It is still sketchy though
> >> and
> >> > I
> >> > > > > think
> >> > > > > > a
> >> > > > > > > > >>>> safer
> >> > > > > > > > >>>>>>>>> approach would be separate queues and pinning
> >> > > controller
> >> > > > > > > > >> request
> >> > > > > > > > >>>>>>>> handling
> >> > > > > > > > >>>>>>>>> to one handler thread.
> >> > > > > > > > >>>>>>>>>
> >> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> >> > > > > > > > >> lindong28@gmail.com
> >> > > > > > > > >>>>
> >> > > > > > > > >>>>>> wrote:
> >> > > > > > > > >>>>>>>>>
> >> > > > > > > > >>>>>>>>>> Hey Becket,
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>> I think you are right that there may be
> >> out-of-order
> >> > > > > > > > >>> processing.
> >> > > > > > > > >>>>>>>> However,
> >> > > > > > > > >>>>>>>>>> it seems that out-of-order processing may also
> >> > happen
> >> > > > even
> >> > > > > > > > >> if
> >> > > > > > > > >>> we
> >> > > > > > > > >>>>>> use a
> >> > > > > > > > >>>>>>>>>> separate queue.
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>> Here is the example:
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected
> before
> >> > > > > receiving
> >> > > > > > > > >>>>>> response.
> >> > > > > > > > >>>>>>>>> Then
> >> > > > > > > > >>>>>>>>>> it reconnects and sends R2. Both requests now
> >> stay
> >> > in
> >> > > > the
> >> > > > > > > > >>>>> controller
> >> > > > > > > > >>>>>>>>>> request queue in the order they are sent.
> >> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and
> >> then
> >> > > > > thread2
> >> > > > > > > > >>>> takes
> >> > > > > > > > >>>>>> R2
> >> > > > > > > > >>>>>>>>> from
> >> > > > > > > > >>>>>>>>>> the request queue almost at the same time.
> >> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel.
> >> There is
> >> > > > > chance
> >> > > > > > > > >>> that
> >> > > > > > > > >>>>>> R2's
> >> > > > > > > > >>>>>>>>>> processing is completed before R1.
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>> If out-of-order processing can happen for both
> >> > > > approaches
> >> > > > > > > > >> with
> >> > > > > > > > >>>>> very
> >> > > > > > > > >>>>>>>> low
> >> > > > > > > > >>>>>>>>>> probability, it may not be worthwhile to add
> the
> >> > extra
> >> > > > > > > > >> queue.
> >> > > > > > > > >>>> What
> >> > > > > > > > >>>>>> do
> >> > > > > > > > >>>>>>>> you
> >> > > > > > > > >>>>>>>>>> think?
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>> Dong
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> >> > > > > > > > >>>> becket.qin@gmail.com
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> >> > > > > > > > >>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>> Using the request channel as a deque was
> >> > > > > > > > >>>>>>>>>>> brought up some time ago when we were
> >> > > > > > > > >>>>>>>>>>> initially thinking of prioritizing the
> request.
> >> The
> >> > > > > > > > >> concern
> >> > > > > > > > >>>> was
> >> > > > > > > > >>>>>> that
> >> > > > > > > > >>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>> controller requests are supposed to be
> >> processed in
> >> > > > > order.
> >> > > > > > > > >>> If
> >> > > > > > > > >>>> we
> >> > > > > > > > >>>>>> can
> >> > > > > > > > >>>>>>>>>> ensure
> >> > > > > > > > >>>>>>>>>>> that there is one controller request in the
> >> request
> >> > > > > > > > >> channel,
> >> > > > > > > > >>>> the
> >> > > > > > > > >>>>>>>> order
> >> > > > > > > > >>>>>>>>> is
> >> > > > > > > > >>>>>>>>>>> not a concern. But in cases that there are
> more
> >> > than
> >> > > > one
> >> > > > > > > > >>>>>> controller
> >> > > > > > > > >>>>>>>>>> request
> >> > > > > > > > >>>>>>>>>>> inserted into the queue, the controller
> request
> >> > order
> >> > > > may
> >> > > > > > > > >>>> change
> >> > > > > > > > >>>>>> and
> >> > > > > > > > >>>>>>>>>> cause
> >> > > > > > > > >>>>>>>>>>> problem. For example, think about the
> following
> >> > > > sequence:
> >> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1
> to
> >> > > broker
> >> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and put the request to
> the
> >> > head
> >> > > > of
> >> > > > > > > > >> the
> >> > > > > > > > >>>>>> request
> >> > > > > > > > >>>>>>>>>> queue.
> >> > > > > > > > >>>>>>>>>>> 3. Controller to broker connection failed and
> >> the
> >> > > > > > > > >> controller
> >> > > > > > > > >>>>>>>>> reconnected
> >> > > > > > > > >>>>>>>>>> to
> >> > > > > > > > >>>>>>>>>>> the broker.
> >> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker
> >> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to the head
> of
> >> the
> >> > > > > > > > >> request
> >> > > > > > > > >>>>> queue.
> >> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed
> >> before
> >> > > R1
> >> > > > is
> >> > > > > > > > >>>>>> processed,
> >> > > > > > > > >>>>>>>>>> which
> >> > > > > > > > >>>>>>>>>>> may cause problem.
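[Editor's illustration, not part of the original thread: the reordering in steps 1-5 can be reproduced directly with a small deque sketch, using java.util.concurrent.LinkedBlockingDeque rather than Kafka's actual request channel.]

```java
import java.util.concurrent.LinkedBlockingDeque;

// Reproduces the reordering hazard: R1 and R2 are each inserted at the
// head of the deque, so R2 is taken before R1 even though the controller
// sent R1 first.
public class HeadInsertReordering {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>(20);
        requestQueue.putFirst("R1"); // step 2: R1 placed at the head
        // steps 3-4: the connection drops and the controller reconnects
        requestQueue.putFirst("R2"); // step 5: R2 also placed at the head
        System.out.println(requestQueue.takeFirst()); // prints R2
        System.out.println(requestQueue.takeFirst()); // prints R1
    }
}
```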
> >> > > > > > > > >>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> >> > > > > > > > >>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> >> > > > > > > > >>>>> jjkoshy.w@gmail.com>
> >> > > > > > > > >>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to
> be
> >> a
> >> > > > simpler
> >> > > > > > > > >>>> less
> >> > > > > > > > >>>>>>>>> invasive
> >> > > > > > > > >>>>>>>>>>>> alternative and it should work.
> >> Jun/Becket/others,
> >> > > do
> >> > > > > > > > >> you
> >> > > > > > > > >>>> see
> >> > > > > > > > >>>>>> any
> >> > > > > > > > >>>>>>>>>>> pitfalls
> >> > > > > > > > >>>>>>>>>>>> with this approach?
> >> > > > > > > > >>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang
> <
> >> > > > > > > > >>>>>>>> lucasatucla@gmail.com>
> >> > > > > > > > >>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
> >> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I
> haven't
> >> > > thought
> >> > > > > > > > >>>>> before.
> >> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty
> >> > well,
> >> > > > and
> >> > > > > > > > >>>> also
> >> > > > > > > > >>>>>>>>>>>>> avoids the need to have a new size metric
> and
> >> > > > capacity
> >> > > > > > > > >>>>> config
> >> > > > > > > > >>>>>>>>>>>>> for the controller request queue. In fact,
> if
> >> we
> >> > > were
> >> > > > > > > > >> to
> >> > > > > > > > >>>>> adopt
> >> > > > > > > > >>>>>>>>>>>>> this design, there is no public interface
> >> change,
> >> > > and
> >> > > > > > > > >> we
> >> > > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> >> > > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems
> >> > > > > > > > >>>>>>>>>>>>> the java class LinkedBlockingDeque can
> readily
> >> > > > satisfy
> >> > > > > > > > >>> the
> >> > > > > > > > >>>>>>>>>> requirement
> >> > > > > > > > >>>>>>>>>>>>> by supporting a capacity, and also allowing
> >> > > inserting
> >> > > > > > > > >> at
> >> > > > > > > > >>>>> both
> >> > > > > > > > >>>>>>>> ends.
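[Editor's illustration, not part of the original thread: the concrete JDK class offering both a capacity bound and two-ended insertion is java.util.concurrent.LinkedBlockingDeque. A minimal sketch of the idea, with a hypothetical wrapper class, not the KIP's implementation:]

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical wrapper showing the two-ended insertion idea: controller
// requests jump to the head, data requests append to the tail, and one
// bounded deque plays the role of the existing request queue.
public class PrioritizedRequestQueue {
    private final LinkedBlockingDeque<String> deque;

    public PrioritizedRequestQueue(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    public void addControllerRequest(String request) throws InterruptedException {
        deque.putFirst(request); // jumps ahead of any backlogged data requests
    }

    public void addDataRequest(String request) throws InterruptedException {
        deque.putLast(request); // normal FIFO behavior for produce/fetch
    }

    public String takeNext() throws InterruptedException {
        return deque.takeFirst(); // handler threads always drain the head
    }
}
```

Because the deque is bounded, the existing queued.max.requests semantics carry over without a second queue or a new capacity config.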
> >> > > > > > > > >>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>> My only concern is that this design is tied
> to
> >> > the
> >> > > > > > > > >>>>> coincidence
> >> > > > > > > > >>>>>>>> that
> >> > > > > > > > >>>>>>>>>>>>> we have two request priorities and there are
> >> two
> >> > > ends
> >> > > > > > > > >>> to a
> >> > > > > > > > >>>>>>>> deque.
> >> > > > > > > > >>>>>>>>>>>>> Hence by using the proposed design, it seems
> >> the
> >> > > > > > > > >> network
> >> > > > > > > > >>>>> layer
> >> > > > > > > > >>>>>>>> is
> >> > > > > > > > >>>>>>>>>>>>> more tightly coupled with upper layer logic,
> >> e.g.
> >> > > if
> >> > > > > > > > >> we
> >> > > > > > > > >>>> were
> >> > > > > > > > >>>>>> to
> >> > > > > > > > >>>>>>>> add
> >> > > > > > > > >>>>>>>>>>>>> an extra priority level in the future for
> some
> >> > > > reason,
> >> > > > > > > > >>> we
> >> > > > > > > > >>>>>> would
> >> > > > > > > > >>>>>>>>>>> probably
> >> > > > > > > > >>>>>>>>>>>>> need to go back to the design of separate
> >> queues,
> >> > > one
> >> > > > > > > > >>> for
> >> > > > > > > > >>>>> each
> >> > > > > > > > >>>>>>>>>> priority
> >> > > > > > > > >>>>>>>>>>>>> level.
> >> > > > > > > > >>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and
> lean
> >> > > toward
> >> > > > > > > > >>> your
> >> > > > > > > > >>>>>>>> suggested
> >> > > > > > > > >>>>>>>>>>>>> approach.
> >> > > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> >> > > > > > > > >>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>> @Becket,
> >> > > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design,
> >> I'm
> >> > > > > > > > >>> answering
> >> > > > > > > > >>>>>> your
> >> > > > > > > > >>>>>>>>>>> question
> >> > > > > > > > >>>>>>>>>>>>> only in the context
> >> > > > > > > > >>>>>>>>>>>>> of the current KIP design: I think your
> >> > suggestion
> >> > > > > > > > >> makes
> >> > > > > > > > >>>>>> sense,
> >> > > > > > > > >>>>>>>> and
> >> > > > > > > > >>>>>>>>>> I'm
> >> > > > > > > > >>>>>>>>>>>> ok
> >> > > > > > > > >>>>>>>>>>>>> with removing the capacity config and
> >> > > > > > > > >>>>>>>>>>>>> just relying on the default value of 20
> being
> >> > > > > > > > >> sufficient
> >> > > > > > > > >>>>>> enough.
> >> > > > > > > > >>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>>> Lucas
> >> > > > > > > > >>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh
> >> Gharat
> >> > <
> >> > > > > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> >> > > > > > > > >>>>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to
> >> prioritize
> >> > > the
> >> > > > > > > > >>>>>>>> controller
> >> > > > > > > > >>>>>>>>>>> request
> >> > > > > > > > >>>>>>>>>>>>>> over any other requests.
> >> > > > > > > > >>>>>>>>>>>>>> In that case, we can change the request
> queue
> >> > to a
> >> > > > > > > > >>>>> deque,
> >> > > > > > > > >>>>>>>> where
> >> > > > > > > > >>>>>>>>>> you
> >> > > > > > > > >>>>>>>>>>>>>> always insert the normal requests (produce,
> >> > > > > > > > >>>> consume, etc.)
> >> > > > > > > > >>>>>> to
> >> > > > > > > > >>>>>>>> the
> >> > > > > > > > >>>>>>>>>> end
> >> > > > > > > > >>>>>>>>>>>> of
> >> > > > > > > > >>>>>>>>>>>>>> the deque, but if it's a controller
> request,
> >> > you
> >> > > > > > > > >>> insert
> >> > > > > > > > >>>>> it
> >> > > > > > > > >>>>>> to
> >> > > > > > > > >>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>> head
> >> > > > > > > > >>>>>>>>>>>>> of
> >> > > > > > > > >>>>>>>>>>>>>> the queue. This ensures that the controller
> >> > > request
> >> > > > > > > > >>> will
> >> > > > > > > > >>>>> be
> >> > > > > > > > >>>>>>>> given
> >> > > > > > > > >>>>>>>>>>>> higher
> >> > > > > > > > >>>>>>>>>>>>>> priority over other requests.
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>> Also since we only read one request from
> the
> >> > > socket
> >> > > > > > > > >>> and
> >> > > > > > > > >>>>> mute
> >> > > > > > > > >>>>>>>> it
> >> > > > > > > > >>>>>>>>> and
> >> > > > > > > > >>>>>>>>>>>> only
> >> > > > > > > > >>>>>>>>>>>>>> unmute it after handling the request, this
> >> would
> >> > > > > > > > >>> ensure
> >> > > > > > > > >>>>> that
> >> > > > > > > > >>>>>>>> we
> >> > > > > > > > >>>>>>>>>> don't
> >> > > > > > > > >>>>>>>>>>>>>> handle controller requests out of order.
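[Editor's illustration, not part of the original thread: a tiny sketch of that mute/unmute invariant, with a hypothetical class, not Kafka's Processor/SocketServer code. A connection is muted once a request is read and unmuted only after that request is fully handled, so at most one request per connection is ever in flight.]

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of the mute/unmute invariant: the network layer may
// read the next request from a connection only while it is unmuted, and a
// connection is unmuted only after its previous request has been handled.
public class MutedConnection {
    private final Semaphore unmuted = new Semaphore(1);

    /** Network thread: attempt to read a request; false means still muted. */
    public boolean tryBeginRequest() {
        return unmuted.tryAcquire(); // success mutes the connection
    }

    /** Handler thread: the request is fully processed; unmute. */
    public void endRequest() {
        unmuted.release();
    }
}
```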
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the second
> >> queue
> >> > > and
> >> > > > > > > > >>> the
> >> > > > > > > > >>>>>>>>> additional
> >> > > > > > > > >>>>>>>>>>>>> config
> >> > > > > > > > >>>>>>>>>>>>>> for the size of the queue.
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>> What do you think ?
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>> Mayuresh
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin
> <
> >> > > > > > > > >>>>>>>> becket.qin@gmail.com
> >> > > > > > > > >>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> >> > > > > > > > >>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree
> >> the
> >> > > > > > > > >>> current
> >> > > > > > > > >>>>>> design
> >> > > > > > > > >>>>>>>>>> makes
> >> > > > > > > > >>>>>>>>>>>>> sense.
> >> > > > > > > > >>>>>>>>>>>>>>> My confusion is about whether the new
> config
> >> > for
> >> > > > > > > > >> the
> >> > > > > > > > >>>>>>>> controller
> >> > > > > > > > >>>>>>>>>>> queue
> >> > > > > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a
> >> case
> >> > > in
> >> > > > > > > > >>>> which
> >> > > > > > > > >>>>>>>> users
> >> > > > > > > > >>>>>>>>>>> would
> >> > > > > > > > >>>>>>>>>>>>>> change
> >> > > > > > > > >>>>>>>>>>>>>>> it.
> >> > > > > > > > >>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >> > > > > > > > >>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket
> Qin
> >> <
> >> > > > > > > > >>>>>>>>>> becket.qin@gmail.com>
> >> > > > > > > > >>>>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> >> > > > > > > > >>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to
> >> "do we
> >> > > > > > > > >>>> expect
> >> > > > > > > > >>>>>>>> user to
> >> > > > > > > > >>>>>>>>>>> ever
> >> > > > > > > > >>>>>>>>>>>>>> change
> >> > > > > > > > >>>>>>>>>>>>>>>> the controller request queue capacity"?
> If
> >> we
> >> > > > > > > > >>> agree
> >> > > > > > > > >>>>> that
> >> > > > > > > > >>>>>>>> 20
> >> > > > > > > > >>>>>>>>> is
> >> > > > > > > > >>>>>>>>>>>>> already
> >> > > > > > > > >>>>>>>>>>>>>> a
> >> > > > > > > > >>>>>>>>>>>>>>>> very generous default number and we do
> not
> >> > > > > > > > >> expect
> >> > > > > > > > >>>> user
> >> > > > > > > > >>>>>> to
> >> > > > > > > > >>>>>>>>>> change
> >> > > > > > > > >>>>>>>>>>>> it,
> >> > > > > > > > >>>>>>>>>>>>> is
> >> > > > > > > > >>>>>>>>>>>>>>> it
> >> > > > > > > > >>>>>>>>>>>>>>>> still necessary to expose this as a
> config?
> >> > > > > > > > >>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >> > > > > > > > >>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas
> >> Wang <
> >> > > > > > > > >>>>>>>>>>> lucasatucla@gmail.com
> >> > > > > > > > >>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>> @Becket
> >> > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right
> >> that
> >> > > > > > > > >>>>> normally
> >> > > > > > > > >>>>>>>> there
> >> > > > > > > > >>>>>>>>>>>> should
> >> > > > > > > > >>>>>>>>>>>>> be
> >> > > > > > > > >>>>>>>>>>>>>>>>> just
> >> > > > > > > > >>>>>>>>>>>>>>>>> one controller request because of
> muting,
> >> > > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there
> would
> >> be
> >> > > > > > > > >> many
> >> > > > > > > > >>>>>>>> enqueued
> >> > > > > > > > >>>>>>>>>>>>> controller
> >> > > > > > > > >>>>>>>>>>>>>>>>> requests.
> >> > > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm
> not
> >> > sure
> >> > > > > > > > >>>> which
> >> > > > > > > > >>>>>> part
> >> > > > > > > > >>>>>>>>>>> conveys
> >> > > > > > > > >>>>>>>>>>>>> that
> >> > > > > > > > >>>>>>>>>>>>>>>>> info.
> >> > > > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point it
> out
> >> > the
> >> > > > > > > > >>>>> section.
> >> > > > > > > > >>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal
> >> > > > > > > > >>>> conditions,
> >> > > > > > > > >>>>>> the
> >> > > > > > > > >>>>>>>>>> current
> >> > > > > > > > >>>>>>>>>>>>>> design
> >> > > > > > > > >>>>>>>>>>>>>>>>> does not preclude multiple controllers
> >> > running
> >> > > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have
> >> the
> >> > > > > > > > >>>>> controller
> >> > > > > > > > >>>>>>>>> queue
> >> > > > > > > > >>>>>>>>>>>>> capacity
> >> > > > > > > > >>>>>>>>>>>>>>>>> config and simply make its capacity to
> be
> >> 1,
> >> > > > > > > > >>>>>>>>>>>>>>>>> network threads handling requests from
> >> > > > > > > > >> different
> >> > > > > > > > >>>>>>>> controllers
> >> > > > > > > > >>>>>>>>>>> will
> >> > > > > > > > >>>>>>>>>>>> be
> >> > > > > > > > >>>>>>>>>>>>>>>>> blocked during those troublesome times,
> >> > > > > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On
> the
> >> > > > > > > > >> other
> >> > > > > > > > >>>>> hand,
> >> > > > > > > > >>>>>>>>> adding
> >> > > > > > > > >>>>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>>>>> extra
> >> > > > > > > > >>>>>>>>>>>>>>>>> config with a default value, say 20,
> >> guards
> >> > us
> >> > > > > > > > >>> from
> >> > > > > > > > >>>>>>>> issues
> >> > > > > > > > >>>>>>>>> in
> >> > > > > > > > >>>>>>>>>>>> those
> >> > > > > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't
> >> much
> >> > > > > > > > >>>> downside
> >> > > > > > > > >>>>> of
> >> > > > > > > > >>>>>>>>> adding
> >> > > > > > > > >>>>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>>>>> extra
> >> > > > > > > > >>>>>>>>>>>>>>>>> config.
> >> > > > > > > > >>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> >> > > > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete
> >> > > > > > > > >>> statement
> >> > > > > > > > >>>>>> based
> >> > > > > > > > >>>>>>>> on
> >> > > > > > > > >>>>>>>>> a
> >> > > > > > > > >>>>>>>>>>>>> previous
> >> > > > > > > > >>>>>>>>>>>>>>>>> design. I've revised the wording in the
> >> KIP.
> >> > > > > > > > >>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>>>>>>> Lucas
> >> > > > > > > > >>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM,
> Mayuresh
> >> > > > > > > > >>> Gharat <
> >> > > > > > > > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> >> > > > > > > > >>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> >> > > > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think
> >> "The
> >> > > > > > > > >>>> memory
> >> > > > > > > > >>>>>>>>>>> consumption
> >> > > > > > > > >>>>>>>>>>>>> can
> >> > > > > > > > >>>>>>>>>>>>>>> rise
> >> > > > > > > > >>>>>>>>>>>>>>>>>> given the total number of queued
> requests
> >> > can
> >> > > > > > > > >>> go
> >> > > > > > > > >>>> up
> >> > > > > > > > >>>>>> to
> >> > > > > > > > >>>>>>>> 2x"
> >> > > > > > > > >>>>>>>>>> in
> >> > > > > > > > >>>>>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>>>>>> impact
> >> > > > > > > > >>>>>>>>>>>>>>>>>> section. Normally the requests from
> >> > > > > > > > >> controller
> >> > > > > > > > >>>> to a
> >> > > > > > > > >>>>>>>> Broker
> >> > > > > > > > >>>>>>>>>> are
> >> > > > > > > > >>>>>>>>>>>> not
> >> > > > > > > > >>>>>>>>>>>>>>> high
> >> > > > > > > > >>>>>>>>>>>>>>>>>> volume, right ?
> >> > > > > > > > >>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> >> > > > > > > > >>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket
> >> Qin <
> >> > > > > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> >> > > > > > > > >>>>>>>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating
> >> the
> >> > > > > > > > >>>> control
> >> > > > > > > > >>>>>>>> plane
> >> > > > > > > > >>>>>>>>>> from
> >> > > > > > > > >>>>>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>>>>>> data
> >> > > > > > > > >>>>>>>>>>>>>>>>>> plane
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the
> >> > > > > > > > >> controller
> >> > > > > > > > >>>>>> request
> >> > > > > > > > >>>>>>>>> queue
> >> > > > > > > > >>>>>>>>>>> may
> >> > > > > > > > >>>>>>>>>>>>>> have
> >> > > > > > > > >>>>>>>>>>>>>>>>> many
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this be a common
> >> case?
> >> > > > > > > > >>> The
> >> > > > > > > > >>>>>>>>> controller
> >> > > > > > > > >>>>>>>>>>>>>> requests
> >> > > > > > > > >>>>>>>>>>>>>>>>> still
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> goes through the SocketServer. The
> >> > > > > > > > >>> SocketServer
> >> > > > > > > > >>>>>> will
> >> > > > > > > > >>>>>>>>> mute
> >> > > > > > > > >>>>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>>>>>> channel
> >> > > > > > > > >>>>>>>>>>>>>>>>>> once
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> a request is read and put into the
> >> request
> >> > > > > > > > >>>>> channel.
> >> > > > > > > > >>>>>>>> So
> >> > > > > > > > >>>>>>>>>>>> assuming
> >> > > > > > > > >>>>>>>>>>>>>>> there
> >> > > > > > > > >>>>>>>>>>>>>>>>> is
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> only one connection between controller
> >> and
> >> > > > > > > > >>> each
> >> > > > > > > > >>>>>>>> broker,
> >> > > > > > > > >>>>>>>>> on
> >> > > > > > > > >>>>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>>>>>> broker
> >> > > > > > > > >>>>>>>>>>>>>>>>>> side,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> there should be only one controller
> >> request
> >> > > > > > > > >>> in
> >> > > > > > > > >>>>> the
> >> > > > > > > > >>>>>>>>>>> controller
> >> > > > > > > > >>>>>>>>>>>>>>> request
> >> > > > > > > > >>>>>>>>>>>>>>>>>> queue
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> at any given time. If that is the
> case,
> >> do
> >> > > > > > > > >> we
> >> > > > > > > > >>>>> need
> >> > > > > > > > >>>>>> a
> >> > > > > > > > >>>>>>>>>>> separate
> >> > > > > > > > >>>>>>>>>>>>>>>>> controller
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> request queue capacity config? The
> >> default
> >> > > > > > > > >>>> value
> >> > > > > > > > >>>>> 20
> >> > > > > > > > >>>>>>>>> means
> >> > > > > > > > >>>>>>>>>>> that
> >> > > > > > > > >>>>>>>>>>>>> we
> >> > > > > > > > >>>>>>>>>>>>>>>>> expect
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> there are 20 controller switches to
> >> happen
> >> > > > > > > > >>> in a
> >> > > > > > > > >>>>>> short
> >> > > > > > > > >>>>>>>>>> period
> >> > > > > > > > >>>>>>>>>>>> of
> >> > > > > > > > >>>>>>>>>>>>>>> time.
> >> > > > > > > > >>>>>>>>>>>>>>>>> I
> >> > > > > > > > >>>>>>>>>>>>>>>>>> am
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> not sure whether someone should
> increase
> >> > > > > > > > >> the
> >> > > > > > > > >>>>>>>> controller
> >> > > > > > > > >>>>>>>>>>>> request
> >> > > > > > > > >>>>>>>>>>>>>>> queue
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> capacity to handle such case, as it
> >> seems
> >> > > > > > > > >>>>>> indicating
> >> > > > > > > > >>>>>>>>>>> something
> >> > > > > > > > >>>>>>>>>>>>>> very
> >> > > > > > > > >>>>>>>>>>>>>>>>> wrong
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> has happened.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong
> >> Lin <
> >> > > > > > > > >>>>>>>>>>>> lindong28@gmail.com>
> >> > > > > > > > >>>>>>>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>> I think the motivation section is
> >> > > > > > > > >>> intuitive.
> >> > > > > > > > >>>> It
> >> > > > > > > > >>>>>>>> will
> >> > > > > > > > >>>>>>>>> be
> >> > > > > > > > >>>>>>>>>>> good
> >> > > > > > > > >>>>>>>>>>>>> to
> >> > > > > > > > >>>>>>>>>>>>>>>>> learn
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> more
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>> about the comments from other
> >> reviewers.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM,
> Lucas
> >> > > > > > > > >>> Wang <
> >> > > > > > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>> Hi Dong,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section
> of
> >> > > > > > > > >>> the
> >> > > > > > > > >>>>> KIP
> >> > > > > > > > >>>>>> by
> >> > > > > > > > >>>>>>>>>>>> explaining
> >> > > > > > > > >>>>>>>>>>>>>> the
> >> > > > > > > > >>>>>>>>>>>>>>>>>> cases
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>> that
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>> would have user impacts.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>> Please take a look at let me know
> your
> >> > > > > > > > >>>>>> comments.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>> Thanks,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>> Lucas
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM,
> Lucas
> >> > > > > > > > >>> Wang
> >> > > > > > > > >>>> <
> >> > > > > > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com
> >> > > > > > > > >>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> The simulation of disk being slow
> is
> >> > > > > > > > >>>> merely
> >> > > > > > > > >>>>>>>> for me
> >> > > > > > > > >>>>>>>>>> to
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> easily construct a testing scenario with a backlog of produce requests.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> In production, other than the disk being slow, a backlog of produce
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> requests may also be caused by high produce QPS. In that case, we may
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> not want to kill the broker, and that's when this KIP can be useful,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> both for JBOD and non-JBOD setups.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question about each ProduceRequest covering
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> 20 partitions that are randomly distributed, let's say a LeaderAndIsr
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> request is enqueued that tries to switch the current broker, say
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> broker0, from leader to follower *for one of the partitions*, say
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> *test-0*. For the sake of argument, let's also assume the other
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> brokers, say broker1, have *stopped* fetching from the current broker,
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> i.e. broker0.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests have acks = -1 (ALL)
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>   1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> will be put into the purgatory, and since they'll never be replicated
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> to other brokers (because of the assumption made above), they will be
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> completed either when the LeaderAndISR request is processed or when
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> the timeout happens.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>   1.2 With this KIP, broker0 will immediately transition the partition
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> test-0 to become a follower; after the current broker sees the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> replication of the remaining 19 partitions, it can send a response
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> indicating that it's no longer the leader for test-0.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>   To see the latency difference between 1.1 and 1.2, let's say there
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> are 24K produce requests ahead of the LeaderAndISR, and there are 8 io
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> threads, so each io thread will process approximately 3000 produce
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> requests. Now let's investigate the io thread that finally processed
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> the LeaderAndISR. For its 3000 produce requests, model the times when
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> their remaining 19 partitions catch up as t0, t1, ..., t2999, with the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR request processed at time t3000. Without this KIP, the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> 1st produce request would have waited an extra t3000 - t0 in the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> purgatory, the 2nd an extra t3000 - t1, etc. Roughly speaking, the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> latency difference is bigger for the earlier produce requests than for
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> the later ones. For the same reason, the more ProduceRequests queued
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> before the LeaderAndISR, the bigger the benefit we get (capped by the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> produce timeout).
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> 2. If the enqueued produce requests have acks=0 or acks=1
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>   There will be no latency differences in this case, but
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>   2.1 Without this KIP, the records of partition test-0 in the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests ahead of the LeaderAndISR will be appended to the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> local log, and eventually be truncated after processing the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR. This is what's referred to as "some unofficial
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> definition of data loss in terms of messages beyond the high
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> watermark".
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>   2.2 With this KIP, we can mitigate the effect, since if the
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR is immediately processed, the response to producers will
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> have the NotLeaderForPartition error, causing producers to retry.
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> >> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> This explanation above is the benefit for reducing
> >>
> >>
> >> --
> >> -Regards,
> >> Mayuresh R. Gharat
> >> (862) 250-7125
> >>
> >
> >
>
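[Editor's note: as a rough check on the arithmetic in the acks=-1 example quoted above, the extra purgatory wait can be modeled in a few lines. The class name and the uniform catch-up times (t_i = i ms, with the LeaderAndISR processed at t3000 = 3000 ms) are assumptions for illustration only, not from the thread.]

```java
// Illustrative model of the purgatory-wait argument above: with 3000
// produce requests ahead of the LeaderAndISR on one io thread, the i-th
// request waits an extra (t3000 - t_i) in the purgatory. Assuming a
// hypothetical uniform spacing t_i = i ms, the average extra wait works
// out to roughly half of the backlog drain time.
public class PurgatoryWaitModel {
    public static double averageExtraWaitMs(int requests, int leaderAndIsrTimeMs) {
        long total = 0;
        for (int i = 0; i < requests; i++) {
            total += leaderAndIsrTimeMs - i; // extra wait of the i-th request
        }
        return (double) total / requests;
    }

    public static void main(String[] args) {
        // 3000 requests ahead of the LeaderAndISR at t3000 = 3000 ms
        System.out.println(averageExtraWaitMs(3000, 3000));
    }
}
```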

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
A kind reminder for review of this KIP.

Thank you very much!
Lucas

On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi All,
>
> I've updated the KIP by adding the dedicated endpoints for controller
> connections,
> and pinning threads for controller requests.
> Also I've updated the title of this KIP. Please take a look and let me
> know your feedback.
>
> Thanks a lot for your time!
> Lucas
>
> On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com> wrote:
>
>> Hi Lucas,
>> I agree, if we want to go forward with a separate controller plane and
>> data
>> plane and completely isolate them, having a separate port for controller
>> with a separate Acceptor and a Processor sounds ideal to me.
>>
>> Thanks,
>>
>> Mayuresh
>>
>>
>> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <be...@gmail.com> wrote:
>>
>> > Hi Lucas,
>> >
>> > Yes, I agree that a dedicated end to end control flow would be ideal.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lu...@gmail.com>
>> wrote:
>> >
>> > > Thanks for the comment, Becket.
>> > > So far, we've been trying to avoid making any request handler thread
>> > > special.
>> > > But if we were to follow that path in order to make the two planes
>> more
>> > > isolated,
>> > > what do you think about also having a dedicated processor thread,
>> > > and dedicated port for the controller?
>> > >
>> > > Today one processor thread can handle multiple connections, let's say
>> 100
>> > > connections
>> > >
>> > > represented by connection0, ... connection99, among which
>> connection0-98
>> > > are from clients, while connection99 is from
>> > >
>> > > the controller. Further let's say after one selector polling, there
>> are
>> > > incoming requests on all connections.
>> > >
>> > > When the request queue is full, (either the data request being full in
>> > the
>> > > two queue design, or
>> > >
>> > > the one single queue being full in the deque design), the processor
>> > thread
>> > > will be blocked first
>> > >
>> > > when trying to enqueue the data request from connection0, then
>> possibly
>> > > blocked for the data request
>> > >
>> > > from connection1, ... etc even though the controller request is ready
>> to
>> > be
>> > > enqueued.
>> > >
>> > > To solve this problem, it seems we would need to have a separate port
>> > > dedicated to
>> > >
>> > > the controller, a dedicated processor thread, a dedicated controller
>> > > request queue,
>> > >
>> > > and pinning of one request handler thread for controller requests.
>> > >
>> > > Thanks,
>> > > Lucas
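[Editor's note: the end-to-end isolation Lucas describes above (dedicated controller port and processor, a separate controller request queue, and a pinned handler thread) can be sketched roughly as below. All names are hypothetical; this is not Kafka's actual SocketServer/KafkaRequestHandler code.]

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the proposed isolation: controller requests get
// their own bounded queue drained by a pinned handler thread, so a full
// data-request queue can no longer block the controller path.
public class ControlPlaneSketch {
    static final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);
    static final BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(20);

    // A processor bound to the dedicated controller port only ever touches
    // controllerQueue, so it never blocks behind a backlog of data requests.
    static void enqueue(String request, boolean fromController) throws InterruptedException {
        (fromController ? controllerQueue : dataQueue).put(request);
    }

    public static void main(String[] args) throws InterruptedException {
        // The pinned handler drains only the controller queue.
        Thread pinnedHandler = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    System.out.println("handled " + controllerQueue.take());
                }
            } catch (InterruptedException e) {
                // shutdown
            }
        });
        pinnedHandler.start();
        enqueue("LeaderAndIsrRequest", true);
        enqueue("ProduceRequest", false);
        Thread.sleep(100);
        pinnedHandler.interrupt();
    }
}
```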
>> > >
>> > >
>> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <be...@gmail.com>
>> > wrote:
>> > >
>> > > > Personally I am not fond of the dequeue approach simply because it
>> is
>> > > > against the basic idea of isolating the controller plane and data
>> > plane.
>> > > > With a single dequeue, theoretically speaking the controller
>> requests
>> > can
>> > > > starve the clients requests. I would prefer the approach with a
>> > separate
>> > > > controller request queue and a dedicated controller request handler
>> > > thread.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lu...@gmail.com>
>> > > wrote:
>> > > >
>> > > > > Sure, I can summarize the usage of correlation id. But before I do
>> > > that,
>> > > > it
>> > > > > seems
>> > > > > the same out-of-order processing can also happen to Produce
>> requests
>> > > sent
>> > > > > by producers,
>> > > > > following the same example you described earlier.
>> > > > > If that's the case, I think this probably deserves a separate doc
>> and
>> > > > > design independent of this KIP.
>> > > > >
>> > > > > Lucas
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <li...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > > Hey Lucas,
>> > > > > >
>> > > > > > Could you update the KIP if you are confident with the approach
>> > which
>> > > > > uses
>> > > > > > correlation id? The idea around correlation id is kind of
>> scattered
>> > > > > across
>> > > > > > multiple emails. It will be useful if other reviews can read the
>> > KIP
>> > > to
>> > > > > > understand the latest proposal.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Dong
>> > > > > >
>> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
>> > > > > > gharatmayuresh15@gmail.com> wrote:
>> > > > > >
>> > > > > > > I like the idea of the dequeue implementation by Lucas. This
>> will
>> > > > help
>> > > > > us
>> > > > > > > avoid additional queue for controller and additional configs
>> in
>> > > > Kafka.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Mayuresh
>> > > > > > >
>> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
>> becket.qin@gmail.com
>> > >
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Jun,
>> > > > > > > >
>> > > > > > > > The usage of correlation ID might still be useful to address
>> > the
>> > > > > cases
>> > > > > > > > that the controller epoch and leader epoch check are not
>> > > sufficient
>> > > > > to
>> > > > > > > > guarantee correct behavior. For example, if the controller
>> > sends
>> > > a
>> > > > > > > > LeaderAndIsrRequest followed by a StopReplicaRequest, and
>> the
>> > > > broker
>> > > > > > > > processes it in the reverse order, the replica may still be
>> > > wrongly
>> > > > > > > > recreated, right?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > >
>> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io>
>> > > wrote:
>> > > > > > > > >
>> > > > > > > > > Hmm, since we already use controller epoch and leader
>> epoch
>> > for
>> > > > > > > properly
>> > > > > > > > > caching the latest partition state, do we really need
>> > > correlation
>> > > > > id
>> > > > > > > for
>> > > > > > > > > ordering the controller requests?
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > Jun
>> > > > > > > > >
>> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
>> > > > becket.qin@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >> Lucas and Mayuresh,
>> > > > > > > > >>
>> > > > > > > > >> Good idea. The correlation id should work.
>> > > > > > > > >>
>> > > > > > > > >> In the ControllerChannelManager, a request will be resent
>> > > until
>> > > > a
>> > > > > > > > response
>> > > > > > > > >> is received. So if the controller to broker connection
>> > > > disconnects
>> > > > > > > after
>> > > > > > > > >> controller sends R1_a, but before the response of R1_a is
>> > > > > received,
>> > > > > > a
>> > > > > > > > >> disconnection may cause the controller to resend R1_b.
>> i.e.
>> > > > until
>> > > > > R1
>> > > > > > > is
>> > > > > > > > >> acked, R2 won't be sent by the controller.
>> > > > > > > > >> This gives two guarantees:
>> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
>> > > > > > > > >> 2. On the broker side, when R2 is seen, R1 must have been
>> > > > > processed
>> > > > > > at
>> > > > > > > > >> least once.
>> > > > > > > > >>
>> > > > > > > > >> So on the broker side, with a single thread controller
>> > request
>> > > > > > > handler,
>> > > > > > > > the
>> > > > > > > > >> logic should be:
>> > > > > > > > >> 1. Process what ever request seen in the controller
>> request
>> > > > queue
>> > > > > > > > >> 2. For the given epoch, drop request if its correlation
>> id
>> > is
>> > > > > > smaller
>> > > > > > > > than
>> > > > > > > > >> that of the last processed request.
>> > > > > > > > >>
>> > > > > > > > >> Thanks,
>> > > > > > > > >>
>> > > > > > > > >> Jiangjie (Becket) Qin
>> > > > > > > > >>
>> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <
>> jun@confluent.io>
>> > > > > wrote:
>> > > > > > > > >>
>> > > > > > > > >>> I agree that there is no strong ordering when there are
>> > more
>> > > > than
>> > > > > > one
>> > > > > > > > >>> socket connections. Currently, we rely on
>> controllerEpoch
>> > and
>> > > > > > > > leaderEpoch
>> > > > > > > > >>> to ensure that the receiving broker picks up the latest
>> > state
>> > > > for
>> > > > > > > each
>> > > > > > > > >>> partition.
>> > > > > > > > >>>
>> > > > > > > > >>> One potential issue with the dequeue approach is that if
>> > the
>> > > > > queue
>> > > > > > is
>> > > > > > > > >> full,
>> > > > > > > > >>> there is no guarantee that the controller requests will
>> be
>> > > > > enqueued
>> > > > > > > > >>> quickly.
>> > > > > > > > >>>
>> > > > > > > > >>> Thanks,
>> > > > > > > > >>>
>> > > > > > > > >>> Jun
>> > > > > > > > >>>
>> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
>> > > > > > > > >>> gharatmayuresh15@gmail.com
>> > > > > > > > >>>> wrote:
>> > > > > > > > >>>
>> > > > > > > > >>>> Yea, the correlationId is only set to 0 in the
>> > NetworkClient
>> > > > > > > > >> constructor.
>> > > > > > > > >>>> Since we reuse the same NetworkClient between
>> Controller
>> > and
>> > > > the
>> > > > > > > > >> broker,
>> > > > > > > > >>> a
>> > > > > > > > >>>> disconnection should not cause it to reset to 0, in
>> which
>> > > case
>> > > > > it
>> > > > > > > can
>> > > > > > > > >> be
>> > > > > > > > >>>> used to reject obsolete requests.
>> > > > > > > > >>>>
>> > > > > > > > >>>> Thanks,
>> > > > > > > > >>>>
>> > > > > > > > >>>> Mayuresh
>> > > > > > > > >>>>
>> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <
>> > > > > lucasatucla@gmail.com
>> > > > > > >
>> > > > > > > > >>> wrote:
>> > > > > > > > >>>>
>> > > > > > > > >>>>> @Dong,
>> > > > > > > > >>>>> Great example and explanation, thanks!
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> @All
>> > > > > > > > >>>>> Regarding the example given by Dong, it seems even if
>> we
>> > > use
>> > > > a
>> > > > > > > queue,
>> > > > > > > > >>>> and a
>> > > > > > > > >>>>> dedicated controller request handling thread,
>> > > > > > > > >>>>> the same result can still happen because R1_a will be
>> > sent
>> > > on
>> > > > > one
>> > > > > > > > >>>>> connection, and R1_b & R2 will be sent on a different
>> > > > > connection,
>> > > > > > > > >>>>> and there is no ordering between different
>> connections on
>> > > the
>> > > > > > > broker
>> > > > > > > > >>>> side.
>> > > > > > > > >>>>> I was discussing with Mayuresh offline, and it seems
>> > > > > correlation
>> > > > > > id
>> > > > > > > > >>>> within
>> > > > > > > > >>>>> the same NetworkClient object is monotonically
>> increasing
>> > > and
>> > > > > > never
>> > > > > > > > >>>> reset,
>> > > > > > > > >>>>> hence a broker can leverage that to properly reject
>> > > obsolete
>> > > > > > > > >> requests.
>> > > > > > > > >>>>> Thoughts?
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> Thanks,
>> > > > > > > > >>>>> Lucas
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
>> > > > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
>> > > > > > > > >>>>>
>> > > > > > > > >>>>>> Actually nvm, correlationId is reset in case of
>> > connection
>> > > > > > loss, I
>> > > > > > > > >>>> think.
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>> Thanks,
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>> Mayuresh
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
>> > > > > > > > >>>>>> gharatmayuresh15@gmail.com>
>> > > > > > > > >>>>>> wrote:
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing can
>> > happen
>> > > > > with
>> > > > > > > > >>>> having 2
>> > > > > > > > >>>>>>> separate queues as well and it can even happen
>> today.
>> > > > > > > > >>>>>>> Can we use the correlationId in the request from the
>> > > > > controller
>> > > > > > > > >> to
>> > > > > > > > >>>> the
>> > > > > > > > >>>>>>> broker to handle ordering ?
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Thanks,
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Mayuresh
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
>> > > > > > becket.qin@gmail.com
>> > > > > > > > >>>
>> > > > > > > > >>>>> wrote:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated
>> controller
>> > > > > request
>> > > > > > > > >>>> handling
>> > > > > > > > >>>>>>>> thread would be a better isolation. It also solves
>> the
>> > > > > > > > >> reordering
>> > > > > > > > >>>>> issue.
>> > > > > > > > >>>>>>>>
>> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
>> > > > > > > > >> jjkoshy.w@gmail.com>
>> > > > > > > > >>>>>> wrote:
>> > > > > > > > >>>>>>>>
>> > > > > > > > >>>>>>>>> Good example. I think this scenario can occur in
>> the
>> > > > > current
>> > > > > > > > >>> code
>> > > > > > > > >>>> as
>> > > > > > > > >>>>>>>> well
>> > > > > > > > >>>>>>>>> but with even lower probability given that there
>> are
>> > > > other
>> > > > > > > > >>>>>>>> non-controller
>> > > > > > > > >>>>>>>>> requests interleaved. It is still sketchy though
>> and
>> > I
>> > > > > think
>> > > > > > a
>> > > > > > > > >>>> safer
>> > > > > > > > >>>>>>>>> approach would be separate queues and pinning
>> > > controller
>> > > > > > > > >> request
>> > > > > > > > >>>>>>>> handling
>> > > > > > > > >>>>>>>>> to one handler thread.
>> > > > > > > > >>>>>>>>>
>> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
>> > > > > > > > >> lindong28@gmail.com
>> > > > > > > > >>>>
>> > > > > > > > >>>>>> wrote:
>> > > > > > > > >>>>>>>>>
>> > > > > > > > >>>>>>>>>> Hey Becket,
>> > > > > > > > >>>>>>>>>>
>> > > > > > > > >>>>>>>>>> I think you are right that there may be
>> out-of-order
>> > > > > > > > >>> processing.
>> > > > > > > > >>>>>>>> However,
>> > > > > > > > >>>>>>>>>> it seems that out-of-order processing may also
>> > happen
>> > > > even
>> > > > > > > > >> if
>> > > > > > > > >>> we
>> > > > > > > > >>>>>> use a
>> > > > > > > > >>>>>>>>>> separate queue.
>> > > > > > > > >>>>>>>>>>
>> > > > > > > > >>>>>>>>>> Here is the example:
>> > > > > > > > >>>>>>>>>>
>> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected before
>> > > > > receiving
>> > > > > > > > >>>>>> response.
>> > > > > > > > >>>>>>>>> Then
>> > > > > > > > >>>>>>>>>> it reconnects and sends R2. Both requests now
>> stay
>> > in
>> > > > the
>> > > > > > > > >>>>> controller
>> > > > > > > > >>>>>>>>>> request queue in the order they are sent.
>> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and
>> then
>> > > > > thread2
>> > > > > > > > >>>> takes
>> > > > > > > > >>>>>> R2
>> > > > > > > > >>>>>>>>> from
>> > > > > > > > >>>>>>>>>> the request queue almost at the same time.
>> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel.
>> There is
>> > > > > chance
>> > > > > > > > >>> that
>> > > > > > > > >>>>>> R2's
>> > > > > > > > >>>>>>>>>> processing is completed before R1.
>> > > > > > > > >>>>>>>>>>
>> > > > > > > > >>>>>>>>>> If out-of-order processing can happen for both
>> > > > approaches
>> > > > > > > > >> with
>> > > > > > > > >>>>> very
>> > > > > > > > >>>>>>>> low
>> > > > > > > > >>>>>>>>>> probability, it may not be worthwhile to add the
>> > extra
>> > > > > > > > >> queue.
>> > > > > > > > >>>> What
>> > > > > > > > >>>>>> do
>> > > > > > > > >>>>>>>> you
>> > > > > > > > >>>>>>>>>> think?
>> > > > > > > > >>>>>>>>>>
>> > > > > > > > >>>>>>>>>> Thanks,
>> > > > > > > > >>>>>>>>>> Dong
>> > > > > > > > >>>>>>>>>>
>> > > > > > > > >>>>>>>>>>
>> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
>> > > > > > > > >>>> becket.qin@gmail.com
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>>>>> wrote:
>> > > > > > > > >>>>>>>>>>
>> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
>> > > > > > > > >>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>> Using the request channel as a dequeue was
>> bright
>> > up
>> > > > some
>> > > > > > > > >>> time
>> > > > > > > > >>>>> ago
>> > > > > > > > >>>>>>>> when
>> > > > > > > > >>>>>>>>>> we
>> > > > > > > > >>>>>>>>>>> initially thinking of prioritizing the request.
>> The
>> > > > > > > > >> concern
>> > > > > > > > >>>> was
>> > > > > > > > >>>>>> that
>> > > > > > > > >>>>>>>>> the
>> > > > > > > > >>>>>>>>>>> controller requests are supposed to be
>> processed in
>> > > > > order.
>> > > > > > > > >>> If
>> > > > > > > > >>>> we
>> > > > > > > > >>>>>> can
>> > > > > > > > >>>>>>>>>> ensure
>> > > > > > > > >>>>>>>>>>> that there is one controller request in the
>> request
>> > > > > > > > >> channel,
>> > > > > > > > >>>> the
>> > > > > > > > >>>>>>>> order
>> > > > > > > > >>>>>>>>> is
>> > > > > > > > >>>>>>>>>>> not a concern. But in cases that there are more
>> > than
>> > > > one
>> > > > > > > > >>>>>> controller
>> > > > > > > > >>>>>>>>>> request
>> > > > > > > > >>>>>>>>>>> inserted into the queue, the controller request
>> > order
>> > > > may
>> > > > > > > > >>>> change
>> > > > > > > > >>>>>> and
>> > > > > > > > >>>>>>>>>> cause
>> > > > > > > > >>>>>>>>>>> problem. For example, think about the following
>> > > > sequence:
>> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to
>> > > broker
>> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and put the request to the
>> > head
>> > > > of
>> > > > > > > > >> the
>> > > > > > > > >>>>>> request
>> > > > > > > > >>>>>>>>>> queue.
>> > > > > > > > >>>>>>>>>>> 3. Controller to broker connection failed and
>> the
>> > > > > > > > >> controller
>> > > > > > > > >>>>>>>>> reconnected
>> > > > > > > > >>>>>>>>>> to
>> > > > > > > > >>>>>>>>>>> the broker.
>> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker
>> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to the head of
>> the
>> > > > > > > > >> request
>> > > > > > > > >>>>> queue.
>> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed
>> before
>> > > R1
>> > > > is
>> > > > > > > > >>>>>> processed,
>> > > > > > > > >>>>>>>>>> which
>> > > > > > > > >>>>>>>>>>> may cause problem.
>> > > > > > > > >>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>> Thanks,
>> > > > > > > > >>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
>> > > > > > > > >>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
>> > > > > > > > >>>>> jjkoshy.w@gmail.com>
>> > > > > > > > >>>>>>>>> wrote:
>> > > > > > > > >>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be
>> a
>> > > > simpler
>> > > > > > > > >>>> less
>> > > > > > > > >>>>>>>>> invasive
>> > > > > > > > >>>>>>>>>>>> alternative and it should work.
>> Jun/Becket/others,
>> > > do
>> > > > > > > > >> you
>> > > > > > > > >>>> see
>> > > > > > > > >>>>>> any
>> > > > > > > > >>>>>>>>>>> pitfalls
>> > > > > > > > >>>>>>>>>>>> with this approach?
>> > > > > > > > >>>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
>> > > > > > > > >>>>>>>> lucasatucla@gmail.com>
>> > > > > > > > >>>>>>>>>>>> wrote:
>> > > > > > > > >>>>>>>>>>>>
>> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
>> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I haven't
>> > > thought
>> > > > > > > > >>>>> before.
>> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty
>> > well,
>> > > > and
>> > > > > > > > >>>> also
>> > > > > > > > >>>>>>>>>>>>> avoids the need to have a new size metric and
>> > > > capacity
>> > > > > > > > >>>>> config
>> > > > > > > > >>>>>>>>>>>>> for the controller request queue. In fact, if
>> we
>> > > were
>> > > > > > > > >> to
>> > > > > > > > >>>>> adopt
>> > > > > > > > >>>>>>>>>>>>> this design, there is no public interface
>> change,
>> > > and
>> > > > > > > > >> we
>> > > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
>> > > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems
>> > > > > > > > >>>>>>>>>>>>> the java class LinkedBlockingQueue can readily
>> > > > satisfy
>> > > > > > > > >>> the
>> > > > > > > > >>>>>>>>>> requirement
by supporting a capacity, and also allowing inserting at both ends.

My only concern is that this design is tied to the coincidence that we have
two request priorities and there are two ends to a deque. Hence by using
the proposed design, it seems the network layer is more tightly coupled
with upper layer logic, e.g. if we were to add an extra priority level in
the future for some reason, we would probably need to go back to the design
of separate queues, one for each priority level.

In summary, I'm ok with both designs and lean toward your suggested
approach. Let's hear what others think.

@Becket,
In light of Mayuresh's suggested new design, I'm answering your question
only in the context of the current KIP design: I think your suggestion
makes sense, and I'm ok with removing the capacity config and just relying
on the default value of 20 being sufficient.

Thanks,
Lucas

On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat
<gharatmayuresh15@gmail.com> wrote:

Hi Lucas,

Seems like the main intent here is to prioritize the controller request
over any other requests. In that case, we can change the request queue to a
deque, where you always insert the normal requests (produce, consume, etc.)
at the end of the deque, but if it's a controller request, you insert it at
the head of the queue. This ensures that the controller request will be
given higher priority over other requests.

Also since we only read one request from the socket and mute it and only
unmute it after handling the request, this would ensure that we don't
handle controller requests out of order.

With this approach we can avoid the second queue and the additional config
for the size of the queue.

What do you think?

Thanks,

Mayuresh

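The deque idea above can be sketched as follows (a hypothetical Python
model for illustration only; the names `RequestDeque`, `send`, and
`receive` are invented here and the actual broker `RequestChannel` API
differs):

```python
from collections import deque
from threading import Condition

class RequestDeque:
    """Single bounded deque: controller requests jump to the head,
    data requests go to the tail (a sketch of the proposal above)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.dq = deque()
        self.cond = Condition()

    def send(self, request, is_controller=False):
        with self.cond:
            # Data requests block on a full deque; controller requests
            # bypass the bound so a backlog cannot delay them.
            while not is_controller and len(self.dq) >= self.capacity:
                self.cond.wait()
            if is_controller:
                self.dq.appendleft(request)   # head: highest priority
            else:
                self.dq.append(request)       # tail: normal priority
            self.cond.notify_all()

    def receive(self):
        with self.cond:
            while not self.dq:
                self.cond.wait()
            request = self.dq.popleft()
            self.cond.notify_all()
            return request

q = RequestDeque(capacity=20)
q.send("produce-1")
q.send("fetch-1")
q.send("LeaderAndIsr", is_controller=True)
assert q.receive() == "LeaderAndIsr"   # controller request served first
```

Because the channel is muted while a controller request is being handled,
at most one controller request sits at the head at any time, so this keeps
controller requests in order.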
On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com> wrote:

Hey Joel,

Thanks for the detailed explanation. I agree the current design makes
sense. My confusion is about whether the new config for the controller
queue capacity is necessary. I cannot think of a case in which users would
change it.

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:

Hi Lucas,

I guess my question can be rephrased to "do we expect users to ever change
the controller request queue capacity"? If we agree that 20 is already a
very generous default number and we do not expect users to change it, is it
still necessary to expose this as a config?

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:

@Becket
1. Thanks for the comment. You are right that normally there should be just
one controller request because of muting, and I had NOT intended to say
there would be many enqueued controller requests. I went through the KIP
again, and I'm not sure which part conveys that info. I'd be happy to
revise if you point out the section.

2. Though it should not happen in normal conditions, the current design
does not preclude multiple controllers running at the same time, hence if
we don't have the controller queue capacity config and simply make its
capacity 1, network threads handling requests from different controllers
will be blocked during those troublesome times, which is probably not what
we want. On the other hand, adding the extra config with a default value,
say 20, guards us from issues in those troublesome times, and IMO there
isn't much downside to adding the extra config.

@Mayuresh
Good catch, this sentence is an obsolete statement based on a previous
design. I've revised the wording in the KIP.

Thanks,
Lucas

On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat
<gharatmayuresh15@gmail.com> wrote:

Hi Lucas,

Thanks for the KIP.
I am trying to understand why you think "The memory consumption can rise
given the total number of queued requests can go up to 2x" in the impact
section. Normally the requests from controller to a broker are not high
volume, right?

Thanks,

Mayuresh

On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com> wrote:

Thanks for the KIP, Lucas. Separating the control plane from the data plane
makes a lot of sense.

In the KIP you mentioned that the controller request queue may have many
requests in it. Will this be a common case? The controller requests still
go through the SocketServer. The SocketServer will mute the channel once a
request is read and put into the request channel. So assuming there is only
one connection between the controller and each broker, on the broker side
there should be only one controller request in the controller request queue
at any given time. If that is the case, do we need a separate controller
request queue capacity config? The default value 20 means that we expect 20
controller switches to happen in a short period of time. I am not sure
whether someone should increase the controller request queue capacity to
handle such a case, as it seems to indicate something very wrong has
happened.

Thanks,

Jiangjie (Becket) Qin

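The muting argument above can be sketched with a toy model (hypothetical
code, not the actual SocketServer; `Connection`, `read_request`, and
`on_response_sent` are invented names for illustration):

```python
class Connection:
    """One controller connection: muted after a request is read,
    unmuted only after the request is fully handled, so at most one
    request from this connection is in the request queue at a time."""
    def __init__(self):
        self.muted = False

    def read_request(self, request_queue):
        if self.muted:
            return False              # selector skips muted channels
        request_queue.append("controller-request")
        self.muted = True             # mute until the response is sent
        return True

    def on_response_sent(self):
        self.muted = False            # ready to read the next request

conn = Connection()
queue = []
assert conn.read_request(queue) is True
assert conn.read_request(queue) is False   # second read blocked by mute
conn.on_response_sent()
assert conn.read_request(queue) is True
```

This is why, with a single controller connection, the controller request
queue normally holds at most one request regardless of its configured
capacity.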
On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com> wrote:

Thanks for the update Lucas.

I think the motivation section is intuitive. It will be good to learn more
about the comments from other reviewers.

On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

Hi Dong,

I've updated the motivation section of the KIP by explaining the cases that
would have user impacts.
Please take a look and let me know your comments.

Thanks,
Lucas

On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

Hi Dong,

The simulation of the disk being slow is merely for me to easily construct
a testing scenario with a backlog of produce requests. In production, other
than the disk being slow, a backlog of produce requests may also be caused
by high produce QPS. In that case, we may not want to kill the broker, and
that's when this KIP can be useful, both for JBOD and non-JBOD setups.

Going back to your previous question about each ProduceRequest covering 20
partitions that are randomly distributed: let's say a LeaderAndIsr request
is enqueued that tries to switch the current broker, say broker0, from
leader to follower *for one of the partitions*, say *test-0*. For the sake
of argument, let's also assume the other brokers, say broker1, have
*stopped* fetching from the current broker, i.e. broker0.

1. If the enqueued produce requests have acks = -1 (ALL)
   1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR will
   be put into the purgatory, and since they'll never be replicated to
   other brokers (because of the assumption made above), they will be
   completed either when the LeaderAndISR request is processed or when the
   timeout happens.
   1.2 With this KIP, broker0 will immediately transition the partition
   test-0 to become a follower; after the current broker sees the
   replication of the remaining 19 partitions, it can send a response
   indicating that it's no longer the leader for "test-0".
   To see the latency difference between 1.1 and 1.2, let's say there are
   24K produce requests ahead of the LeaderAndISR, and there are 8 io
   threads, so each io thread will process approximately 3000 produce
   requests. Now let's investigate the io thread that finally processed the
   LeaderAndISR. For the 3000 produce requests, if we model the times when
   their remaining 19 partitions catch up as t0, t1, ... t2999, and the
   LeaderAndISR request is processed at time t3000, then without this KIP
   the 1st produce request would have waited an extra t3000 - t0 time in
   the purgatory, the 2nd an extra time of t3000 - t1, etc. Roughly
   speaking, the latency difference is bigger for the earlier produce
   requests than for the later ones. For the same reason, the more
   ProduceRequests queued before the LeaderAndISR, the bigger the benefit
   we get (capped by the produce timeout).
2. If the enqueued produce requests have acks = 0 or acks = 1
   There will be no latency differences in this case, but
   2.1 Without this KIP, the records of partition test-0 in the
   ProduceRequests ahead of the LeaderAndISR will be appended to the local
   log, and eventually be truncated after processing the LeaderAndISR. This
   is what's referred to as "some unofficial definition of data loss in
   terms of messages beyond the high watermark".
   2.2 With this KIP, we can mitigate the effect since if the LeaderAndISR
   is immediately processed, the response to producers will have the
   NotLeaderForPartition error, causing producers to retry.

This explanation above is the benefit for reducing
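The waiting-time arithmetic in 1.2 above can be sanity-checked with a quick
sketch (the 1 ms spacing between completion times is an invented
illustration, not a measured value):

```python
# Model: 3000 produce requests on one io thread; request i's remaining
# partitions catch up at time t_i, and the LeaderAndISR is processed at
# t_3000. Without the KIP, request i waits an extra (t_3000 - t_i) in
# purgatory. Assume, for illustration, completions are 1 ms apart.
t = [float(i) for i in range(3001)]          # t_0 .. t_3000, in ms
extra_waits = [t[3000] - t[i] for i in range(3000)]

# Earlier requests wait longer than later ones, as argued above.
assert extra_waits[0] > extra_waits[2999]

avg_extra = sum(extra_waits) / len(extra_waits)   # about 1500.5 ms here
```

The shape of the result, not the numbers, is the point: the extra purgatory
time decays linearly from t3000 - t0 down to t3000 - t2999, so the earliest
queued requests gain the most from processing the LeaderAndISR first.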
--
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi All,

I've updated the KIP by adding the dedicated endpoints for controller
connections,
and pinning threads for controller requests.
Also I've updated the title of this KIP. Please take a look and let me know
your feedback.

Thanks a lot for your time!
Lucas

On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
gharatmayuresh15@gmail.com> wrote:

> Hi Lucas,
> I agree, if we want to go forward with a separate controller plane and data
> plane and completely isolate them, having a separate port for controller
> with a separate Acceptor and a Processor sounds ideal to me.
>
> Thanks,
>
> Mayuresh
>
>
> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <be...@gmail.com> wrote:
>
> > Hi Lucas,
> >
> > Yes, I agree that a dedicated end to end control flow would be ideal.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Thanks for the comment, Becket.
> > > So far, we've been trying to avoid making any request handler thread
> > > special.
> > > But if we were to follow that path in order to make the two planes more
> > > isolated,
> > > what do you think about also having a dedicated processor thread,
> > > and dedicated port for the controller?
> > >
> > > Today one processor thread can handle multiple connections, let's say
> > > 100 connections represented by connection0, ... connection99, among
> > > which connection0-98 are from clients, while connection99 is from the
> > > controller. Further let's say after one selector polling, there are
> > > incoming requests on all connections.
> > >
> > > When the request queue is full (either the data request queue being
> > > full in the two-queue design, or the single queue being full in the
> > > deque design), the processor thread will be blocked first when trying
> > > to enqueue the data request from connection0, then possibly blocked
> > > for the data request from connection1, etc., even though the
> > > controller request is ready to be enqueued.
> > >
> > > To solve this problem, it seems we would need to have a separate port
> > > dedicated to the controller, a dedicated processor thread, a dedicated
> > > controller request queue, and pinning of one request handler thread
> > > for controller requests.
> > >
> > > Thanks,
> > > Lucas
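The head-of-line blocking scenario described above can be modeled with a
small sketch (a hypothetical single-threaded simulation; real processor
threads block on the queue rather than breaking out of the loop):

```python
from queue import Queue

def enqueue_polled_requests(polled, request_queue):
    """Processor-thread sketch: requests from one selector poll are
    enqueued in connection order. If the bounded queue fills up on an
    early data request, the controller request at the end of the poll
    never gets enqueued, even though it is ready."""
    enqueued = []
    for conn_id, request in polled:
        if request_queue.full():
            break   # a real processor thread would block here instead
        request_queue.put(request)
        enqueued.append(conn_id)
    return enqueued

q = Queue(maxsize=2)   # tiny bound to make the blocking visible
polled = [(i, f"data-{i}") for i in range(99)] + [(99, "controller-req")]
done = enqueue_polled_requests(polled, q)
# Only the first two data requests make it; connection99's controller
# request is stuck behind them.
assert done == [0, 1]
```

A dedicated controller port with its own processor thread and queue, as
proposed above, removes this coupling: the controller request never waits
behind data requests from other connections.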
> > >
> > >
> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <be...@gmail.com>
> > wrote:
> > >
> > > > Personally I am not fond of the deque approach simply because it is
> > > > against the basic idea of isolating the controller plane and data
> > > > plane. With a single deque, theoretically speaking the controller
> > > > requests can starve the client requests. I would prefer the approach
> > > > with a separate controller request queue and a dedicated controller
> > > > request handler thread.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Sure, I can summarize the usage of correlation id. But before I do
> > > > > that, it seems the same out-of-order processing can also happen to
> > > > > Produce requests sent by producers, following the same example you
> > > > > described earlier. If that's the case, I think this probably
> > > > > deserves a separate doc and design independent of this KIP.
> > > > >
> > > > > Lucas
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <li...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hey Lucas,
> > > > > >
> > > > > > Could you update the KIP if you are confident with the approach
> > > > > > which uses correlation id? The idea around correlation id is kind
> > > > > > of scattered across multiple emails. It will be useful if other
> > > > > > reviewers can read the KIP to understand the latest proposal.
> > > > > >
> > > > > > Thanks,
> > > > > > Dong
> > > > > >
> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > > > > > gharatmayuresh15@gmail.com> wrote:
> > > > > >
> > > > > > > I like the idea of the dequeue implementation by Lucas. This
> will
> > > > help
> > > > > us
> > > > > > > avoid additional queue for controller and additional configs in
> > > > Kafka.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Mayuresh
> > > > > > >
> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
> becket.qin@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > The usage of correlation ID might still be useful to address
> > the
> > > > > cases
> > > > > > > > that the controller epoch and leader epoch check are not
> > > sufficient
> > > > > to
> > > > > > > > guarantee correct behavior. For example, if the controller
> > sends
> > > a
> > > > > > > > LeaderAndIsrRequest followed by a StopReplicaRequest, and the
> > > > broker
> > > > > > > > processes it in the reverse order, the replica may still be
> > > wrongly
> > > > > > > > recreated, right?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > > > > >
> > > > > > > > > Hmm, since we already use controller epoch and leader epoch
> > for
> > > > > > > properly
> > > > > > > > > caching the latest partition state, do we really need
> > > correlation
> > > > > id
> > > > > > > for
> > > > > > > > > ordering the controller requests?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
> > > > becket.qin@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Lucas and Mayuresh,
> > > > > > > > >>
> > > > > > > > >> Good idea. The correlation id should work.
> > > > > > > > >>
> > > > > > > > >> In the ControllerChannelManager, a request will be resent
> > > until
> > > > a
> > > > > > > > response
> > > > > > > > >> is received. So if the controller to broker connection
> > > > disconnects
> > > > > > > after
> > > > > > > > >> controller sends R1_a, but before the response of R1_a is
> > > > > received,
> > > > > > a
> > > > > > > > >> disconnection may cause the controller to resend R1_b.
> i.e.
> > > > until
> > > > > R1
> > > > > > > is
> > > > > > > > >> acked, R2 won't be sent by the controller.
> > > > > > > > >> This gives two guarantees:
> > > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > > > > >> 2. On the broker side, when R2 is seen, R1 must have been
> > > > > processed
> > > > > > at
> > > > > > > > >> least once.
> > > > > > > > >>
> > > > > > > > >> So on the broker side, with a single thread controller
> > request
> > > > > > > handler,
> > > > > > > > the
> > > > > > > > >> logic should be:
> > > > > > > > >> 1. Process whatever request is seen in the controller
> request
> > > > queue
> > > > > > > > >> 2. For the given epoch, drop request if its correlation id
> > is
> > > > > > smaller
> > > > > > > > than
> > > > > > > > >> that of the last processed request.
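[Editor's sketch] The two rules above can be illustrated with a small, self-contained Java snippet. This is a hypothetical illustration, not actual Kafka code; the class and method names are invented. It assumes the broker tracks the controller epoch and the correlation id of the last processed request, and drops any request that is stale for the current epoch:

```java
/**
 * Hypothetical sketch of the broker-side ordering check described above:
 * for a given controller epoch, drop any controller request whose
 * correlation id is not larger than that of the last processed request.
 */
public class ControllerRequestFilter {
    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    /** Returns true if the request should be processed, false if it is stale. */
    public synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch < lastEpoch) {
            return false; // request from an older controller generation
        }
        if (controllerEpoch > lastEpoch) {
            // New controller generation: correlation ids start over.
            lastEpoch = controllerEpoch;
            lastCorrelationId = correlationId;
            return true;
        }
        // Same epoch: only accept monotonically increasing correlation ids.
        if (correlationId <= lastCorrelationId) {
            return false;
        }
        lastCorrelationId = correlationId;
        return true;
    }

    public static void main(String[] args) {
        ControllerRequestFilter f = new ControllerRequestFilter();
        System.out.println(f.shouldProcess(5, 10)); // true
        System.out.println(f.shouldProcess(5, 11)); // true
        System.out.println(f.shouldProcess(5, 10)); // false, stale resend
        System.out.println(f.shouldProcess(6, 1));  // true, new epoch
    }
}
```

This relies on the guarantee stated above that, within one NetworkClient, correlation ids are monotonically increasing and only reset when a new controller (epoch) takes over.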
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Jiangjie (Becket) Qin
> > > > > > > > >>
> > > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <
> jun@confluent.io>
> > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> I agree that there is no strong ordering when there are
> > more
> > > > than
> > > > > > one
> > > > > > > > >>> socket connections. Currently, we rely on controllerEpoch
> > and
> > > > > > > > leaderEpoch
> > > > > > > > >>> to ensure that the receiving broker picks up the latest
> > state
> > > > for
> > > > > > > each
> > > > > > > > >>> partition.
> > > > > > > > >>>
> > > > > > > > >>> One potential issue with the deque approach is that if
> > the
> > > > > queue
> > > > > > is
> > > > > > > > >> full,
> > > > > > > > >>> there is no guarantee that the controller requests will
> be
> > > > > enqueued
> > > > > > > > >>> quickly.
> > > > > > > > >>>
> > > > > > > > >>> Thanks,
> > > > > > > > >>>
> > > > > > > > >>> Jun
> > > > > > > > >>>
> > > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > > > > > > > >>> gharatmayuresh15@gmail.com
> > > > > > > > >>>> wrote:
> > > > > > > > >>>
> > > > > > > > >>>> Yea, the correlationId is only set to 0 in the
> > NetworkClient
> > > > > > > > >> constructor.
> > > > > > > > >>>> Since we reuse the same NetworkClient between Controller
> > and
> > > > the
> > > > > > > > >> broker,
> > > > > > > > >>> a
> > > > > > > > >>>> disconnection should not cause it to reset to 0, in
> which
> > > case
> > > > > it
> > > > > > > can
> > > > > > > > >> be
> > > > > > > > >>>> used to reject obsolete requests.
> > > > > > > > >>>>
> > > > > > > > >>>> Thanks,
> > > > > > > > >>>>
> > > > > > > > >>>> Mayuresh
> > > > > > > > >>>>
> > > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <
> > > > > lucasatucla@gmail.com
> > > > > > >
> > > > > > > > >>> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> @Dong,
> > > > > > > > >>>>> Great example and explanation, thanks!
> > > > > > > > >>>>>
> > > > > > > > >>>>> @All
> > > > > > > > >>>>> Regarding the example given by Dong, it seems even if
> we
> > > use
> > > > a
> > > > > > > queue,
> > > > > > > > >>>> and a
> > > > > > > > >>>>> dedicated controller request handling thread,
> > > > > > > > >>>>> the same result can still happen because R1_a will be
> > sent
> > > on
> > > > > one
> > > > > > > > >>>>> connection, and R1_b & R2 will be sent on a different
> > > > > connection,
> > > > > > > > >>>>> and there is no ordering between different connections
> on
> > > the
> > > > > > > broker
> > > > > > > > >>>> side.
> > > > > > > > >>>>> I was discussing with Mayuresh offline, and it seems
> > > > > correlation
> > > > > > id
> > > > > > > > >>>> within
> > > > > > > > >>>>> the same NetworkClient object is monotonically
> increasing
> > > and
> > > > > > never
> > > > > > > > >>>> reset,
> > > > > > > > >>>>> hence a broker can leverage that to properly reject
> > > obsolete
> > > > > > > > >> requests.
> > > > > > > > >>>>> Thoughts?
> > > > > > > > >>>>>
> > > > > > > > >>>>> Thanks,
> > > > > > > > >>>>> Lucas
> > > > > > > > >>>>>
> > > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >>>>>
> > > > > > > > >>>>>> Actually nvm, correlationId is reset in case of
> > connection
> > > > > > loss, I
> > > > > > > > >>>> think.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> Thanks,
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> Mayuresh
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > > > > > > >>>>>> gharatmayuresh15@gmail.com>
> > > > > > > > >>>>>> wrote:
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>> I agree with Dong that out-of-order processing can
> > happen
> > > > > with
> > > > > > > > >>>> having 2
> > > > > > > > >>>>>>> separate queues as well and it can even happen today.
> > > > > > > > >>>>>>> Can we use the correlationId in the request from the
> > > > > controller
> > > > > > > > >> to
> > > > > > > > >>>> the
> > > > > > > > >>>>>>> broker to handle ordering ?
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Thanks,
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Mayuresh
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
> > > > > > becket.qin@gmail.com
> > > > > > > > >>>
> > > > > > > > >>>>> wrote:
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated
> controller
> > > > > request
> > > > > > > > >>>> handling
> > > > > > > > >>>>>>>> thread would be a better isolation. It also solves
> the
> > > > > > > > >> reordering
> > > > > > > > >>>>> issue.
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> > > > > > > > >> jjkoshy.w@gmail.com>
> > > > > > > > >>>>>> wrote:
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>>> Good example. I think this scenario can occur in
> the
> > > > > current
> > > > > > > > >>> code
> > > > > > > > >>>> as
> > > > > > > > >>>>>>>> well
> > > > > > > > >>>>>>>>> but with even lower probability given that there
> are
> > > > other
> > > > > > > > >>>>>>>> non-controller
> > > > > > > > >>>>>>>>> requests interleaved. It is still sketchy though
> and
> > I
> > > > > think
> > > > > > a
> > > > > > > > >>>> safer
> > > > > > > > >>>>>>>>> approach would be separate queues and pinning
> > > controller
> > > > > > > > >> request
> > > > > > > > >>>>>>>> handling
> > > > > > > > >>>>>>>>> to one handler thread.
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> > > > > > > > >> lindong28@gmail.com
> > > > > > > > >>>>
> > > > > > > > >>>>>> wrote:
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>>> Hey Becket,
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>> I think you are right that there may be
> out-of-order
> > > > > > > > >>> processing.
> > > > > > > > >>>>>>>> However,
> > > > > > > > >>>>>>>>>> it seems that out-of-order processing may also
> > happen
> > > > even
> > > > > > > > >> if
> > > > > > > > >>> we
> > > > > > > > >>>>>> use a
> > > > > > > > >>>>>>>>>> separate queue.
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>> Here is the example:
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected before
> > > > > receiving
> > > > > > > > >>>>>> response.
> > > > > > > > >>>>>>>>> Then
> > > > > > > > >>>>>>>>>> it reconnects and sends R2. Both requests now stay
> > in
> > > > the
> > > > > > > > >>>>> controller
> > > > > > > > >>>>>>>>>> request queue in the order they are sent.
> > > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and
> then
> > > > > thread2
> > > > > > > > >>>> takes
> > > > > > > > >>>>>> R2
> > > > > > > > >>>>>>>>> from
> > > > > > > > >>>>>>>>>> the request queue almost at the same time.
> > > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There
> is
> > > > > chance
> > > > > > > > >>> that
> > > > > > > > >>>>>> R2's
> > > > > > > > >>>>>>>>>> processing is completed before R1.
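[Editor's sketch] Dong's race can be reproduced with a small, self-contained Java demo. This is a hypothetical illustration only (not Kafka code): two handler threads take R1 and R2 from the same queue, R1's processing is artificially slowed with a sleep, and R2 completes first even though it was enqueued second:

```java
import java.util.List;
import java.util.concurrent.*;

/**
 * Hypothetical demo of out-of-order completion with one request queue and
 * two handler threads: R1 is enqueued before R2, but because R1's handler
 * is slow, R2 finishes processing first.
 */
public class OutOfOrderDemo {
    public static List<String> run() throws Exception {
        BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>();
        requestQueue.put("R1");
        requestQueue.put("R2");
        List<String> completionOrder = new CopyOnWriteArrayList<>();
        ExecutorService handlers = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(2);
        for (int i = 0; i < 2; i++) {
            handlers.submit(() -> {
                String req = requestQueue.take();
                if (req.equals("R1")) {
                    Thread.sleep(200); // simulate slow processing of R1
                }
                completionOrder.add(req);
                done.countDown();
                return null;
            });
        }
        done.await();
        handlers.shutdown();
        return completionOrder;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // typically [R2, R1]
    }
}
```

The same interleaving applies whether the two requests sit in a shared queue or a dedicated controller queue drained by more than one thread, which is why the thread later converges on a single dedicated controller request handler thread.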
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>> If out-of-order processing can happen for both
> > > > approaches
> > > > > > > > >> with
> > > > > > > > >>>>> very
> > > > > > > > >>>>>>>> low
> > > > > > > > >>>>>>>>>> probability, it may not be worthwhile to add the
> > extra
> > > > > > > > >> queue.
> > > > > > > > >>>> What
> > > > > > > > >>>>>> do
> > > > > > > > >>>>>>>> you
> > > > > > > > >>>>>>>>>> think?
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>> Thanks,
> > > > > > > > >>>>>>>>>> Dong
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > > > > > > > >>>> becket.qin@gmail.com
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > > > > > >>>>>>>>>>>
> > > > > > > > >>>>>>>>>>> Using the request channel as a deque was brought
> > up
> > > > some
> > > > > > > > >>> time
> > > > > > > > >>>>> ago
> > > > > > > > >>>>>>>> when
> > > > > > > > >>>>>>>>>> we
> > > > > > > > >>>>>>>>>>> initially thinking of prioritizing the request.
> The
> > > > > > > > >> concern
> > > > > > > > >>>> was
> > > > > > > > >>>>>> that
> > > > > > > > >>>>>>>>> the
> > > > > > > > >>>>>>>>>>> controller requests are supposed to be processed
> in
> > > > > order.
> > > > > > > > >>> If
> > > > > > > > >>>> we
> > > > > > > > >>>>>> can
> > > > > > > > >>>>>>>>>> ensure
> > > > > > > > >>>>>>>>>>> that there is one controller request in the
> request
> > > > > > > > >> channel,
> > > > > > > > >>>> the
> > > > > > > > >>>>>>>> order
> > > > > > > > >>>>>>>>> is
> > > > > > > > >>>>>>>>>>> not a concern. But in cases that there are more
> > than
> > > > one
> > > > > > > > >>>>>> controller
> > > > > > > > >>>>>>>>>> request
> > > > > > > > >>>>>>>>>>> inserted into the queue, the controller request
> > order
> > > > may
> > > > > > > > >>>> change
> > > > > > > > >>>>>> and
> > > > > > > > >>>>>>>>>> cause
> > > > > > > > >>>>>>>>>>> problem. For example, think about the following
> > > > sequence:
> > > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to
> > > broker
> > > > > > > > >>>>>>>>>>> 2. Broker receives R1 and put the request to the
> > head
> > > > of
> > > > > > > > >> the
> > > > > > > > >>>>>> request
> > > > > > > > >>>>>>>>>> queue.
> > > > > > > > >>>>>>>>>>> 3. Controller to broker connection failed and the
> > > > > > > > >> controller
> > > > > > > > >>>>>>>>> reconnected
> > > > > > > > >>>>>>>>>> to
> > > > > > > > >>>>>>>>>>> the broker.
> > > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker
> > > > > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to the head of
> the
> > > > > > > > >> request
> > > > > > > > >>>>> queue.
> > > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed
> before
> > > R1
> > > > is
> > > > > > > > >>>>>> processed,
> > > > > > > > >>>>>>>>>> which
> > > > > > > > >>>>>>>>>>> may cause problem.
> > > > > > > > >>>>>>>>>>>
> > > > > > > > >>>>>>>>>>> Thanks,
> > > > > > > > >>>>>>>>>>>
> > > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > > >>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>
> > > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > > > > > > > >>>>> jjkoshy.w@gmail.com>
> > > > > > > > >>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a
> > > > simpler
> > > > > > > > >>>> less
> > > > > > > > >>>>>>>>> invasive
> > > > > > > > >>>>>>>>>>>> alternative and it should work.
> Jun/Becket/others,
> > > do
> > > > > > > > >> you
> > > > > > > > >>>> see
> > > > > > > > >>>>>> any
> > > > > > > > >>>>>>>>>>> pitfalls
> > > > > > > > >>>>>>>>>>>> with this approach?
> > > > > > > > >>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > > > > > > > >>>>>>>> lucasatucla@gmail.com>
> > > > > > > > >>>>>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I haven't
> > > thought
> > > > > > > > >>>>> before.
> > > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty
> > well,
> > > > and
> > > > > > > > >>>> also
> > > > > > > > >>>>>>>>>>>>> avoids the need to have a new size metric and
> > > > capacity
> > > > > > > > >>>>> config
> > > > > > > > >>>>>>>>>>>>> for the controller request queue. In fact, if
> we
> > > were
> > > > > > > > >> to
> > > > > > > > >>>>> adopt
> > > > > > > > >>>>>>>>>>>>> this design, there is no public interface
> change,
> > > and
> > > > > > > > >> we
> > > > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> > > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems
> > > > > > > > >>>>>>>>>>>>> the Java class LinkedBlockingDeque can readily
> > > > satisfy
> > > > > > > > >>> the
> > > > > > > > >>>>>>>>>> requirement
> > > > > > > > >>>>>>>>>>>>> by supporting a capacity, and also allowing
> > > inserting
> > > > > > > > >> at
> > > > > > > > >>>>> both
> > > > > > > > >>>>>>>> ends.
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>> My only concern is that this design is tied to
> > the
> > > > > > > > >>>>> coincidence
> > > > > > > > >>>>>>>> that
> > > > > > > > >>>>>>>>>>>>> we have two request priorities and there are
> two
> > > ends
> > > > > > > > >>> to a
> > > > > > > > >>>>>>>> deque.
> > > > > > > > >>>>>>>>>>>>> Hence by using the proposed design, it seems
> the
> > > > > > > > >> network
> > > > > > > > >>>>> layer
> > > > > > > > >>>>>>>> is
> > > > > > > > >>>>>>>>>>>>> more tightly coupled with upper layer logic,
> e.g.
> > > if
> > > > > > > > >> we
> > > > > > > > >>>> were
> > > > > > > > >>>>>> to
> > > > > > > > >>>>>>>> add
> > > > > > > > >>>>>>>>>>>>> an extra priority level in the future for some
> > > > reason,
> > > > > > > > >>> we
> > > > > > > > >>>>>> would
> > > > > > > > >>>>>>>>>>> probably
> > > > > > > > >>>>>>>>>>>>> need to go back to the design of separate
> queues,
> > > one
> > > > > > > > >>> for
> > > > > > > > >>>>> each
> > > > > > > > >>>>>>>>>> priority
> > > > > > > > >>>>>>>>>>>>> level.
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean
> > > toward
> > > > > > > > >>> your
> > > > > > > > >>>>>>>> suggested
> > > > > > > > >>>>>>>>>>>>> approach.
> > > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>> @Becket,
> > > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design,
> I'm
> > > > > > > > >>> answering
> > > > > > > > >>>>>> your
> > > > > > > > >>>>>>>>>>> question
> > > > > > > > >>>>>>>>>>>>> only in the context
> > > > > > > > >>>>>>>>>>>>> of the current KIP design: I think your
> > suggestion
> > > > > > > > >> makes
> > > > > > > > >>>>>> sense,
> > > > > > > > >>>>>>>> and
> > > > > > > > >>>>>>>>>> I'm
> > > > > > > > >>>>>>>>>>>> ok
> > > > > > > > >>>>>>>>>>>>> with removing the capacity config and
> > > > > > > > >>>>>>>>>>>>> just relying on the default value of 20 being
> > > > > > > > >> sufficient
> > > > > > > > >>>>>> enough.
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>> Thanks,
> > > > > > > > >>>>>>>>>>>>> Lucas
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh
> Gharat
> > <
> > > > > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to
> prioritize
> > > the
> > > > > > > > >>>>>>>> controller
> > > > > > > > >>>>>>>>>>> request
> > > > > > > > >>>>>>>>>>>>>> over any other requests.
> > > > > > > > >>>>>>>>>>>>>> In that case, we can change the request queue
> > to a
> > > > > > > > >>>>> dequeue,
> > > > > > > > >>>>>>>> where
> > > > > > > > >>>>>>>>>> you
> > > > > > > > >>>>>>>>>>>>>> always insert the normal requests (produce,
> > > > > > > > >>>> consume,..etc)
> > > > > > > > >>>>>> to
> > > > > > > > >>>>>>>> the
> > > > > > > > >>>>>>>>>> end
> > > > > > > > >>>>>>>>>>>> of
> > > > > > > > >>>>>>>>>>>>>> the dequeue, but if its a controller request,
> > you
> > > > > > > > >>> insert
> > > > > > > > >>>>> it
> > > > > > > > >>>>>> to
> > > > > > > > >>>>>>>>> the
> > > > > > > > >>>>>>>>>>> head
> > > > > > > > >>>>>>>>>>>>> of
> > > > > > > > >>>>>>>>>>>>>> the queue. This ensures that the controller
> > > request
> > > > > > > > >>> will
> > > > > > > > >>>>> be
> > > > > > > > >>>>>>>> given
> > > > > > > > >>>>>>>>>>>> higher
> > > > > > > > >>>>>>>>>>>>>> priority over other requests.
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> Also since we only read one request from the
> > > socket
> > > > > > > > >>> and
> > > > > > > > >>>>> mute
> > > > > > > > >>>>>>>> it
> > > > > > > > >>>>>>>>> and
> > > > > > > > >>>>>>>>>>>> only
> > > > > > > > >>>>>>>>>>>>>> unmute it after handling the request, this
> would
> > > > > > > > >>> ensure
> > > > > > > > >>>>> that
> > > > > > > > >>>>>>>> we
> > > > > > > > >>>>>>>>>> don't
> > > > > > > > >>>>>>>>>>>>>> handle controller requests out of order.
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the second
> queue
> > > and
> > > > > > > > >>> the
> > > > > > > > >>>>>>>>> additional
> > > > > > > > >>>>>>>>>>>>> config
> > > > > > > > >>>>>>>>>>>>>> for the size of the queue.
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> What do you think ?
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> Thanks,
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> Mayuresh
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > > > > > > >>>>>>>> becket.qin@gmail.com
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the
> > > > > > > > >>> current
> > > > > > > > >>>>>> design
> > > > > > > > >>>>>>>>>> makes
> > > > > > > > >>>>>>>>>>>>> sense.
> > > > > > > > >>>>>>>>>>>>>>> My confusion is about whether the new config
> > for
> > > > > > > > >> the
> > > > > > > > >>>>>>>> controller
> > > > > > > > >>>>>>>>>>> queue
> > > > > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a
> case
> > > in
> > > > > > > > >>>> which
> > > > > > > > >>>>>>>> users
> > > > > > > > >>>>>>>>>>> would
> > > > > > > > >>>>>>>>>>>>>> change
> > > > > > > > >>>>>>>>>>>>>>> it.
> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>> Thanks,
> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > > > > > >>>>>>>>>> becket.qin@gmail.com>
> > > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do
> we
> > > > > > > > >>>> expect
> > > > > > > > >>>>>>>> user to
> > > > > > > > >>>>>>>>>>> ever
> > > > > > > > >>>>>>>>>>>>>> change
> > > > > > > > >>>>>>>>>>>>>>>> the controller request queue capacity"? If
> we
> > > > > > > > >>> agree
> > > > > > > > >>>>> that
> > > > > > > > >>>>>>>> 20
> > > > > > > > >>>>>>>>> is
> > > > > > > > >>>>>>>>>>>>> already
> > > > > > > > >>>>>>>>>>>>>> a
> > > > > > > > >>>>>>>>>>>>>>>> very generous default number and we do not
> > > > > > > > >> expect
> > > > > > > > >>>> user
> > > > > > > > >>>>>> to
> > > > > > > > >>>>>>>>>> change
> > > > > > > > >>>>>>>>>>>> it,
> > > > > > > > >>>>>>>>>>>>> is
> > > > > > > > >>>>>>>>>>>>>>> it
> > > > > > > > >>>>>>>>>>>>>>>> still necessary to expose this as a config?
> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang
> <
> > > > > > > > >>>>>>>>>>> lucasatucla@gmail.com
> > > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right
> that
> > > > > > > > >>>>> normally
> > > > > > > > >>>>>>>> there
> > > > > > > > >>>>>>>>>>>> should
> > > > > > > > >>>>>>>>>>>>> be
> > > > > > > > >>>>>>>>>>>>>>>>> just
> > > > > > > > >>>>>>>>>>>>>>>>> one controller request because of muting,
> > > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there would
> be
> > > > > > > > >> many
> > > > > > > > >>>>>>>> enqueued
> > > > > > > > >>>>>>>>>>>>> controller
> > > > > > > > >>>>>>>>>>>>>>>>> requests.
> > > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not
> > sure
> > > > > > > > >>>> which
> > > > > > > > >>>>>> part
> > > > > > > > >>>>>>>>>>> conveys
> > > > > > > > >>>>>>>>>>>>> that
> > > > > > > > >>>>>>>>>>>>>>>>> info.
> > > > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point it out
> > the
> > > > > > > > >>>>> section.
> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal
> > > > > > > > >>>> conditions,
> > > > > > > > >>>>>> the
> > > > > > > > >>>>>>>>>> current
> > > > > > > > >>>>>>>>>>>>>> design
> > > > > > > > >>>>>>>>>>>>>>>>> does not preclude multiple controllers
> > running
> > > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have
> the
> > > > > > > > >>>>> controller
> > > > > > > > >>>>>>>>> queue
> > > > > > > > >>>>>>>>>>>>> capacity
> > > > > > > > >>>>>>>>>>>>>>>>> config and simply make its capacity to be
> 1,
> > > > > > > > >>>>>>>>>>>>>>>>> network threads handling requests from
> > > > > > > > >> different
> > > > > > > > >>>>>>>> controllers
> > > > > > > > >>>>>>>>>>> will
> > > > > > > > >>>>>>>>>>>> be
> > > > > > > > >>>>>>>>>>>>>>>>> blocked during those troublesome times,
> > > > > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On the
> > > > > > > > >> other
> > > > > > > > >>>>> hand,
> > > > > > > > >>>>>>>>> adding
> > > > > > > > >>>>>>>>>>> the
> > > > > > > > >>>>>>>>>>>>>> extra
> > > > > > > > >>>>>>>>>>>>>>>>> config with a default value, say 20, guards
> > us
> > > > > > > > >>> from
> > > > > > > > >>>>>>>> issues
> > > > > > > > >>>>>>>>> in
> > > > > > > > >>>>>>>>>>>> those
> > > > > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much
> > > > > > > > >>>> downside
> > > > > > > > >>>>> of
> > > > > > > > >>>>>>>>> adding
> > > > > > > > >>>>>>>>>>> the
> > > > > > > > >>>>>>>>>>>>>> extra
> > > > > > > > >>>>>>>>>>>>>>>>> config.
> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete
> > > > > > > > >>> statement
> > > > > > > > >>>>>> based
> > > > > > > > >>>>>>>> on
> > > > > > > > >>>>>>>>> a
> > > > > > > > >>>>>>>>>>>>> previous
> > > > > > > > >>>>>>>>>>>>>>>>> design. I've revised the wording in the
> KIP.
> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > > > > > > > >>> Gharat <
> > > > > > > > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think
> "The
> > > > > > > > >>>> memory
> > > > > > > > >>>>>>>>>>> consumption
> > > > > > > > >>>>>>>>>>>>> can
> > > > > > > > >>>>>>>>>>>>>>> rise
> > > > > > > > >>>>>>>>>>>>>>>>>> given the total number of queued requests
> > can
> > > > > > > > >>> go
> > > > > > > > >>>> up
> > > > > > > > >>>>>> to
> > > > > > > > >>>>>>>> 2x"
> > > > > > > > >>>>>>>>>> in
> > > > > > > > >>>>>>>>>>>> the
> > > > > > > > >>>>>>>>>>>>>>> impact
> > > > > > > > >>>>>>>>>>>>>>>>>> section. Normally the requests from
> > > > > > > > >> controller
> > > > > > > > >>>> to a
> > > > > > > > >>>>>>>> Broker
> > > > > > > > >>>>>>>>>> are
> > > > > > > > >>>>>>>>>>>> not
> > > > > > > > >>>>>>>>>>>>>>> high
> > > > > > > > >>>>>>>>>>>>>>>>>> volume, right ?
> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket
> Qin <
> > > > > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the
> > > > > > > > >>>> control
> > > > > > > > >>>>>>>> plane
> > > > > > > > >>>>>>>>>> from
> > > > > > > > >>>>>>>>>>>> the
> > > > > > > > >>>>>>>>>>>>>>> data
> > > > > > > > >>>>>>>>>>>>>>>>>> plane
> > > > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the
> > > > > > > > >> controller
> > > > > > > > >>>>>> request
> > > > > > > > >>>>>>>>> queue
> > > > > > > > >>>>>>>>>>> may
> > > > > > > > >>>>>>>>>>>>>> have
> > > > > > > > >>>>>>>>>>>>>>>>> many
> > > > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this be a common
> case?
> > > > > > > > >>> The
> > > > > > > > >>>>>>>>> controller
> > > > > > > > >>>>>>>>>>>>>> requests
> > > > > > > > >>>>>>>>>>>>>>>>> still
> > > > > > > > >>>>>>>>>>>>>>>>>>> goes through the SocketServer. The
> > > > > > > > >>> SocketServer
> > > > > > > > >>>>>> will
> > > > > > > > >>>>>>>>> mute
> > > > > > > > >>>>>>>>>>> the
> > > > > > > > >>>>>>>>>>>>>>> channel
> > > > > > > > >>>>>>>>>>>>>>>>>> once
> > > > > > > > >>>>>>>>>>>>>>>>>>> a request is read and put into the
> request
> > > > > > > > >>>>> channel.
> > > > > > > > >>>>>>>> So
> > > > > > > > >>>>>>>>>>>> assuming
> > > > > > > > >>>>>>>>>>>>>>> there
> > > > > > > > >>>>>>>>>>>>>>>>> is
> > > > > > > > >>>>>>>>>>>>>>>>>>> only one connection between controller
> and
> > > > > > > > >>> each
> > > > > > > > >>>>>>>> broker,
> > > > > > > > >>>>>>>>> on
> > > > > > > > >>>>>>>>>>> the
> > > > > > > > >>>>>>>>>>>>>>> broker
> > > > > > > > >>>>>>>>>>>>>>>>>> side,
> > > > > > > > >>>>>>>>>>>>>>>>>>> there should be only one controller
> request
> > > > > > > > >>> in
> > > > > > > > >>>>> the
> > > > > > > > >>>>>>>>>>> controller
> > > > > > > > >>>>>>>>>>>>>>> request
> > > > > > > > >>>>>>>>>>>>>>>>>> queue
> > > > > > > > >>>>>>>>>>>>>>>>>>> at any given time. If that is the case,
> do
> > we need a separate controller request queue capacity config? The
> > default value 20 means that we expect there are 20 controller
> > switches to happen in a short period of time. I am not sure whether
> > someone should increase the controller request queue capacity to
> > handle such a case, as it seems to indicate something very wrong has
> > happened.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com> wrote:
> >
> > > Thanks for the update Lucas.
> > >
> > > I think the motivation section is intuitive. It will be good to
> > > learn more about the comments from other reviewers.
> > >
> > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com>
> > > wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > I've updated the motivation section of the KIP by explaining the
> > > > cases that would have user impacts.
> > > > Please take a look and let me know your comments.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Dong,
> > > > >
> > > > > The simulation of disk being slow is merely for me to easily
> > > > > construct a testing scenario with a backlog of produce
> > > > > requests. In production, other than the disk being slow, a
> > > > > backlog of produce requests may also be caused by high produce
> > > > > QPS. In that case, we may not want to kill the broker, and
> > > > > that's when this KIP can be useful, both for JBOD and non-JBOD
> > > > > setups.
> > > > >
> > > > > Going back to your previous question about each ProduceRequest
> > > > > covering 20 partitions that are randomly distributed, let's say
> > > > > a LeaderAndIsr request is enqueued that tries to switch the
> > > > > current broker, say broker0, from leader to follower *for one
> > > > > of the partitions*, say *test-0*. For the sake of argument,
> > > > > let's also assume the other brokers, say broker1, have
> > > > > *stopped* fetching from the current broker, i.e. broker0.
> > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> > > > > LeaderAndISR will be put into the purgatory, and since they'll
> > > > > never be replicated to other brokers (because of the
> > > > > assumption made above), they will be completed either when the
> > > > > LeaderAndISR request is processed or when the timeout happens.
> > > > >   1.2 With this KIP, broker0 will immediately transition the
> > > > > partition test-0 to become a follower; after the current
> > > > > broker sees the replication of the remaining 19 partitions, it
> > > > > can send a response indicating that it's no longer the leader
> > > > > for "test-0".
> > > > >   To see the latency difference between 1.1 and 1.2, let's say
> > > > > there are 24K produce requests ahead of the LeaderAndISR, and
> > > > > there are 8 io threads, so each io thread will process
> > > > > approximately 3000 produce requests. Now let's investigate the
> > > > > io thread that finally processed the LeaderAndISR.
> > > > >   For the 3000 produce requests, if we model the times when
> > > > > their remaining 19 partitions catch up as t0, t1, ... t2999,
> > > > > the LeaderAndISR request is processed at time t3000.
> > > > >   Without this KIP, the 1st produce request would have waited
> > > > > an extra t3000 - t0 time in the purgatory, the 2nd an extra
> > > > > time of t3000 - t1, etc.
> > > > >   Roughly speaking, the latency difference is bigger for the
> > > > > earlier produce requests than for the later ones. For the same
> > > > > reason, the more ProduceRequests queued before the
> > > > > LeaderAndISR, the bigger the benefit we get (capped by the
> > > > > produce timeout).
> > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > > >   There will be no latency differences in this case, but
> > > > >   2.1 Without this KIP, the records of partition test-0 in the
> > > > > ProduceRequests ahead of the LeaderAndISR will be appended to
> > > > > the local log, and eventually be truncated after processing
> > > > > the LeaderAndISR. This is what's referred to as "some
> > > > > unofficial definition of data loss in terms of messages beyond
> > > > > the high watermark".
> > > > >   2.2 With this KIP, we can mitigate the effect since if the
> > > > > LeaderAndISR is immediately processed, the response to
> > > > > producers will have the NotLeaderForPartition error, causing
> > > > > producers to retry.
> > > > >
> > > > > This explanation above is the benefit for reducing
> >
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Mayuresh Gharat <gh...@gmail.com>.
Hi Lucas,
I agree, if we want to go forward with a separate controller plane and data
plane and completely isolate them, having a separate port for controller
with a separate Acceptor and a Processor sounds ideal to me.

Thanks,

Mayuresh
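
To make the "completely isolated" control plane concrete, here is a minimal sketch of the separate-queue side of that design, under stated assumptions: the class and method names (`TwoPlaneRequestChannel`, `sendDataRequest`, etc.) are illustrative and are not Kafka's actual classes; requests are modeled as plain strings. The point it illustrates is that a full, bounded data-plane queue cannot delay enqueuing a controller request.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the separate-queues idea from this thread: controller
// requests get their own queue, drained independently of the
// data-plane queue, so a backlog of produce/fetch requests can
// never block a LeaderAndIsr/StopReplica request from being queued.
final class TwoPlaneRequestChannel {
    // Data-plane queue is bounded, in the spirit of queued.max.requests.
    private final BlockingQueue<String> dataQueue;
    // Control-plane queue is separate and therefore unaffected by
    // data-plane backpressure.
    private final BlockingQueue<String> controllerQueue = new LinkedBlockingQueue<>();

    TwoPlaneRequestChannel(int dataCapacity) {
        this.dataQueue = new LinkedBlockingQueue<>(dataCapacity);
    }

    // Returns false when the data queue is full (the caller would
    // normally block or back off here).
    boolean sendDataRequest(String request) {
        return dataQueue.offer(request);
    }

    // Controller requests always enqueue immediately.
    void sendControllerRequest(String request) {
        controllerQueue.add(request);
    }

    // Polled by a dedicated controller request handler thread.
    String pollController() {
        return controllerQueue.poll();
    }

    // Polled by the normal pool of request handler threads.
    String pollData() {
        return dataQueue.poll();
    }
}
```

In this sketch a dedicated handler thread would loop on `pollController()`, matching the pinning idea discussed later in the thread.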


On Mon, Jul 23, 2018 at 11:04 PM Becket Qin <be...@gmail.com> wrote:

> Hi Lucas,
>
> Yes, I agree that a dedicated end to end control flow would be ideal.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for the comment, Becket.
> > So far, we've been trying to avoid making any request handler thread
> > special. But if we were to follow that path in order to make the two
> > planes more isolated, what do you think about also having a dedicated
> > processor thread and a dedicated port for the controller?
> >
> > Today one processor thread can handle multiple connections, let's say
> > 100 connections represented by connection0, ... connection99, among
> > which connection0-98 are from clients, while connection99 is from the
> > controller. Further, let's say after one selector polling, there are
> > incoming requests on all connections.
> >
> > When the request queue is full (either the data request queue being
> > full in the two-queue design, or the one single queue being full in
> > the deque design), the processor thread will be blocked first when
> > trying to enqueue the data request from connection0, then possibly
> > blocked for the data request from connection1, etc., even though the
> > controller request is ready to be enqueued.
> >
> > To solve this problem, it seems we would need to have a separate port
> > dedicated to the controller, a dedicated processor thread, a dedicated
> > controller request queue, and pinning of one request handler thread
> > for controller requests.
> >
> > Thanks,
> > Lucas
> >
> >
> > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <be...@gmail.com>
> wrote:
> >
> > > Personally I am not fond of the deque approach, simply because it is
> > > against the basic idea of isolating the controller plane and data
> > > plane. With a single deque, theoretically speaking the controller
> > > requests can starve the client requests. I would prefer the approach
> > > with a separate controller request queue and a dedicated controller
> > > request handler thread.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Sure, I can summarize the usage of correlation id. But before I do
> > > > that, it seems the same out-of-order processing can also happen to
> > > > Produce requests sent by producers, following the same example you
> > > > described earlier.
> > > > If that's the case, I think this probably deserves a separate doc
> > > > and design independent of this KIP.
> > > >
> > > > Lucas
> > > >
> > > >
> > > >
> > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <li...@gmail.com>
> > wrote:
> > > >
> > > > > Hey Lucas,
> > > > >
> > > > > Could you update the KIP if you are confident with the approach
> > > > > which uses correlation id? The idea around correlation id is kind
> > > > > of scattered across multiple emails. It will be useful if other
> > > > > reviewers can read the KIP to understand the latest proposal.
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > > > > gharatmayuresh15@gmail.com> wrote:
> > > > >
> > > > > > I like the idea of the deque implementation by Lucas. This will
> > > > > > help us avoid an additional queue for the controller and
> > > > > > additional configs in Kafka.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
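A minimal sketch of the deque idea endorsed above, under stated assumptions: it uses `java.util.concurrent.LinkedBlockingDeque`, and the class and request names (`PrioritizedRequestQueue`, `"leaderAndIsr"`, etc.) are illustrative, not Kafka's actual classes. Controller requests are inserted at the head and normal requests at the tail, so handler threads polling from the head see controller requests first.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Sketch: one bounded deque shared by both planes. Controller
// requests jump the line via offerFirst; data requests queue up
// at the tail via offerLast.
final class PrioritizedRequestQueue {
    private final LinkedBlockingDeque<String> deque;

    PrioritizedRequestQueue(int capacity) {
        // A single capacity bound, in the spirit of queued.max.requests.
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    // offerFirst/offerLast return false when the deque is at capacity,
    // which is the "queue full" concern raised later in the thread.
    boolean send(String request, boolean fromController) {
        return fromController ? deque.offerFirst(request) : deque.offerLast(request);
    }

    // Request handler threads always take from the head.
    String poll() {
        return deque.pollFirst();
    }
}
```

Note that when the deque is full, `offerFirst` fails for controller requests too, which is exactly the pitfall Jun raises further down in the thread.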
> > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <becket.qin@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > The usage of correlation ID might still be useful to address
> > > > > > > the cases where the controller epoch and leader epoch checks
> > > > > > > are not sufficient to guarantee correct behavior. For example,
> > > > > > > if the controller sends a LeaderAndIsrRequest followed by a
> > > > > > > StopReplicaRequest, and the broker processes them in the
> > > > > > > reverse order, the replica may still be wrongly recreated,
> > > > > > > right?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io>
> > wrote:
> > > > > > > >
> > > > > > > > Hmm, since we already use controller epoch and leader epoch
> > > > > > > > for properly caching the latest partition state, do we
> > > > > > > > really need correlation id for ordering the controller
> > > > > > > > requests?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
> > > becket.qin@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Lucas and Mayuresh,
> > > > > > > >>
> > > > > > > >> Good idea. The correlation id should work.
> > > > > > > >>
> > > > > > > >> In the ControllerChannelManager, a request will be resent
> > > > > > > >> until a response is received. So if the controller to broker
> > > > > > > >> connection disconnects after the controller sends R1_a, but
> > > > > > > >> before the response of R1_a is received, a disconnection may
> > > > > > > >> cause the controller to resend R1_b, i.e. until R1 is acked,
> > > > > > > >> R2 won't be sent by the controller.
> > > > > > > >> This gives two guarantees:
> > > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > > > >> 2. On the broker side, when R2 is seen, R1 must have been
> > > > > > > >> processed at least once.
> > > > > > > >>
> > > > > > > >> So on the broker side, with a single-threaded controller
> > > > > > > >> request handler, the logic should be:
> > > > > > > >> 1. Process whatever request is seen in the controller
> > > > > > > >> request queue.
> > > > > > > >> 2. For the given epoch, drop the request if its correlation
> > > > > > > >> id is smaller than that of the last processed request.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Jiangjie (Becket) Qin
> > > > > > > >>
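A minimal sketch of the two-step broker-side rule quoted above, under stated assumptions: `ControllerRequestFilter` and its method names are illustrative, not Kafka classes, and the reset-on-new-epoch behavior is an assumption on my part rather than something the thread spells out.

```java
// Sketch: a single controller request handler thread consults this
// filter before processing each dequeued controller request. Within
// one controller epoch, any request whose correlation id is not
// strictly larger than the last processed one is treated as an
// obsolete resend and dropped.
final class ControllerRequestFilter {
    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    // Returns true if the request should be processed, false if dropped.
    boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch > lastEpoch) {
            // New controller generation: reset the correlation watermark
            // (assumed behavior; correlation ids restart per connection).
            lastEpoch = controllerEpoch;
            lastCorrelationId = correlationId;
            return true;
        }
        if (controllerEpoch == lastEpoch && correlationId > lastCorrelationId) {
            lastCorrelationId = correlationId;
            return true;
        }
        // Stale epoch, or an obsolete resend within the current epoch.
        return false;
    }
}
```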
> > > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io>
> > > > wrote:
> > > > > > > >>
> > > > > > > >>> I agree that there is no strong ordering when there is
> > > > > > > >>> more than one socket connection. Currently, we rely on
> > > > > > > >>> controllerEpoch and leaderEpoch to ensure that the
> > > > > > > >>> receiving broker picks up the latest state for each
> > > > > > > >>> partition.
> > > > > > > >>>
> > > > > > > >>> One potential issue with the deque approach is that if the
> > > > > > > >>> queue is full, there is no guarantee that the controller
> > > > > > > >>> requests will be enqueued quickly.
> > > > > > > >>>
> > > > > > > >>> Thanks,
> > > > > > > >>>
> > > > > > > >>> Jun
> > > > > > > >>>
> > > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > > > > > > >>> gharatmayuresh15@gmail.com
> > > > > > > >>>> wrote:
> > > > > > > >>>
> > > > > > > >>>> Yea, the correlationId is only set to 0 in the
> > > > > > > >>>> NetworkClient constructor. Since we reuse the same
> > > > > > > >>>> NetworkClient between Controller and the broker, a
> > > > > > > >>>> disconnection should not cause it to reset to 0, in which
> > > > > > > >>>> case it can be used to reject obsolete requests.
> > > > > > > >>>>
> > > > > > > >>>> Thanks,
> > > > > > > >>>>
> > > > > > > >>>> Mayuresh
> > > > > > > >>>>
> > > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <
> > > > lucasatucla@gmail.com
> > > > > >
> > > > > > > >>> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> @Dong,
> > > > > > > >>>>> Great example and explanation, thanks!
> > > > > > > >>>>>
> > > > > > > >>>>> @All
> > > > > > > >>>>> Regarding the example given by Dong, it seems even if we
> > > > > > > >>>>> use a queue, and a dedicated controller request handling
> > > > > > > >>>>> thread, the same result can still happen because R1_a
> > > > > > > >>>>> will be sent on one connection, and R1_b & R2 will be
> > > > > > > >>>>> sent on a different connection, and there is no ordering
> > > > > > > >>>>> between different connections on the broker side.
> > > > > > > >>>>> I was discussing with Mayuresh offline, and it seems
> > > > > > > >>>>> correlation id within the same NetworkClient object is
> > > > > > > >>>>> monotonically increasing and never reset, hence a broker
> > > > > > > >>>>> can leverage that to properly reject obsolete requests.
> > > > > > > >>>>> Thoughts?
> > > > > > > >>>>>
> > > > > > > >>>>> Thanks,
> > > > > > > >>>>> Lucas
> > > > > > > >>>>>
> > > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> Actually nvm, correlationId is reset in case of
> > > > > > > >>>>>> connection loss, I think.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Thanks,
> > > > > > > >>>>>>
> > > > > > > >>>>>> Mayuresh
> > > > > > > >>>>>>
> > > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > > > > > >>>>>> gharatmayuresh15@gmail.com>
> > > > > > > >>>>>> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>> I agree with Dong that out-of-order processing can
> > > > > > > >>>>>>> happen with having 2 separate queues as well, and it
> > > > > > > >>>>>>> can even happen today.
> > > > > > > >>>>>>> Can we use the correlationId in the request from the
> > > > > > > >>>>>>> controller to the broker to handle ordering?
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Thanks,
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Mayuresh
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
> > > > > becket.qin@gmail.com
> > > > > > > >>>
> > > > > > > >>>>> wrote:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated controller
> > > > > > > >>>>>>>> request handling thread would be a better isolation.
> > > > > > > >>>>>>>> It also solves the reordering issue.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> > > > > > > >> jjkoshy.w@gmail.com>
> > > > > > > >>>>>> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> Good example. I think this scenario can occur in the
> > > > > > > >>>>>>>>> current code as well, but with even lower probability
> > > > > > > >>>>>>>>> given that there are other non-controller requests
> > > > > > > >>>>>>>>> interleaved. It is still sketchy though, and I think
> > > > > > > >>>>>>>>> a safer approach would be separate queues and pinning
> > > > > > > >>>>>>>>> controller request handling to one handler thread.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> > > > > > > >> lindong28@gmail.com
> > > > > > > >>>>
> > > > > > > >>>>>> wrote:
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>>> Hey Becket,
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I think you are right that there may be out-of-order
> > > > > > > >>>>>>>>>> processing. However, it seems that out-of-order
> > > > > > > >>>>>>>>>> processing may also happen even if we use a separate
> > > > > > > >>>>>>>>>> queue.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Here is the example:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected before
> > > > > > > >>>>>>>>>> receiving the response. Then it reconnects and sends
> > > > > > > >>>>>>>>>> R2. Both requests now stay in the controller request
> > > > > > > >>>>>>>>>> queue in the order they are sent.
> > > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then
> > > > > > > >>>>>>>>>> thread2 takes R2 from the request queue almost at
> > > > > > > >>>>>>>>>> the same time.
> > > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is
> > > > > > > >>>>>>>>>> a chance that R2's processing is completed before R1.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> If out-of-order processing can happen for both
> > > > > > > >>>>>>>>>> approaches with very low probability, it may not be
> > > > > > > >>>>>>>>>> worthwhile to add the extra queue. What do you
> > > > > > > >>>>>>>>>> think?
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>> Dong
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > > > > > > >>>> becket.qin@gmail.com
> > > > > > > >>>>>>
> > > > > > > >>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Using the request channel as a deque was brought up
> > > > > > > >>>>>>>>>>> some time ago when we were initially thinking of
> > > > > > > >>>>>>>>>>> prioritizing the requests. The concern was that the
> > > > > > > >>>>>>>>>>> controller requests are supposed to be processed in
> > > > > > > >>>>>>>>>>> order. If we can ensure that there is one
> > > > > > > >>>>>>>>>>> controller request in the request channel, the
> > > > > > > >>>>>>>>>>> order is not a concern. But in cases where there is
> > > > > > > >>>>>>>>>>> more than one controller request inserted into the
> > > > > > > >>>>>>>>>>> queue, the controller request order may change and
> > > > > > > >>>>>>>>>>> cause problems. For example, think about the
> > > > > > > >>>>>>>>>>> following sequence:
> > > > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to the
> > > > > > > >>>>>>>>>>> broker.
> > > > > > > >>>>>>>>>>> 2. Broker receives R1 and puts the request at the
> > > > > > > >>>>>>>>>>> head of the request queue.
> > > > > > > >>>>>>>>>>> 3. The controller-to-broker connection failed and
> > > > > > > >>>>>>>>>>> the controller reconnected to the broker.
> > > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker.
> > > > > > > >>>>>>>>>>> 5. Broker receives R2 and adds it to the head of
> > > > > > > >>>>>>>>>>> the request queue.
> > > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed before
> > > > > > > >>>>>>>>>>> R1 is processed, which may cause problems.
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > > > > > > >>>>> jjkoshy.w@gmail.com>
> > > > > > > >>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a
> > > > > > > >>>>>>>>>>>> simpler, less invasive alternative and it should
> > > > > > > >>>>>>>>>>>> work. Jun/Becket/others, do you see any pitfalls
> > > > > > > >>>>>>>>>>>> with this approach?
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > > > > > > >>>>>>>> lucasatucla@gmail.com>
> > > > > > > >>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > > > > >>>>>>>>>>>>> That's a very interesting idea that I haven't
> > > > > > > >>>>>>>>>>>>> thought of before.
> > > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty
> > > > > > > >>>>>>>>>>>>> well, and also avoids the need to have a new size
> > > > > > > >>>>>>>>>>>>> metric and capacity config for the controller
> > > > > > > >>>>>>>>>>>>> request queue. In fact, if we were to adopt this
> > > > > > > >>>>>>>>>>>>> design, there is no public interface change, and
> > > > > > > >>>>>>>>>>>>> we probably don't need a KIP.
> > > > > > > >>>>>>>>>>>>> Also implementation wise, it seems the Java class
> > > > > > > >>>>>>>>>>>>> LinkedBlockingDeque can readily satisfy the
> > > > > > > >>>>>>>>>>>>> requirement by supporting a capacity, and also
> > > > > > > >>>>>>>>>>>>> allowing inserting at both ends.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> My only concern is that this design is tied to
> > > > > > > >>>>>>>>>>>>> the coincidence that we have two request
> > > > > > > >>>>>>>>>>>>> priorities and there are two ends to a deque.
> > > > > > > >>>>>>>>>>>>> Hence by using the proposed design, it seems the
> > > > > > > >>>>>>>>>>>>> network layer is more tightly coupled with upper
> > > > > > > >>>>>>>>>>>>> layer logic, e.g. if we were to add an extra
> > > > > > > >>>>>>>>>>>>> priority level in the future for some reason, we
> > > > > > > >>>>>>>>>>>>> would probably need to go back to the design of
> > > > > > > >>>>>>>>>>>>> separate queues, one for each priority level.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean
> > > > > > > >>>>>>>>>>>>> toward your suggested approach.
> > > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> @Becket,
> > > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm
> > > > > > > >>>>>>>>>>>>> answering your question only in the context of
> > > > > > > >>>>>>>>>>>>> the current KIP design: I think your suggestion
> > > > > > > >>>>>>>>>>>>> makes sense, and I'm ok with removing the
> > > > > > > >>>>>>>>>>>>> capacity config and just relying on the default
> > > > > > > >>>>>>>>>>>>> value of 20 being sufficient.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>> Lucas
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat
> <
> > > > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize
> > the
> > > > > > > >>>>>>>> controller
> > > > > > > >>>>>>>>>>> request
> > > > > > > >>>>>>>>>>>>>> over any other requests.
> > > > > > > >>>>>>>>>>>>>> In that case, we can change the request queue to a
> > > > > > > >>>>>>>>>>>>>> deque, where you always insert the normal requests
> > > > > > > >>>>>>>>>>>>>> (produce, consume, etc.) at the tail of the deque,
> > > > > > > >>>>>>>>>>>>>> but if it's a controller request, you insert it at
> > > > > > > >>>>>>>>>>>>>> the head of the deque. This ensures that the
> > > > > > > >>>>>>>>>>>>>> controller request will be given higher priority
> > > > > > > >>>>>>>>>>>>>> over other requests.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Also, since we only read one request from the
> > > > > > > >>>>>>>>>>>>>> socket and mute the channel, and only unmute it
> > > > > > > >>>>>>>>>>>>>> after handling the request, this would ensure that
> > > > > > > >>>>>>>>>>>>>> we don't handle controller requests out of order.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue
> > and
> > > > > > > >>> the
> > > > > > > >>>>>>>>> additional
> > > > > > > >>>>>>>>>>>>> config
> > > > > > > >>>>>>>>>>>>>> for the size of the queue.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> What do you think ?
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Mayuresh
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > > > > > >>>>>>>> becket.qin@gmail.com
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the
> > > > > > > >>>>>>>>>>>>>>> current
> > > > > > > >>>>>> design
> > > > > > > >>>>>>>>>> makes
> > > > > > > >>>>>>>>>>>>> sense.
> > > > > > > >>>>>>>>>>>>>>> My confusion is about whether the new config
> for
> > > > > > > >> the
> > > > > > > >>>>>>>> controller
> > > > > > > >>>>>>>>>>> queue
> > > > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a case
> > in
> > > > > > > >>>> which
> > > > > > > >>>>>>>> users
> > > > > > > >>>>>>>>>>> would
> > > > > > > >>>>>>>>>>>>>> change
> > > > > > > >>>>>>>>>>>>>>> it.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > > > > >>>>>>>>>> becket.qin@gmail.com>
> > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we
> > > > > > > >>>> expect
> > > > > > > >>>>>>>> user to
> > > > > > > >>>>>>>>>>> ever
> > > > > > > >>>>>>>>>>>>>> change
> > > > > > > >>>>>>>>>>>>>>>> the controller request queue capacity"? If we
> > > > > > > >>> agree
> > > > > > > >>>>> that
> > > > > > > >>>>>>>> 20
> > > > > > > >>>>>>>>> is
> > > > > > > >>>>>>>>>>>>> already
> > > > > > > >>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>> very generous default number and we do not
> > > > > > > >> expect
> > > > > > > >>>> user
> > > > > > > >>>>>> to
> > > > > > > >>>>>>>>>> change
> > > > > > > >>>>>>>>>>>> it,
> > > > > > > >>>>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>>>>> still necessary to expose this as a config?
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > > > > >>>>>>>>>>> lucasatucla@gmail.com
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that
> > > > > > > >>>>> normally
> > > > > > > >>>>>>>> there
> > > > > > > >>>>>>>>>>>> should
> > > > > > > >>>>>>>>>>>>> be
> > > > > > > >>>>>>>>>>>>>>>>> just
> > > > > > > >>>>>>>>>>>>>>>>> one controller request because of muting,
> > > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there would be
> > > > > > > >> many
> > > > > > > >>>>>>>> enqueued
> > > > > > > >>>>>>>>>>>>> controller
> > > > > > > >>>>>>>>>>>>>>>>> requests.
> > > > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not
> sure
> > > > > > > >>>> which
> > > > > > > >>>>>> part
> > > > > > > >>>>>>>>>>> conveys
> > > > > > > >>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>>>> info.
> > > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point it out
> the
> > > > > > > >>>>> section.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal
> > > > > > > >>>> conditions,
> > > > > > > >>>>>> the
> > > > > > > >>>>>>>>>> current
> > > > > > > >>>>>>>>>>>>>> design
> > > > > > > >>>>>>>>>>>>>>>>> does not preclude multiple controllers
> running
> > > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have the
> > > > > > > >>>>> controller
> > > > > > > >>>>>>>>> queue
> > > > > > > >>>>>>>>>>>>> capacity
> > > > > > > >>>>>>>>>>>>>>>>> config and simply make its capacity to be 1,
> > > > > > > >>>>>>>>>>>>>>>>> network threads handling requests from
> > > > > > > >> different
> > > > > > > >>>>>>>> controllers
> > > > > > > >>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>> be
> > > > > > > >>>>>>>>>>>>>>>>> blocked during those troublesome times,
> > > > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On the
> > > > > > > >> other
> > > > > > > >>>>> hand,
> > > > > > > >>>>>>>>> adding
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>> extra
> > > > > > > >>>>>>>>>>>>>>>>> config with a default value, say 20, guards
> us
> > > > > > > >>> from
> > > > > > > >>>>>>>> issues
> > > > > > > >>>>>>>>> in
> > > > > > > >>>>>>>>>>>> those
> > > > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much
> > > > > > > >>>> downside
> > > > > > > >>>>> of
> > > > > > > >>>>>>>>> adding
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>> extra
> > > > > > > >>>>>>>>>>>>>>>>> config.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete
> > > > > > > >>> statement
> > > > > > > >>>>>> based
> > > > > > > >>>>>>>> on
> > > > > > > >>>>>>>>> a
> > > > > > > >>>>>>>>>>>>> previous
> > > > > > > >>>>>>>>>>>>>>>>> design. I've revised the wording in the KIP.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > > > > > > >>> Gharat <
> > > > > > > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The
> > > > > > > >>>> memory
> > > > > > > >>>>>>>>>>> consumption
> > > > > > > >>>>>>>>>>>>> can
> > > > > > > >>>>>>>>>>>>>>> rise
> > > > > > > >>>>>>>>>>>>>>>>>> given the total number of queued requests
> can
> > > > > > > >>> go
> > > > > > > >>>> up
> > > > > > > >>>>>> to
> > > > > > > >>>>>>>> 2x"
> > > > > > > >>>>>>>>>> in
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> impact
> > > > > > > >>>>>>>>>>>>>>>>>> section. Normally the requests from
> > > > > > > >> controller
> > > > > > > >>>> to a
> > > > > > > >>>>>>>> Broker
> > > > > > > >>>>>>>>>> are
> > > > > > > >>>>>>>>>>>> not
> > > > > > > >>>>>>>>>>>>>>> high
> > > > > > > >>>>>>>>>>>>>>>>>> volume, right ?
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the
> > > > > > > >>>> control
> > > > > > > >>>>>>>> plane
> > > > > > > >>>>>>>>>> from
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> data
> > > > > > > >>>>>>>>>>>>>>>>>> plane
> > > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the
> > > > > > > >> controller
> > > > > > > >>>>>> request
> > > > > > > >>>>>>>>> queue
> > > > > > > >>>>>>>>>>> may
> > > > > > > >>>>>>>>>>>>>> have
> > > > > > > >>>>>>>>>>>>>>>>> many
> > > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this be a common case?
> > > > > > > >>> The
> > > > > > > >>>>>>>>> controller
> > > > > > > >>>>>>>>>>>>>> requests
> > > > > > > >>>>>>>>>>>>>>>>> still
> > > > > > > >>>>>>>>>>>>>>>>>>> goes through the SocketServer. The
> > > > > > > >>> SocketServer
> > > > > > > >>>>>> will
> > > > > > > >>>>>>>>> mute
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> channel
> > > > > > > >>>>>>>>>>>>>>>>>> once
> > > > > > > >>>>>>>>>>>>>>>>>>> a request is read and put into the request
> > > > > > > >>>>> channel.
> > > > > > > >>>>>>>> So
> > > > > > > >>>>>>>>>>>> assuming
> > > > > > > >>>>>>>>>>>>>>> there
> > > > > > > >>>>>>>>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>>>>>>>>> only one connection between controller and
> > > > > > > >>> each
> > > > > > > >>>>>>>> broker,
> > > > > > > >>>>>>>>> on
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> broker
> > > > > > > >>>>>>>>>>>>>>>>>> side,
> > > > > > > >>>>>>>>>>>>>>>>>>> there should be only one controller request
> > > > > > > >>> in
> > > > > > > >>>>> the
> > > > > > > >>>>>>>>>>> controller
> > > > > > > >>>>>>>>>>>>>>> request
> > > > > > > >>>>>>>>>>>>>>>>>> queue
> > > > > > > >>>>>>>>>>>>>>>>>>> at any given time. If that is the case, do
> > > > > > > >> we
> > > > > > > >>>>> need
> > > > > > > >>>>>> a
> > > > > > > >>>>>>>>>>> separate
> > > > > > > >>>>>>>>>>>>>>>>> controller
> > > > > > > >>>>>>>>>>>>>>>>>>> request queue capacity config? The default
> > > > > > > >>>> value
> > > > > > > >>>>> 20
> > > > > > > >>>>>>>>> means
> > > > > > > >>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>> we
> > > > > > > >>>>>>>>>>>>>>>>> expect
> > > > > > > >>>>>>>>>>>>>>>>>>> there are 20 controller switches to happen
> > > > > > > >>> in a
> > > > > > > >>>>>> short
> > > > > > > >>>>>>>>>> period
> > > > > > > >>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>> time.
> > > > > > > >>>>>>>>>>>>>>>>> I
> > > > > > > >>>>>>>>>>>>>>>>>> am
> > > > > > > >>>>>>>>>>>>>>>>>>> not sure whether someone should increase
> > > > > > > >> the
> > > > > > > >>>>>>>> controller
> > > > > > > >>>>>>>>>>>> request
> > > > > > > >>>>>>>>>>>>>>> queue
> > > > > > > >>>>>>>>>>>>>>>>>>> capacity to handle such case, as it seems
> > > > > > > >>>>>> indicating
> > > > > > > >>>>>>>>>>> something
> > > > > > > >>>>>>>>>>>>>> very
> > > > > > > >>>>>>>>>>>>>>>>> wrong
> > > > > > > >>>>>>>>>>>>>>>>>>> has happened.
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > > > > >>>>>>>>>>>> lindong28@gmail.com>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas.
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> I think the motivation section is
> > > > > > > >>> intuitive.
> > > > > > > >>>> It
> > > > > > > >>>>>>>> will
> > > > > > > >>>>>>>>> be
> > > > > > > >>>>>>>>>>> good
> > > > > > > >>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>> learn
> > > > > > > >>>>>>>>>>>>>>>>>>> more
> > > > > > > >>>>>>>>>>>>>>>>>>>> about the comments from other reviewers.
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> > > > > > > >>> Wang <
> > > > > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com>
> > > > > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of
> > > > > > > >>> the
> > > > > > > >>>>> KIP
> > > > > > > >>>>>> by
> > > > > > > >>>>>>>>>>>> explaining
> > > > > > > >>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>> cases
> > > > > > > >>>>>>>>>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>>>>>>>> would have user impacts.
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Please take a look at let me know your
> > > > > > > >>>>>> comments.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Lucas
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> > > > > > > >>> Wang
> > > > > > > >>>> <
> > > > > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> The simulation of disk being slow is
> > > > > > > >>>> merely
> > > > > > > >>>>>>>> for me
> > > > > > > >>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>> easily
> > > > > > > >>>>>>>>>>>>>>>>>>> construct
> > > > > > > >>>>>>>>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> testing scenario
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> with a backlog of produce requests.
> > > > > > > >> In
> > > > > > > >>>>>>>> production,
> > > > > > > >>>>>>>>>>> other
> > > > > > > >>>>>>>>>>>>>> than
> > > > > > > >>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>> disk
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> being slow, a backlog of
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> produce requests may also be caused
> > > > > > > >> by
> > > > > > > >>>> high
> > > > > > > >>>>>>>>> produce
> > > > > > > >>>>>>>>>>> QPS.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> In that case, we may not want to kill
> > > > > > > >>> the
> > > > > > > >>>>>>>> broker
> > > > > > > >>>>>>>>> and
> > > > > > > >>>>>>>>>>>>> that's
> > > > > > > >>>>>>>>>>>>>>> when
> > > > > > > >>>>>>>>>>>>>>>>>> this
> > > > > > > >>>>>>>>>>>>>>>>>>>> KIP
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> can be useful, both for JBOD
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> and non-JBOD setup.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question
> > > > > > > >>>> about
> > > > > > > >>>>>> each
> > > > > > > >>>>>>>>>>>>>> ProduceRequest
> > > > > > > >>>>>>>>>>>>>>>>>>> covering
> > > > > > > >>>>>>>>>>>>>>>>>>>>> 20
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> partitions that are randomly
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> distributed, let's say a LeaderAndIsr
> > > > > > > >>>>> request
> > > > > > > >>>>>>>> is
> > > > > > > >>>>>>>>>>>> enqueued
> > > > > > > >>>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>>>>> tries
> > > > > > > >>>>>>>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> switch the current broker, say
> > > > > > > >> broker0,
> > > > > > > >>>>> from
> > > > > > > >>>>>>>>> leader
> > > > > > > >>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>> follower
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> *for one of the partitions*, say
> > > > > > > >>>> *test-0*.
> > > > > > > >>>>>> For
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>>> sake
> > > > > > > >>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>>>>> argument,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> let's also assume the other brokers,
> > > > > > > >>> say
> > > > > > > >>>>>>>> broker1,
> > > > > > > >>>>>>>>>> have
> > > > > > > >>>>>>>>>>>>>>> *stopped*
> > > > > > > >>>>>>>>>>>>>>>>>>>> fetching
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> from
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> the current broker, i.e. broker0.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests
> > > > > > > >>> have
> > > > > > > >>>>>> acks =
> > > > > > > >>>>>>>>> -1
> > > > > > > >>>>>>>>>>>> (ALL)
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  1.1 without this KIP, the
> > > > > > > >>>> ProduceRequests
> > > > > > > >>>>>>>> ahead
> > > > > > > >>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>>>> LeaderAndISR
> > > > > > > >>>>>>>>>>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>>>>>>>>>> be
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> put into the purgatory,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>        and since they'll never be
> > > > > > > >>>>> replicated
> > > > > > > >>>>>>>> to
> > > > > > > >>>>>>>>>> other
> > > > > > > >>>>>>>>>>>>>> brokers
> > > > > > > >>>>>>>>>>>>>>>>>>> (because
> > > > > > > >>>>>>>>>>>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> the assumption made above), they will
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>        be completed either when the
> > > > > > > >>>>>>>> LeaderAndISR
> > > > > > > >>>>>>>>>>>> request
> > > > > > > >>>>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>>>>>>>>> processed
> > > > > > > >>>>>>>>>>>>>>>>>>>> or
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> when the timeout happens.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  1.2 With this KIP, broker0 will
> > > > > > > >>>>> immediately
> > > > > > > >>>>>>>>>>> transition
> > > > > > > >>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>> partition
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> test-0 to become a follower,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>        after the current broker sees
> > > > > > > >>> the
> > > > > > > >>>>>>>>>> replication
> > > > > > > >>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>> remaining
> > > > > > > >>>>>>>>>>>>>>>>>>>> 19
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> partitions, it can send a response
> > > > > > > >>>>> indicating
> > > > > > > >>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>        it's no longer the leader for
> > > > > > > >>> the
> > > > > > > >>>>>>>>> "test-0".
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  To see the latency difference
> > > > > > > >> between
> > > > > > > >>>> 1.1
> > > > > > > >>>>>> and
> > > > > > > >>>>>>>>> 1.2,
> > > > > > > >>>>>>>>>>>> let's
> > > > > > > >>>>>>>>>>>>>> say
> > > > > > > >>>>>>>>>>>>>>>>>> there
> > > > > > > >>>>>>>>>>>>>>>>>>>> are
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> 24K produce requests ahead of the
> > > > > > > >>>>>> LeaderAndISR,
> > > > > > > >>>>>>>>> and
> > > > > > > >>>>>>>>>>>> there
> > > > > > > >>>>>>>>>>>>>> are
> > > > > > > >>>>>>>>>>>>>>> 8
> > > > > > > >>>>>>>>>>>>>>>>> io
> > > > > > > >>>>>>>>>>>>>>>>>>>>> threads,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  so each io thread will process
> > > > > > > >>>>>> approximately
> > > > > > > >>>>>>>>> 3000
> > > > > > > >>>>>>>>>>>>> produce
> > > > > > > >>>>>>>>>>>>>>>>>> requests.
> > > > > > > >>>>>>>>>>>>>>>>>>>> Now
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> let's investigate the io thread that
> > > > > > > >>>>> finally
> > > > > > > >>>>>>>>>> processed
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>> LeaderAndISR.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  For the 3000 produce requests, if
> > > > > > > >> we
> > > > > > > >>>>> model
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>> time
> > > > > > > >>>>>>>>>>>> when
> > > > > > > >>>>>>>>>>>>>>> their
> > > > > > > >>>>>>>>>>>>>>>>>>>>> remaining
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> 19 partitions catch up as t0, t1,
> > > > > > > >>>> ...t2999,
> > > > > > > >>>>>> and
> > > > > > > >>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> LeaderAndISR
> > > > > > > >>>>>>>>>>>>>>>>>>>> request
> > > > > > > >>>>>>>>>>>>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> processed at time t3000.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  Without this KIP, the 1st produce
> > > > > > > >>>> request
> > > > > > > >>>>>>>> would
> > > > > > > >>>>>>>>>> have
> > > > > > > >>>>>>>>>>>>>> waited
> > > > > > > >>>>>>>>>>>>>>> an
> > > > > > > >>>>>>>>>>>>>>>>>>> extra
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> t3000 - t0 time in the purgatory, the
> > > > > > > >>> 2nd
> > > > > > > >>>>> an
> > > > > > > >>>>>>>> extra
> > > > > > > >>>>>>>>>>> time
> > > > > > > >>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>>>> t3000 -
> > > > > > > >>>>>>>>>>>>>>>>>>> t1,
> > > > > > > >>>>>>>>>>>>>>>>>>>>> etc.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  Roughly speaking, the latency
> > > > > > > >>>> difference
> > > > > > > >>>>> is
> > > > > > > >>>>>>>>> bigger
> > > > > > > >>>>>>>>>>> for
> > > > > > > >>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>> earlier
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> produce requests than for the later
> > > > > > > >>> ones.
> > > > > > > >>>>> For
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>> same
> > > > > > > >>>>>>>>>>>>>> reason,
> > > > > > > >>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>> more
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests queued
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  before the LeaderAndISR, the bigger
> > > > > > > >>>>> benefit
> > > > > > > >>>>>>>> we
> > > > > > > >>>>>>>>> get
> > > > > > > >>>>>>>>>>>>> (capped
> > > > > > > >>>>>>>>>>>>>>> by
> > > > > > > >>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> produce timeout).
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> 2. If the enqueued produce requests
> > > > > > > >>> have
> > > > > > > >>>>>>>> acks=0 or
> > > > > > > >>>>>>>>>>>> acks=1
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  There will be no latency
> > > > > > > >> differences
> > > > > > > >>> in
> > > > > > > >>>>>> this
> > > > > > > >>>>>>>>> case,
> > > > > > > >>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  2.1 without this KIP, the records
> > > > > > > >> of
> > > > > > > >>>>>>>> partition
> > > > > > > >>>>>>>>>>> test-0
> > > > > > > >>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests ahead of the
> > > > > > > >>> LeaderAndISR
> > > > > > > >>>>>> will
> > > > > > > >>>>>>>> be
> > > > > > > >>>>>>>>>>>> appended
> > > > > > > >>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>> local
> > > > > > > >>>>>>>>>>>>>>>>>>>>> log,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>        and eventually be truncated
> > > > > > > >>> after
> > > > > > > >>>>>>>>> processing
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>> LeaderAndISR.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> This is what's referred to as
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>        "some unofficial definition
> > > > > > > >> of
> > > > > > > >>>> data
> > > > > > > >>>>>>>> loss
> > > > > > > >>>>>>>>> in
> > > > > > > >>>>>>>>>>>> terms
> > > > > > > >>>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>>>>> messages
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> beyond the high watermark".
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>  2.2 with this KIP, we can mitigate
> > > > > > > >>> the
> > > > > > > >>>>>> effect
> > > > > > > >>>>>>>>>> since
> > > > > > > >>>>>>>>>>> if
> > > > > > > >>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>> LeaderAndISR
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> is immediately processed, the
> > > > > > > >> response
> > > > > > > >>> to
> > > > > > > >>>>>>>>> producers
> > > > > > > >>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>>>> have
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>        the NotLeaderForPartition
> > > > > > > >>> error,
> > > > > > > >>>>>>>> causing
> > > > > > > >>>>>>>>>>>> producers
> > > > > > > >>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>> retry
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> This explanation above is the benefit
> > > > > > > >>> for
> > > > > > > >>>>>>>> reducing
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Lucas,

Yes, I agree that a dedicated end to end control flow would be ideal.

Thanks,

Jiangjie (Becket) Qin

On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks for the comment, Becket.
> So far, we've been trying to avoid making any request handler thread
> special.
> But if we were to follow that path in order to make the two planes more
> isolated,
> what do you think about also having a dedicated processor thread,
> and dedicated port for the controller?
>
> Today one processor thread can handle multiple connections, let's say 100
> connections represented by connection0, ... connection99, among which
> connection0-98 are from clients, while connection99 is from the controller.
> Further, let's say after one selector polling, there are incoming requests
> on all connections.
>
> When the request queue is full (either the data request queue being full in
> the two-queue design, or the one single queue being full in the deque
> design), the processor thread will be blocked first when trying to enqueue
> the data request from connection0, then possibly blocked for the data
> request from connection1, etc., even though the controller request is ready
> to be enqueued.
>
> To solve this problem, it seems we would need to have a separate port
> dedicated to the controller, a dedicated processor thread, a dedicated
> controller request queue,
> and pinning of one request handler thread for controller requests.
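
The blocking scenario described above can be illustrated with a toy model (the names here are illustrative; the real SocketServer enqueues with a blocking put, and offer with a timeout is used below only so the demo terminates):

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class ProcessorBlockingDemo {
    // One processor thread drains many connections into a single bounded
    // request queue. When the queue is full, the thread gets stuck on the
    // first data request, even though a controller request further down the
    // connection list is ready to be enqueued.
    static String enqueueOnePoll(BlockingQueue<String> requestQueue,
                                 List<String> readyRequests)
            throws InterruptedException {
        for (String req : readyRequests) {
            boolean enqueued = requestQueue.offer(req, 10, TimeUnit.MILLISECONDS);
            if (!enqueued) {
                return "blocked on " + req; // processor stalls here
            }
        }
        return "all enqueued";
    }
}
```

With a full queue and ready requests from connection0, connection1, and the controller's connection99, the processor stalls on connection0's data request before it ever reaches the controller request.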
>
> Thanks,
> Lucas
>
>
> On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Personally I am not fond of the deque approach simply because it is
> > against the basic idea of isolating the controller plane and data plane.
> > With a single dequeue, theoretically speaking the controller requests can
> > starve the clients requests. I would prefer the approach with a separate
> > controller request queue and a dedicated controller request handler
> thread.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Sure, I can summarize the usage of correlation id. But before I do
> that,
> > it
> > > seems
> > > the same out-of-order processing can also happen to Produce requests
> sent
> > > by producers,
> > > following the same example you described earlier.
> > > If that's the case, I think this probably deserves a separate doc and
> > > design independent of this KIP.
> > >
> > > Lucas
> > >
> > >
> > >
> > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <li...@gmail.com>
> wrote:
> > >
> > > > Hey Lucas,
> > > >
> > > > Could you update the KIP if you are confident with the approach which
> > > uses
> > > > correlation id? The idea around correlation id is kind of scattered
> > > across
> > > > multiple emails. It will be useful if other reviews can read the KIP
> to
> > > > understand the latest proposal.
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > > > gharatmayuresh15@gmail.com> wrote:
> > > >
> > > > > I like the idea of the deque implementation by Lucas. This will
> > help
> > > us
> > > > > avoid additional queue for controller and additional configs in
> > Kafka.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > The usage of correlation ID might still be useful to address the
> > > cases
> > > > > > that the controller epoch and leader epoch check are not
> sufficient
> > > to
> > > > > > guarantee correct behavior. For example, if the controller sends
> a
> > > > > > LeaderAndIsrRequest followed by a StopReplicaRequest, and the
> > broker
> > > > > > processes them in the reverse order, the replica may still be
> wrongly
> > > > > > recreated, right?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io>
> wrote:
> > > > > > >
> > > > > > > Hmm, since we already use controller epoch and leader epoch for
> > > > > properly
> > > > > > > caching the latest partition state, do we really need
> correlation
> > > id
> > > > > for
> > > > > > > ordering the controller requests?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <
> > becket.qin@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Lucas and Mayuresh,
> > > > > > >>
> > > > > > >> Good idea. The correlation id should work.
> > > > > > >>
> > > > > > >> In the ControllerChannelManager, a request will be resent
> until
> > a
> > > > > > response
> > > > > > >> is received. So if the controller to broker connection
> > disconnects
> > > > > after
> > > > > > >> controller sends R1_a, but before the response of R1_a is
> > > received,
> > > > a
> > > > > > >> disconnection may cause the controller to resend R1_b. i.e.
> > until
> > > R1
> > > > > is
> > > > > > >> acked, R2 won't be sent by the controller.
> > > > > > >> This gives two guarantees:
> > > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > > >> 2. On the broker side, when R2 is seen, R1 must have been
> > > processed
> > > > at
> > > > > > >> least once.
> > > > > > >>
> > > > > > >> So on the broker side, with a single thread controller request
> > > > > handler,
> > > > > > the
> > > > > > >> logic should be:
> > > > > > >> 1. Process what ever request seen in the controller request
> > queue
> > > > > > >> 2. For the given epoch, drop request if its correlation id is
> > > > smaller
> > > > > > than
> > > > > > >> that of the last processed request.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Jiangjie (Becket) Qin
> > > > > > >>
> > > > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io>
> > > wrote:
> > > > > > >>
> > > > > > >>> I agree that there is no strong ordering when there are more
> > than
> > > > one
> > > > > > >>> socket connections. Currently, we rely on controllerEpoch and
> > > > > > leaderEpoch
> > > > > > >>> to ensure that the receiving broker picks up the latest state
> > for
> > > > > each
> > > > > > >>> partition.
> > > > > > >>>
> > > > > > >>> One potential issue with the dequeue approach is that if the
> > > queue
> > > > is
> > > > > > >> full,
> > > > > > >>> there is no guarantee that the controller requests will be
> > > enqueued
> > > > > > >>> quickly.
> > > > > > >>>
> > > > > > >>> Thanks,
> > > > > > >>>
> > > > > > >>> Jun
> > > > > > >>>
> > > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > > > > > >>> gharatmayuresh15@gmail.com
> > > > > > >>>> wrote:
> > > > > > >>>
> > > > > > >>>> Yea, the correlationId is only set to 0 in the NetworkClient
> > > > > > >> constructor.
> > > > > > >>>> Since we reuse the same NetworkClient between Controller and
> > the
> > > > > > >> broker,
> > > > > > >>> a
> > > > > > >>>> disconnection should not cause it to reset to 0, in which
> case
> > > it
> > > > > can
> > > > > > >> be
> > > > > > >>>> used to reject obsolete requests.
> > > > > > >>>>
> > > > > > >>>> Thanks,
> > > > > > >>>>
> > > > > > >>>> Mayuresh
> > > > > > >>>>
> > > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <
> > > lucasatucla@gmail.com
> > > > >
> > > > > > >>> wrote:
> > > > > > >>>>
> > > > > > >>>>> @Dong,
> > > > > > >>>>> Great example and explanation, thanks!
> > > > > > >>>>>
> > > > > > >>>>> @All
> > > > > > >>>>> Regarding the example given by Dong, it seems even if we
> use
> > a
> > > > > queue,
> > > > > > >>>> and a
> > > > > > >>>>> dedicated controller request handling thread,
> > > > > > >>>>> the same result can still happen because R1_a will be sent
> on
> > > one
> > > > > > >>>>> connection, and R1_b & R2 will be sent on a different
> > > connection,
> > > > > > >>>>> and there is no ordering between different connections on
> the
> > > > > broker
> > > > > > >>>> side.
> > > > > > >>>>> I was discussing with Mayuresh offline, and it seems
> > > correlation
> > > > id
> > > > > > >>>> within
> > > > > > >>>>> the same NetworkClient object is monotonically increasing
> and
> > > > never
> > > > > > >>>> reset,
> > > > > > >>>>> hence a broker can leverage that to properly reject
> obsolete
> > > > > > >> requests.
> > > > > > >>>>> Thoughts?
> > > > > > >>>>>
> > > > > > >>>>> Thanks,
> > > > > > >>>>> Lucas
> > > > > > >>>>>
> > > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> Actually nvm, correlationId is reset in case of connection
> > > > loss, I
> > > > > > >>>> think.
> > > > > > >>>>>>
> > > > > > >>>>>> Thanks,
> > > > > > >>>>>>
> > > > > > >>>>>> Mayuresh
> > > > > > >>>>>>
> > > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > > > > >>>>>> gharatmayuresh15@gmail.com>
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> I agree with Dong that out-of-order processing can happen
> > > with
> > > > > > >>>> having 2
> > > > > > >>>>>>> separate queues as well and it can even happen today.
> > > > > > >>>>>>> Can we use the correlationId in the request from the
> > > controller
> > > > > > >> to
> > > > > > >>>> the
> > > > > > >>>>>>> broker to handle ordering ?
> > > > > > >>>>>>>
> > > > > > >>>>>>> Thanks,
> > > > > > >>>>>>>
> > > > > > >>>>>>> Mayuresh
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
> > > > becket.qin@gmail.com
> > > > > > >>>
> > > > > > >>>>> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>>> Good point, Joel. I agree that a dedicated controller
> > > request
> > > > > > >>>> handling
> > > > > > >>>>>>>> thread would be a better isolation. It also solves the
> > > > > > >> reordering
> > > > > > >>>>> issue.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> > > > > > >> jjkoshy.w@gmail.com>
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> Good example. I think this scenario can occur in the
> > > current
> > > > > > >>> code
> > > > > > >>>> as
> > > > > > >>>>>>>> well
> > > > > > >>>>>>>>> but with even lower probability given that there are
> > other
> > > > > > >>>>>>>> non-controller
> > > > > > >>>>>>>>> requests interleaved. It is still sketchy though and I
> > > think
> > > > a
> > > > > > >>>> safer
> > > > > > >>>>>>>>> approach would be separate queues and pinning
> controller
> > > > > > >> request
> > > > > > >>>>>>>> handling
> > > > > > >>>>>>>>> to one handler thread.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> > > > > > >> lindong28@gmail.com
> > > > > > >>>>
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> Hey Becket,
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> I think you are right that there may be out-of-order
> > > > > > >>> processing.
> > > > > > >>>>>>>> However,
> > > > > > >>>>>>>>>> it seems that out-of-order processing may also happen
> > even
> > > > > > >> if
> > > > > > >>> we
> > > > > > >>>>>> use a
> > > > > > >>>>>>>>>> separate queue.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Here is the example:
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> - Controller sends R1 and got disconnected before
> > > receiving
> > > > > > >>>>>> response.
> > > > > > >>>>>>>>> Then
> > > > > > >>>>>>>>>> it reconnects and sends R2. Both requests now stay in
> > the
> > > > > > >>>>> controller
> > > > > > >>>>>>>>>> request queue in the order they are sent.
> > > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then
> > > thread2
> > > > > > >>>> takes
> > > > > > >>>>>> R2
> > > > > > >>>>>>>>> from
> > > > > > >>>>>>>>>> the request queue almost at the same time.
> > > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is
> > > chance
> > > > > > >>> that
> > > > > > >>>>>> R2's
> > > > > > >>>>>>>>>> processing is completed before R1.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> If out-of-order processing can happen for both
> > approaches
> > > > > > >> with
> > > > > > >>>>> very
> > > > > > >>>>>>>> low
> > > > > > >>>>>>>>>> probability, it may not be worthwhile to add the extra
> > > > > > >> queue.
> > > > > > >>>> What
> > > > > > >>>>>> do
> > > > > > >>>>>>>> you
> > > > > > >>>>>>>>>> think?
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>> Dong
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > > > > > >>>> becket.qin@gmail.com
> > > > > > >>>>>>
> > > > > > >>>>>>>>> wrote:
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Using the request channel as a dequeue was brought up
> > some
> > > > > > >>> time
> > > > > > >>>>> ago
> > > > > > >>>>>>>> when
> > > > > > >>>>>>>>>> we
> > > > > > >>>>>>>>>>> were initially thinking of prioritizing the requests. The
> > > > > > >> concern
> > > > > > >>>> was
> > > > > > >>>>>> that
> > > > > > >>>>>>>>> the
> > > > > > >>>>>>>>>>> controller requests are supposed to be processed in
> > > order.
> > > > > > >>> If
> > > > > > >>>> we
> > > > > > >>>>>> can
> > > > > > >>>>>>>>>> ensure
> > > > > > >>>>>>>>>>> that there is one controller request in the request
> > > > > > >> channel,
> > > > > > >>>> the
> > > > > > >>>>>>>> order
> > > > > > >>>>>>>>> is
> > > > > > >>>>>>>>>>> not a concern. But in cases that there are more than
> > one
> > > > > > >>>>>> controller
> > > > > > >>>>>>>>>> request
> > > > > > >>>>>>>>>>> inserted into the queue, the controller request order
> > may
> > > > > > >>>> change
> > > > > > >>>>>> and
> > > > > > >>>>>>>>>> cause
> > > > > > >>>>>>>>>>> problem. For example, think about the following
> > sequence:
> > > > > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to
> broker
> > > > > > >>>>>>>>>>> 2. Broker receives R1 and put the request to the head
> > of
> > > > > > >> the
> > > > > > >>>>>> request
> > > > > > >>>>>>>>>> queue.
> > > > > > >>>>>>>>>>> 3. Controller to broker connection failed and the
> > > > > > >> controller
> > > > > > >>>>>>>>> reconnected
> > > > > > >>>>>>>>>> to
> > > > > > >>>>>>>>>>> the broker.
> > > > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker
> > > > > > >>>>>>>>>>> 5. Broker receives R2 and add it to the head of the
> > > > > > >> request
> > > > > > >>>>> queue.
> > > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed before
> R1
> > is
> > > > > > >>>>>> processed,
> > > > > > >>>>>>>>>> which
> > > > > > >>>>>>>>>>> may cause problem.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > > > > > >>>>> jjkoshy.w@gmail.com>
> > > > > > >>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a
> > simpler
> > > > > > >>>> less
> > > > > > >>>>>>>>> invasive
> > > > > > >>>>>>>>>>>> alternative and it should work. Jun/Becket/others,
> do
> > > > > > >> you
> > > > > > >>>> see
> > > > > > >>>>>> any
> > > > > > >>>>>>>>>>> pitfalls
> > > > > > >>>>>>>>>>>> with this approach?
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > > > > > >>>>>>>> lucasatucla@gmail.com>
> > > > > > >>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > > > >>>>>>>>>>>>> That's a very interesting idea that I haven't
> thought
> > > > > > >>>>> before.
> > > > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty well,
> > and
> > > > > > >>>> also
> > > > > > >>>>>>>>>>>>> avoids the need to have a new size metric and
> > capacity
> > > > > > >>>>> config
> > > > > > >>>>>>>>>>>>> for the controller request queue. In fact, if we
> were
> > > > > > >> to
> > > > > > >>>>> adopt
> > > > > > >>>>>>>>>>>>> this design, there is no public interface change,
> and
> > > > > > >> we
> > > > > > >>>>>>>>>>>>> probably don't need a KIP.
> > > > > > >>>>>>>>>>>>> Also implementation wise, it seems
> > > > > > >>>>>>>>>>>>> the Java class LinkedBlockingDeque can readily
> > satisfy
> > > > > > >>> the
> > > > > > >>>>>>>>>> requirement
> > > > > > >>>>>>>>>>>>> by supporting a capacity, and also allowing
> inserting
> > > > > > >> at
> > > > > > >>>>> both
> > > > > > >>>>>>>> ends.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> My only concern is that this design is tied to the
> > > > > > >>>>> coincidence
> > > > > > >>>>>>>> that
> > > > > > >>>>>>>>>>>>> we have two request priorities and there are two
> ends
> > > > > > >>> to a
> > > > > > >>>>>>>> deque.
> > > > > > >>>>>>>>>>>>> Hence by using the proposed design, it seems the
> > > > > > >> network
> > > > > > >>>>> layer
> > > > > > >>>>>>>> is
> > > > > > >>>>>>>>>>>>> more tightly coupled with upper layer logic, e.g.
> if
> > > > > > >> we
> > > > > > >>>> were
> > > > > > >>>>>> to
> > > > > > >>>>>>>> add
> > > > > > >>>>>>>>>>>>> an extra priority level in the future for some
> > reason,
> > > > > > >>> we
> > > > > > >>>>>> would
> > > > > > >>>>>>>>>>> probably
> > > > > > >>>>>>>>>>>>> need to go back to the design of separate queues,
> one
> > > > > > >>> for
> > > > > > >>>>> each
> > > > > > >>>>>>>>>> priority
> > > > > > >>>>>>>>>>>>> level.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean
> toward
> > > > > > >>> your
> > > > > > >>>>>>>> suggested
> > > > > > >>>>>>>>>>>>> approach.
> > > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> @Becket,
> > > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm
> > > > > > >>> answering
> > > > > > >>>>>> your
> > > > > > >>>>>>>>>>> question
> > > > > > >>>>>>>>>>>>> only in the context
> > > > > > >>>>>>>>>>>>> of the current KIP design: I think your suggestion
> > > > > > >> makes
> > > > > > >>>>>> sense,
> > > > > > >>>>>>>> and
> > > > > > >>>>>>>>>> I'm
> > > > > > >>>>>>>>>>>> ok
> > > > > > >>>>>>>>>>>>> with removing the capacity config and
> > > > > > >>>>>>>>>>>>> just relying on the default value of 20 being
> > > > > > >> sufficient
> > > > > > >>>>>> enough.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>> Lucas
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize
> the
> > > > > > >>>>>>>> controller
> > > > > > >>>>>>>>>>> request
> > > > > > >>>>>>>>>>>>>> over any other requests.
> > > > > > >>>>>>>>>>>>>> In that case, we can change the request queue to a
> > > > > > >>>>> dequeue,
> > > > > > >>>>>>>> where
> > > > > > >>>>>>>>>> you
> > > > > > >>>>>>>>>>>>>> always insert the normal requests (produce,
> > > > > > >>>> consume,..etc)
> > > > > > >>>>>> to
> > > > > > >>>>>>>> the
> > > > > > >>>>>>>>>> end
> > > > > > >>>>>>>>>>>> of
> > > > > > >>>>>>>>>>>>>> the dequeue, but if its a controller request, you
> > > > > > >>> insert
> > > > > > >>>>> it
> > > > > > >>>>>> to
> > > > > > >>>>>>>>> the
> > > > > > >>>>>>>>>>> head
> > > > > > >>>>>>>>>>>>> of
> > > > > > >>>>>>>>>>>>>> the queue. This ensures that the controller
> request
> > > > > > >>> will
> > > > > > >>>>> be
> > > > > > >>>>>>>> given
> > > > > > >>>>>>>>>>>> higher
> > > > > > >>>>>>>>>>>>>> priority over other requests.
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> Also since we only read one request from the
> socket
> > > > > > >>> and
> > > > > > >>>>> mute
> > > > > > >>>>>>>> it
> > > > > > >>>>>>>>> and
> > > > > > >>>>>>>>>>>> only
> > > > > > >>>>>>>>>>>>>> unmute it after handling the request, this would
> > > > > > >>> ensure
> > > > > > >>>>> that
> > > > > > >>>>>>>> we
> > > > > > >>>>>>>>>> don't
> > > > > > >>>>>>>>>>>>>> handle controller requests out of order.
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue
> and
> > > > > > >>> the
> > > > > > >>>>>>>>> additional
> > > > > > >>>>>>>>>>>>> config
> > > > > > >>>>>>>>>>>>>> for the size of the queue.
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> What do you think ?
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> Mayuresh
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > > > > >>>>>>>> becket.qin@gmail.com
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the
> > > > > > >>> current
> > > > > > >>>>>> design
> > > > > > >>>>>>>>>> makes
> > > > > > >>>>>>>>>>>>> sense.
> > > > > > >>>>>>>>>>>>>>> My confusion is about whether the new config for
> > > > > > >> the
> > > > > > >>>>>>>> controller
> > > > > > >>>>>>>>>>> queue
> > > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a case
> in
> > > > > > >>>> which
> > > > > > >>>>>>>> users
> > > > > > >>>>>>>>>>> would
> > > > > > >>>>>>>>>>>>>> change
> > > > > > >>>>>>>>>>>>>>> it.
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > > > >>>>>>>>>> becket.qin@gmail.com>
> > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we
> > > > > > >>>> expect
> > > > > > >>>>>>>> user to
> > > > > > >>>>>>>>>>> ever
> > > > > > >>>>>>>>>>>>>> change
> > > > > > >>>>>>>>>>>>>>>> the controller request queue capacity"? If we
> > > > > > >>> agree
> > > > > > >>>>> that
> > > > > > >>>>>>>> 20
> > > > > > >>>>>>>>> is
> > > > > > >>>>>>>>>>>>> already
> > > > > > >>>>>>>>>>>>>> a
> > > > > > >>>>>>>>>>>>>>>> very generous default number and we do not
> > > > > > >> expect
> > > > > > >>>> user
> > > > > > >>>>>> to
> > > > > > >>>>>>>>>> change
> > > > > > >>>>>>>>>>>> it,
> > > > > > >>>>>>>>>>>>> is
> > > > > > >>>>>>>>>>>>>>> it
> > > > > > >>>>>>>>>>>>>>>> still necessary to expose this as a config?
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > > > >>>>>>>>>>> lucasatucla@gmail.com
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that
> > > > > > >>>>> normally
> > > > > > >>>>>>>> there
> > > > > > >>>>>>>>>>>> should
> > > > > > >>>>>>>>>>>>> be
> > > > > > >>>>>>>>>>>>>>>>> just
> > > > > > >>>>>>>>>>>>>>>>> one controller request because of muting,
> > > > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there would be
> > > > > > >> many
> > > > > > >>>>>>>> enqueued
> > > > > > >>>>>>>>>>>>> controller
> > > > > > >>>>>>>>>>>>>>>>> requests.
> > > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not sure
> > > > > > >>>> which
> > > > > > >>>>>> part
> > > > > > >>>>>>>>>>> conveys
> > > > > > >>>>>>>>>>>>> that
> > > > > > >>>>>>>>>>>>>>>>> info.
> > > > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point out the
> > > > > > >>>>> section.
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal
> > > > > > >>>> conditions,
> > > > > > >>>>>> the
> > > > > > >>>>>>>>>> current
> > > > > > >>>>>>>>>>>>>> design
> > > > > > >>>>>>>>>>>>>>>>> does not preclude multiple controllers running
> > > > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have the
> > > > > > >>>>> controller
> > > > > > >>>>>>>>> queue
> > > > > > >>>>>>>>>>>>> capacity
> > > > > > >>>>>>>>>>>>>>>>> config and simply make its capacity to be 1,
> > > > > > >>>>>>>>>>>>>>>>> network threads handling requests from
> > > > > > >> different
> > > > > > >>>>>>>> controllers
> > > > > > >>>>>>>>>>> will
> > > > > > >>>>>>>>>>>> be
> > > > > > >>>>>>>>>>>>>>>>> blocked during those troublesome times,
> > > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On the
> > > > > > >> other
> > > > > > >>>>> hand,
> > > > > > >>>>>>>>> adding
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>> extra
> > > > > > >>>>>>>>>>>>>>>>> config with a default value, say 20, guards us
> > > > > > >>> from
> > > > > > >>>>>>>> issues
> > > > > > >>>>>>>>> in
> > > > > > >>>>>>>>>>>> those
> > > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much
> > > > > > >>>> downside
> > > > > > >>>>> of
> > > > > > >>>>>>>>> adding
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>> extra
> > > > > > >>>>>>>>>>>>>>>>> config.
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete
> > > > > > >>> statement
> > > > > > >>>>>> based
> > > > > > >>>>>>>> on
> > > > > > >>>>>>>>> a
> > > > > > >>>>>>>>>>>>> previous
> > > > > > >>>>>>>>>>>>>>>>> design. I've revised the wording in the KIP.
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > > > > > >>> Gharat <
> > > > > > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The
> > > > > > >>>> memory
> > > > > > >>>>>>>>>>> consumption
> > > > > > >>>>>>>>>>>>> can
> > > > > > >>>>>>>>>>>>>>> rise
> > > > > > >>>>>>>>>>>>>>>>>> given the total number of queued requests can
> > > > > > >>> go
> > > > > > >>>> up
> > > > > > >>>>>> to
> > > > > > >>>>>>>> 2x"
> > > > > > >>>>>>>>>> in
> > > > > > >>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>> impact
> > > > > > >>>>>>>>>>>>>>>>>> section. Normally the requests from
> > > > > > >> controller
> > > > > > >>>> to a
> > > > > > >>>>>>>> Broker
> > > > > > >>>>>>>>>> are
> > > > > > >>>>>>>>>>>> not
> > > > > > >>>>>>>>>>>>>>> high
> > > > > > >>>>>>>>>>>>>>>>>> volume, right ?
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the
> > > > > > >>>> control
> > > > > > >>>>>>>> plane
> > > > > > >>>>>>>>>> from
> > > > > > >>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>> data
> > > > > > >>>>>>>>>>>>>>>>>> plane
> > > > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the
> > > > > > >> controller
> > > > > > >>>>>> request
> > > > > > >>>>>>>>> queue
> > > > > > >>>>>>>>>>> may
> > > > > > >>>>>>>>>>>>>> have
> > > > > > >>>>>>>>>>>>>>>>> many
> > > > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this be a common case?
> > > > > > >>> The
> > > > > > >>>>>>>>> controller
> > > > > > >>>>>>>>>>>>>> requests
> > > > > > >>>>>>>>>>>>>>>>> still
> > > > > > >>>>>>>>>>>>>>>>>>> goes through the SocketServer. The
> > > > > > >>> SocketServer
> > > > > > >>>>>> will
> > > > > > >>>>>>>>> mute
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>> channel
> > > > > > >>>>>>>>>>>>>>>>>> once
> > > > > > >>>>>>>>>>>>>>>>>>> a request is read and put into the request
> > > > > > >>>>> channel.
> > > > > > >>>>>>>> So
> > > > > > >>>>>>>>>>>> assuming
> > > > > > >>>>>>>>>>>>>>> there
> > > > > > >>>>>>>>>>>>>>>>> is
> > > > > > >>>>>>>>>>>>>>>>>>> only one connection between controller and
> > > > > > >>> each
> > > > > > >>>>>>>> broker,
> > > > > > >>>>>>>>> on
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>> broker
> > > > > > >>>>>>>>>>>>>>>>>> side,
> > > > > > >>>>>>>>>>>>>>>>>>> there should be only one controller request
> > > > > > >>> in
> > > > > > >>>>> the
> > > > > > >>>>>>>>>>> controller
> > > > > > >>>>>>>>>>>>>>> request
> > > > > > >>>>>>>>>>>>>>>>>> queue
> > > > > > >>>>>>>>>>>>>>>>>>> at any given time. If that is the case, do
> > > > > > >> we
> > > > > > >>>>> need
> > > > > > >>>>>> a
> > > > > > >>>>>>>>>>> separate
> > > > > > >>>>>>>>>>>>>>>>> controller
> > > > > > >>>>>>>>>>>>>>>>>>> request queue capacity config? The default
> > > > > > >>>> value
> > > > > > >>>>> 20
> > > > > > >>>>>>>>> means
> > > > > > >>>>>>>>>>> that
> > > > > > >>>>>>>>>>>>> we
> > > > > > >>>>>>>>>>>>>>>>> expect
> > > > > > >>>>>>>>>>>>>>>>>>> there are 20 controller switches to happen
> > > > > > >>> in a
> > > > > > >>>>>> short
> > > > > > >>>>>>>>>> period
> > > > > > >>>>>>>>>>>> of
> > > > > > >>>>>>>>>>>>>>> time.
> > > > > > >>>>>>>>>>>>>>>>> I
> > > > > > >>>>>>>>>>>>>>>>>> am
> > > > > > >>>>>>>>>>>>>>>>>>> not sure whether someone should increase
> > > > > > >> the
> > > > > > >>>>>>>> controller
> > > > > > >>>>>>>>>>>> request
> > > > > > >>>>>>>>>>>>>>> queue
> > > > > > >>>>>>>>>>>>>>>>>>> capacity to handle such case, as it seems
> > > > > > >>>>>> indicating
> > > > > > >>>>>>>>>>> something
> > > > > > >>>>>>>>>>>>>> very
> > > > > > >>>>>>>>>>>>>>>>> wrong
> > > > > > >>>>>>>>>>>>>>>>>>> has happened.
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > > > >>>>>>>>>>>> lindong28@gmail.com>
> > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas.
> > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> I think the motivation section is
> > > > > > >>> intuitive.
> > > > > > >>>> It
> > > > > > >>>>>>>> will
> > > > > > >>>>>>>>> be
> > > > > > >>>>>>>>>>> good
> > > > > > >>>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>>>>> learn
> > > > > > >>>>>>>>>>>>>>>>>>> more
> > > > > > >>>>>>>>>>>>>>>>>>>> about the comments from other reviewers.
> > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> > > > > > >>> Wang <
> > > > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com>
> > > > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of
> > > > > > >>> the
> > > > > > >>>>> KIP
> > > > > > >>>>>> by
> > > > > > >>>>>>>>>>>> explaining
> > > > > > >>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>> cases
> > > > > > >>>>>>>>>>>>>>>>>>>> that
> > > > > > >>>>>>>>>>>>>>>>>>>>> would have user impacts.
> > > > > > >>>>>>>>>>>>>>>>>>>>> Please take a look and let me know your
> > > > > > >>>>>> comments.
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>>>>>>>>>> Lucas
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> > > > > > >>> Wang
> > > > > > >>>> <
> > > > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>> The simulation of disk being slow is
> > > > > > >>>> merely
> > > > > > >>>>>>>> for me
> > > > > > >>>>>>>>>> to
> > > > > > >>>>>>>>>>>>> easily
> > > > > > >>>>>>>>>>>>>>>>>>> construct
> > > > > > >>>>>>>>>>>>>>>>>>>> a
> > > > > > >>>>>>>>>>>>>>>>>>>>>> testing scenario
> > > > > > >>>>>>>>>>>>>>>>>>>>>> with a backlog of produce requests.
> > > > > > >> In
> > > > > > >>>>>>>> production,
> > > > > > >>>>>>>>>>> other
> > > > > > >>>>>>>>>>>>>> than
> > > > > > >>>>>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>>> disk
> > > > > > >>>>>>>>>>>>>>>>>>>>>> being slow, a backlog of
> > > > > > >>>>>>>>>>>>>>>>>>>>>> produce requests may also be caused
> > > > > > >> by
> > > > > > >>>> high
> > > > > > >>>>>>>>> produce
> > > > > > >>>>>>>>>>> QPS.
> > > > > > >>>>>>>>>>>>>>>>>>>>>> In that case, we may not want to kill
> > > > > > >>> the
> > > > > > >>>>>>>> broker
> > > > > > >>>>>>>>> and
> > > > > > >>>>>>>>>>>>> that's
> > > > > > >>>>>>>>>>>>>>> when
> > > > > > >>>>>>>>>>>>>>>>>> this
> > > > > > >>>>>>>>>>>>>>>>>>>> KIP
> > > > > > >>>>>>>>>>>>>>>>>>>>>> can be useful, both for JBOD
> > > > > > >>>>>>>>>>>>>>>>>>>>>> and non-JBOD setup.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question
> > > > > > >>>> about
> > > > > > >>>>>> each
> > > > > > >>>>>>>>>>>>>> ProduceRequest
> > > > > > >>>>>>>>>>>>>>>>>>> covering
> > > > > > >>>>>>>>>>>>>>>>>>>>> 20
> > > > > > >>>>>>>>>>>>>>>>>>>>>> partitions that are randomly
> > > > > > >>>>>>>>>>>>>>>>>>>>>> distributed, let's say a LeaderAndIsr
> > > > > > >>>>> request
> > > > > > >>>>>>>> is
> > > > > > >>>>>>>>>>>> enqueued
> > > > > > >>>>>>>>>>>>>> that
> > > > > > >>>>>>>>>>>>>>>>>> tries
> > > > > > >>>>>>>>>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>>>>>>>>>> switch the current broker, say
> > > > > > >> broker0,
> > > > > > >>>>> from
> > > > > > >>>>>>>>> leader
> > > > > > >>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>>> follower
> > > > > > >>>>>>>>>>>>>>>>>>>>>> *for one of the partitions*, say
> > > > > > >>>> *test-0*.
> > > > > > >>>>>> For
> > > > > > >>>>>>>> the
> > > > > > >>>>>>>>>>> sake
> > > > > > >>>>>>>>>>>> of
> > > > > > >>>>>>>>>>>>>>>>>> argument,
> > > > > > >>>>>>>>>>>>>>>>>>>>>> let's also assume the other brokers,
> > > > > > >>> say
> > > > > > >>>>>>>> broker1,
> > > > > > >>>>>>>>>> have
> > > > > > >>>>>>>>>>>>>>> *stopped*
> > > > > > >>>>>>>>>>>>>>>>>>>> fetching
> > > > > > >>>>>>>>>>>>>>>>>>>>>> from
> > > > > > >>>>>>>>>>>>>>>>>>>>>> the current broker, i.e. broker0.
> > > > > > >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests
> > > > > > >>> have
> > > > > > >>>>>> acks =
> > > > > > >>>>>>>>> -1
> > > > > > >>>>>>>>>>>> (ALL)
> > > > > > >>>>>>>>>>>>>>>>>>>>>>  1.1 without this KIP, the
> > > > > > >>>> ProduceRequests
> ahead of LeaderAndISR will be put into the purgatory, and since they'll
> never be replicated to other brokers (because of the assumption made
> above), they will be completed either when the LeaderAndISR request is
> processed or when the timeout happens.
>   1.2 With this KIP, broker0 will immediately transition the partition
> test-0 to become a follower, and after the current broker sees the
> replication of the remaining 19 partitions, it can send a response
> indicating that it's no longer the leader for "test-0".
>
> To see the latency difference between 1.1 and 1.2, let's say there are
> 24K produce requests ahead of the LeaderAndISR, and there are 8 io
> threads, so each io thread will process approximately 3000 produce
> requests. Now let's investigate the io thread that finally processed the
> LeaderAndISR. For the 3000 produce requests, let's model the times when
> their remaining 19 partitions catch up as t0, t1, ... t2999, and say the
> LeaderAndISR request is processed at time t3000. Without this KIP, the
> 1st produce request would have waited an extra t3000 - t0 time in the
> purgatory, the 2nd an extra time of t3000 - t1, etc. Roughly speaking,
> the latency difference is bigger for the earlier produce requests than
> for the later ones. For the same reason, the more ProduceRequests queued
> before the LeaderAndISR, the bigger benefit we get (capped by the
> produce timeout).
>
> 2. If the enqueued produce requests have acks=0 or acks=1, there will be
> no latency differences in this case, but
>   2.1 without this KIP, the records of partition test-0 in the
> ProduceRequests ahead of the LeaderAndISR will be appended to the local
> log, and eventually be truncated after processing the LeaderAndISR. This
> is what's referred to as "some unofficial definition of data loss in
> terms of messages beyond the high watermark".
>   2.2 with this KIP, we can mitigate the effect since if the
> LeaderAndISR is immediately processed, the response to producers will
> have the NotLeaderForPartition error, causing producers to retry.
>
> The explanation above is the benefit for reducing the latency of a
> broker becoming the follower; closely related is reducing the latency of
> a broker becoming the leader. In this case, the benefit is even more
> obvious: if other brokers have resigned leadership, and the current
> broker should take leadership, any delay in processing the LeaderAndISR
> will be perceived by clients as unavailability. In extreme cases, this
> can cause failed produce requests if the retries are exhausted.
>
> Another two types of controller requests are UpdateMetadata and
> StopReplica, which I'll briefly discuss as follows. For UpdateMetadata
> requests, delayed processing means clients receiving stale metadata,
> e.g. with the wrong leadership info for certain partitions, and the
> effect is more retries or even fatal failure if the retries are
> exhausted. For StopReplica requests, a long queuing time may degrade the
> performance of topic deletion.
>
> Regarding your last question of the delay for DescribeLogDirsRequest,
> you are right that this KIP cannot help with the latency in getting the
> log dirs info, and it's only relevant when controller requests are
> involved.
>
> Regards,
> Lucas
>
> On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:
>
> > Hey Jun,
> >
> > Thanks much for the comments. It is a good point. So the feature may
> > be useful for the JBOD use-case. I have one question below.
> >
> > Hey Lucas,
> >
> > Do you think this feature is also useful for a non-JBOD setup, or is
> > it only useful for the JBOD setup? It may be useful to understand
> > this.
> >
> > When the broker is set up using JBOD, in order to move leaders on the
> > failed disk to other disks, the system operator first needs to get the
> > list of partitions on the failed disk. This is currently achieved
> > using AdminClient.describeLogDirs(), which sends
> > DescribeLogDirsRequest to the broker. If we only prioritize the
> > controller requests, then the DescribeLogDirsRequest may still take a
> > long time to be processed by the broker. So the overall time to move
> > leaders away from the failed disk may still be long even with this
> > KIP. What do you think?
> >
> > Thanks,
> > Dong
> >
> > On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >
> > > Thanks for the insightful comment, Jun.
> > >
> > > @Dong,
> > > Since both of the two comments in your previous email are about the
> > > benefits of this KIP and whether it's useful, in light of Jun's last
> > > comment, do you agree that this KIP can be beneficial in the case
> > > mentioned by Jun? Please let me know, thanks!
> > >
> > > Regards,
> > > Lucas
> > >
> > > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:
> > >
> > > > Hi, Lucas, Dong,
> > > >
> > > > If all disks on a broker are slow, one probably should just kill
> > > > the broker. In that case, this KIP may not help. If only one of
> > > > the disks on a broker is slow, one may want to fail that disk and
> > > > move the leaders on that disk to other brokers. In that case,
> > > > being able to process the LeaderAndIsr requests faster will
> > > > potentially help the producers recover quicker.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > >
> > > > > Hey Lucas,
> > > > >
> > > > > Thanks for the reply. Some follow up questions below.
> > > > >
> > > > > Regarding 1, if each ProduceRequest covers 20 partitions that
> > > > > are randomly distributed across all partitions, then each
> > > > > ProduceRequest will likely cover some partitions for which the
> > > > > broker is still leader after it quickly processes the
> > > > > LeaderAndIsrRequest. Then the broker will still be slow in
> > > > > processing these ProduceRequests and request latency will still
> > > > > be very high with this KIP. It seems that most ProduceRequests
> > > > > will still timeout after 30 seconds. Is this understanding
> > > > > correct?
> > > > >
> > > > > Regarding 2, if most ProduceRequests will still timeout after
> > > > > 30 seconds, then it is less clear how this KIP reduces average
> > > > > produce latency. Can you clarify what metrics can be improved
> > > > > by this KIP?
> > > > >
> > > > > Not sure why the system operator directly cares about the
> > > > > number of truncated messages. Do you mean this KIP can improve
> > > > > average throughput or reduce message duplication? It will be
> > > > > good to understand this.
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >
> > > > > > Hi Dong,
> > > > > >
> > > > > > Thanks for your valuable comments. Please see my reply below.
> > > > > >
> > > > > > 1. The Google doc showed only 1 partition. Now let's consider
> > > > > > a more common scenario where broker0 is the leader of many
> > > > > > partitions. And let's say for some reason its IO becomes
> > > > > > slow. The number of leader partitions on broker0 is so large,
> > > > > > say 10K, that the cluster is skewed, and the operator would
> > > > > > like to shift the leadership for a lot of partitions, say 9K,
> > > > > > to other brokers, either manually or through some service
> > > > > > like cruise control. With this KIP, not only will the
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks for the comment, Becket.
So far, we've been trying to avoid making any request handler thread
special. But if we were to follow that path in order to make the two
planes more isolated, what do you think about also having a dedicated
processor thread, and a dedicated port, for the controller?

Today one processor thread can handle multiple connections, let's say
100 connections, represented by connection0, ... connection99, among
which connection0-98 are from clients, while connection99 is from the
controller. Further, let's say after one selector polling, there are
incoming requests on all connections. When the request queue is full
(either the data request queue being full in the two-queue design, or
the one single queue being full in the deque design), the processor
thread will be blocked first when trying to enqueue the data request
from connection0, then possibly blocked for the data request from
connection1, etc., even though the controller request is ready to be
enqueued.

To solve this problem, it seems we would need to have a separate port
dedicated to the controller, a dedicated processor thread, a dedicated
controller request queue, and pinning of one request handler thread for
controller requests.
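
The blocking scenario above can be made concrete with a small sketch.
This is a minimal illustration under my own assumptions (the class names
`Request`/`RequestChannel` and the queue capacity are made up, not
Kafka's actual classes): routing controller requests to their own
unbounded queue means the processor never blocks on them behind a full
data-request queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical request wrapper; "fromController" would be derived from
// the connection, e.g. a dedicated controller port or listener.
class Request {
    final boolean fromController;
    Request(boolean fromController) { this.fromController = fromController; }
}

class RequestChannel {
    // Bounded data queue, modeling queued.max.requests back-pressure.
    private final BlockingQueue<Request> dataQueue = new ArrayBlockingQueue<>(500);
    // Separate, effectively unbounded queue for the single controller:
    // enqueueing here never waits behind piled-up client requests.
    private final BlockingQueue<Request> controllerQueue = new LinkedBlockingQueue<>();

    void send(Request r) throws InterruptedException {
        if (r.fromController)
            controllerQueue.put(r);   // never blocked by data traffic
        else
            dataQueue.put(r);         // may block when the broker is busy
    }

    Request receiveController() throws InterruptedException {
        return controllerQueue.take();
    }

    Request receiveData() throws InterruptedException {
        return dataQueue.take();
    }
}
```

A dedicated handler thread would loop on receiveController() while the
existing handler pool loops on receiveData(), which is the isolation
argued for elsewhere in this thread.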

Thanks,
Lucas
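
In the quoted discussion below, Becket sketches a broker-side rule:
with a single controller-request handler thread, drop any request whose
correlation id is not newer than the last one processed for the same
controller epoch. A minimal sketch of that check follows (hypothetical
class names, not Kafka's actual request classes):

```java
// Within the same controller epoch, a request whose correlation id is
// not larger than the last processed one must be an obsolete resend
// (e.g. R1_a arriving after R1_b) and can be dropped.
class ControllerRequest {
    final int controllerEpoch;
    final int correlationId;
    ControllerRequest(int controllerEpoch, int correlationId) {
        this.controllerEpoch = controllerEpoch;
        this.correlationId = correlationId;
    }
}

class ControllerRequestGate {
    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    // Returns true if the single controller-request handler thread
    // should process this request, false if it should be dropped.
    synchronized boolean admit(ControllerRequest r) {
        if (r.controllerEpoch < lastEpoch)
            return false;  // from an older controller generation
        if (r.controllerEpoch == lastEpoch && r.correlationId <= lastCorrelationId)
            return false;  // obsolete resend within the same epoch
        lastEpoch = r.controllerEpoch;
        lastCorrelationId = r.correlationId;
        return true;
    }
}
```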


On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin <be...@gmail.com> wrote:

> Personally I am not fond of the dequeue approach simply because it is
> against the basic idea of isolating the controller plane and data plane.
> With a single dequeue, theoretically speaking the controller requests can
> starve the clients requests. I would prefer the approach with a separate
> controller request queue and a dedicated controller request handler thread.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Sure, I can summarize the usage of correlation id. But before I do
> > that, it seems the same out-of-order processing can also happen to
> > Produce requests sent by producers, following the same example you
> > described earlier. If that's the case, I think this probably deserves
> > a separate doc and design independent of this KIP.
> >
> > Lucas
> >
> >
> >
> > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <li...@gmail.com> wrote:
> >
> > > Hey Lucas,
> > >
> > > Could you update the KIP if you are confident with the approach
> > > which uses correlation id? The idea around correlation id is kind
> > > of scattered across multiple emails. It will be useful if other
> > > reviewers can read the KIP to understand the latest proposal.
> > >
> > > Thanks,
> > > Dong
> > >
> > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > >
> > > > I like the idea of the dequeue implementation by Lucas. This
> > > > will help us avoid an additional queue for the controller and
> > > > additional configs in Kafka.
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
> > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <be...@gmail.com> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > The usage of correlation ID might still be useful to address
> > > > > the cases that the controller epoch and leader epoch check are
> > > > > not sufficient to guarantee correct behavior. For example, if
> > > > > the controller sends a LeaderAndIsrRequest followed by a
> > > > > StopReplicaRequest, and the broker processes them in the
> > > > > reverse order, the replica may still be wrongly recreated,
> > > > > right?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io> wrote:
> > > > > >
> > > > > > Hmm, since we already use controller epoch and leader epoch
> > > > > > for properly caching the latest partition state, do we really
> > > > > > need correlation id for ordering the controller requests?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >
> > > > > >> Lucas and Mayuresh,
> > > > > >>
> > > > > >> Good idea. The correlation id should work.
> > > > > >>
> > > > > >> In the ControllerChannelManager, a request will be resent
> > > > > >> until a response is received. So if the controller-to-broker
> > > > > >> connection disconnects after the controller sends R1_a, but
> > > > > >> before the response of R1_a is received, a disconnection may
> > > > > >> cause the controller to resend R1_b, i.e. until R1 is acked,
> > > > > >> R2 won't be sent by the controller.
> > > > > >> This gives two guarantees:
> > > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > > >> 2. On the broker side, when R2 is seen, R1 must have been
> > > > > >> processed at least once.
> > > > > >>
> > > > > >> So on the broker side, with a single-threaded controller
> > > > > >> request handler, the logic should be:
> > > > > >> 1. Process whatever request is seen in the controller
> > > > > >> request queue.
> > > > > >> 2. For the given epoch, drop the request if its correlation
> > > > > >> id is smaller than that of the last processed request.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Jiangjie (Becket) Qin
> > > > > >>
> > > > > >>> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
> > > > > >>
> > > > > >>> I agree that there is no strong ordering when there is more
> > > > > >>> than one socket connection. Currently, we rely on
> > > > > >>> controllerEpoch and leaderEpoch to ensure that the receiving
> > > > > >>> broker picks up the latest state for each partition.
> > > > > >>>
> > > > > >>> One potential issue with the dequeue approach is that if the
> > > > > >>> queue is full, there is no guarantee that the controller
> > > > > >>> requests will be enqueued quickly.
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>>
> > > > > >>> Jun
> > > > > >>>
> > > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > > >>>
> > > > > >>>> Yea, the correlationId is only set to 0 in the
> > > > > >>>> NetworkClient constructor. Since we reuse the same
> > > > > >>>> NetworkClient between Controller and the broker, a
> > > > > >>>> disconnection should not cause it to reset to 0, in which
> > > > > >>>> case it can be used to reject obsolete requests.
> > > > > >>>>
> > > > > >>>> Thanks,
> > > > > >>>>
> > > > > >>>> Mayuresh
> > > > > >>>>
> > > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>>> @Dong,
> > > > > >>>>> Great example and explanation, thanks!
> > > > > >>>>>
> > > > > >>>>> @All
> > > > > >>>>> Regarding the example given by Dong, it seems even if we
> > > > > >>>>> use a queue, and a dedicated controller request handling
> > > > > >>>>> thread, the same result can still happen because R1_a will
> > > > > >>>>> be sent on one connection, and R1_b & R2 will be sent on a
> > > > > >>>>> different connection, and there is no ordering between
> > > > > >>>>> different connections on the broker side.
> > > > > >>>>> I was discussing with Mayuresh offline, and it seems
> > > > > >>>>> correlation id within the same NetworkClient object is
> > > > > >>>>> monotonically increasing and never reset, hence a broker
> > > > > >>>>> can leverage that to properly reject obsolete requests.
> > > > > >>>>> Thoughts?
> > > > > >>>>>
> > > > > >>>>> Thanks,
> > > > > >>>>> Lucas
> > > > > >>>>>
> > > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > > >>>>>
> > > > > >>>>>> Actually nvm, correlationId is reset in case of
> > > > > >>>>>> connection loss, I think.
> > > > > >>>>>>
> > > > > >>>>>> Thanks,
> > > > > >>>>>>
> > > > > >>>>>> Mayuresh
> > > > > >>>>>>
> > > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> I agree with Dong that out-of-order processing can
> > > > > >>>>>>> happen with having 2 separate queues as well and it can
> > > > > >>>>>>> even happen today. Can we use the correlationId in the
> > > > > >>>>>>> request from the controller to the broker to handle
> > > > > >>>>>>> ordering?
> > > > > >>>>>>>
> > > > > >>>>>>> Thanks,
> > > > > >>>>>>>
> > > > > >>>>>>> Mayuresh
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Good point, Joel. I agree that a dedicated controller
> > > > > >>>>>>>> request handling thread would provide better isolation.
> > > > > >>>>>>>> It also solves the reordering issue.
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> > > > > >> jjkoshy.w@gmail.com>
> > > > > >>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Good example. I think this scenario can occur in the
> > > > > >>>>>>>>> current code as well, but with even lower probability given
> > > > > >>>>>>>>> that there are other non-controller requests interleaved.
> > > > > >>>>>>>>> It is still sketchy, though, and I think a safer approach
> > > > > >>>>>>>>> would be separate queues and pinning controller request
> > > > > >>>>>>>>> handling to one handler thread.
> > > > > >>>>>>>>>
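A minimal sketch of that pinning (hypothetical types; the real broker request-handler code is different): controller requests go to their own bounded queue, drained by a single dedicated thread, so they are processed serially and in arrival order regardless of what the data-plane handler pool is doing:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a dedicated controller-request handler: one thread
// draining one bounded queue gives serialized, in-order processing.
public class ControllerRequestHandler implements Runnable {
    private final BlockingQueue<Runnable> controllerQueue = new LinkedBlockingQueue<>(20);

    public void enqueue(Runnable controllerRequest) throws InterruptedException {
        controllerQueue.put(controllerRequest); // blocks the network thread if full
    }

    // Takes and runs exactly one request; called in a loop by run().
    public void processOne() throws InterruptedException {
        controllerQueue.take().run();
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                processOne(); // one controller request at a time, FIFO
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```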
> > > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> > > > > >> lindong28@gmail.com
> > > > > >>>>
> > > > > >>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> Hey Becket,
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> I think you are right that there may be out-of-order
> > > > > >>>>>>>>>> processing. However, it seems that out-of-order processing
> > > > > >>>>>>>>>> may also happen even if we use a separate queue.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Here is the example:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> - Controller sends R1 and gets disconnected before
> > > > > >>>>>>>>>> receiving a response. Then it reconnects and sends R2. Both
> > > > > >>>>>>>>>> requests now stay in the controller request queue in the
> > > > > >>>>>>>>>> order they were sent.
> > > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then
> > > > > >>>>>>>>>> thread2 takes R2 from the request queue almost at the same
> > > > > >>>>>>>>>> time.
> > > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is a
> > > > > >>>>>>>>>> chance that R2's processing is completed before R1's.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> If out-of-order processing can happen for both approaches
> > > > > >>>>>>>>>> with very low probability, it may not be worthwhile to add
> > > > > >>>>>>>>>> the extra queue. What do you think?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Thanks,
> > > > > >>>>>>>>>> Dong
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > > > > >>>> becket.qin@gmail.com
> > > > > >>>>>>
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Using the request channel as a deque was brought up some
> > > > > >>>>>>>>>>> time ago when we were initially thinking about
> > > > > >>>>>>>>>>> prioritizing requests. The concern was that controller
> > > > > >>>>>>>>>>> requests are supposed to be processed in order. If we can
> > > > > >>>>>>>>>>> ensure that there is at most one controller request in the
> > > > > >>>>>>>>>>> request channel, the order is not a concern. But in cases
> > > > > >>>>>>>>>>> where more than one controller request is inserted into
> > > > > >>>>>>>>>>> the queue, the controller request order may change and
> > > > > >>>>>>>>>>> cause problems. For example, think about the following
> > > > > >>>>>>>>>>> sequence:
> > > > > >>>>>>>>>>> 1. The controller successfully sends a request R1 to the broker.
> > > > > >>>>>>>>>>> 2. The broker receives R1 and puts it at the head of the request queue.
> > > > > >>>>>>>>>>> 3. The controller-to-broker connection fails and the controller reconnects to the broker.
> > > > > >>>>>>>>>>> 4. The controller sends a request R2 to the broker.
> > > > > >>>>>>>>>>> 5. The broker receives R2 and adds it to the head of the request queue.
> > > > > >>>>>>>>>>> Now on the broker side, R2 will be processed before R1,
> > > > > >>>>>>>>>>> which may cause problems.
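The reordering in steps 1-5 is easy to reproduce with a deque (a small sketch, assuming controller requests are inserted at the head as proposed):

```java
import java.util.concurrent.LinkedBlockingDeque;

// Demonstrates the sequence above: two controller requests both inserted at
// the head of the deque come out in reverse arrival order.
public class HeadInsertReorderDemo {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>(20);
        requestQueue.putFirst("R1"); // step 2: R1 goes to the head
        requestQueue.putFirst("R2"); // step 5: R2 also goes to the head
        System.out.println(requestQueue.takeFirst()); // prints "R2"
        System.out.println(requestQueue.takeFirst()); // prints "R1"
    }
}
```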
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > > > > >>>>> jjkoshy.w@gmail.com>
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a
> > > > > >>>>>>>>>>>> simpler, less invasive alternative, and it should work.
> > > > > >>>>>>>>>>>> Jun/Becket/others, do you see any pitfalls with this
> > > > > >>>>>>>>>>>> approach?
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > > > > >>>>>>>> lucasatucla@gmail.com>
> > > > > >>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> @Mayuresh,
> > > > > >>>>>>>>>>>>> That's a very interesting idea that I hadn't thought of
> > > > > >>>>>>>>>>>>> before. It seems to solve our problem at hand pretty
> > > > > >>>>>>>>>>>>> well, and it also avoids the need for a new size metric
> > > > > >>>>>>>>>>>>> and capacity config for the controller request queue. In
> > > > > >>>>>>>>>>>>> fact, if we were to adopt this design, there is no
> > > > > >>>>>>>>>>>>> public interface change, and we probably don't need a
> > > > > >>>>>>>>>>>>> KIP. Also, implementation-wise, it seems the Java class
> > > > > >>>>>>>>>>>>> LinkedBlockingDeque (note: LinkedBlockingQueue only
> > > > > >>>>>>>>>>>>> supports insertion at the tail) can readily satisfy the
> > > > > >>>>>>>>>>>>> requirement by supporting a capacity and also allowing
> > > > > >>>>>>>>>>>>> insertion at both ends.
> > > > > >>>>>>>>>>>>>
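A sketch of that single-deque approach (hypothetical wrapper class; note it is java.util.concurrent.LinkedBlockingDeque, the double-ended variant, that supports both a capacity bound and insertion at either end):

```java
import java.util.concurrent.LinkedBlockingDeque;

// Sketch of the single-deque approach: controller requests jump to the head,
// data requests go to the tail, and one capacity still bounds total memory.
public class RequestDeque {
    private final LinkedBlockingDeque<String> deque;

    public RequestDeque(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    public void sendControllerRequest(String request) throws InterruptedException {
        deque.putFirst(request); // controller requests get priority
    }

    public void sendDataRequest(String request) throws InterruptedException {
        deque.putLast(request); // produce/fetch requests keep FIFO order at the tail
    }

    public String nextRequest() throws InterruptedException {
        return deque.takeFirst(); // handler threads always see controller requests first
    }
}
```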
> > > > > >>>>>>>>>>>>> My only concern is that this design is tied to the
> > > > > >>>>>>>>>>>>> coincidence that we have two request priorities and
> > > > > >>>>>>>>>>>>> there are two ends to a deque. Hence, by using the
> > > > > >>>>>>>>>>>>> proposed design, the network layer is more tightly
> > > > > >>>>>>>>>>>>> coupled with the upper-layer logic; e.g. if we were to
> > > > > >>>>>>>>>>>>> add an extra priority level in the future for some
> > > > > >>>>>>>>>>>>> reason, we would probably need to go back to the design
> > > > > >>>>>>>>>>>>> of separate queues, one for each priority level.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward
> > > > > >>> your
> > > > > >>>>>>>> suggested
> > > > > >>>>>>>>>>>>> approach.
> > > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> @Becket,
> > > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm
> > > > > >>>>>>>>>>>>> answering your question only in the context of the
> > > > > >>>>>>>>>>>>> current KIP design: I think your suggestion makes sense,
> > > > > >>>>>>>>>>>>> and I'm ok with removing the capacity config and just
> > > > > >>>>>>>>>>>>> relying on the default value of 20 being sufficient.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>> Lucas
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > > > > >>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize the
> > > > > >>>>>>>> controller
> > > > > >>>>>>>>>>> request
> > > > > >>>>>>>>>>>>>> over any other requests.
> > > > > >>>>>>>>>>>>>> In that case, we can change the request queue to a
> > > > > >>>>>>>>>>>>>> deque, where you always insert the normal requests
> > > > > >>>>>>>>>>>>>> (produce, consume, etc.) at the tail of the deque, but
> > > > > >>>>>>>>>>>>>> if it's a controller request, you insert it at the head
> > > > > >>>>>>>>>>>>>> of the deque. This ensures that controller requests are
> > > > > >>>>>>>>>>>>>> given higher priority over other requests.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Also, since we only read one request from the socket
> > > > > >>>>>>>>>>>>>> and mute the channel, unmuting it only after handling
> > > > > >>>>>>>>>>>>>> the request, this would ensure that we don't handle
> > > > > >>>>>>>>>>>>>> controller requests out of order.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue and
> > > > > >>> the
> > > > > >>>>>>>>> additional
> > > > > >>>>>>>>>>>>> config
> > > > > >>>>>>>>>>>>>> for the size of the queue.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> What do you think ?
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Mayuresh
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > > > >>>>>>>> becket.qin@gmail.com
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the
> > > > > >>>>>>>>>>>>>>> current design makes sense. My confusion is about
> > > > > >>>>>>>>>>>>>>> whether the new config for the controller queue
> > > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a case in
> > > > > >>>>>>>>>>>>>>> which users would change it.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > > >>>>>>>>>> becket.qin@gmail.com>
> > > > > >>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased as "do we
> > > > > >>>>>>>>>>>>>>>> expect users to ever change the controller request
> > > > > >>>>>>>>>>>>>>>> queue capacity"? If we agree that 20 is already a
> > > > > >>>>>>>>>>>>>>>> very generous default and we do not expect users to
> > > > > >>>>>>>>>>>>>>>> change it, is it still necessary to expose this as a
> > > > > >>>>>>>>>>>>>>>> config?
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > > >>>>>>>>>>> lucasatucla@gmail.com
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> @Becket
> > > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that
> > > > > >>>>>>>>>>>>>>>>> normally there should be just one controller request
> > > > > >>>>>>>>>>>>>>>>> because of muting, and I had NOT intended to say
> > > > > >>>>>>>>>>>>>>>>> there would be many enqueued controller requests.
> > > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not sure which
> > > > > >>>>>>>>>>>>>>>>> part conveys that info. I'd be happy to revise it if
> > > > > >>>>>>>>>>>>>>>>> you point out the section.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen under normal
> > > > > >>>>>>>>>>>>>>>>> conditions, the current design does not preclude
> > > > > >>>>>>>>>>>>>>>>> multiple controllers running at the same time. Hence
> > > > > >>>>>>>>>>>>>>>>> if we don't have the controller queue capacity
> > > > > >>>>>>>>>>>>>>>>> config and simply make its capacity 1, network
> > > > > >>>>>>>>>>>>>>>>> threads handling requests from different controllers
> > > > > >>>>>>>>>>>>>>>>> will be blocked during those troublesome times,
> > > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On the other
> > > > > >>>>>>>>>>>>>>>>> hand, adding the extra config with a default value,
> > > > > >>>>>>>>>>>>>>>>> say 20, guards us against issues in those
> > > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much downside
> > > > > >>>>>>>>>>>>>>>>> to adding the extra config.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > > >>>>>>>>>>>>>>>>> Good catch; this sentence is an obsolete statement
> > > > > >>>>>>>>>>>>>>>>> based on a previous design. I've revised the wording
> > > > > >>>>>>>>>>>>>>>>> in the KIP.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>> Lucas
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > > > > >>> Gharat <
> > > > > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The memory
> > > > > >>>>>>>>>>>>>>>>>> consumption can rise given the total number of
> > > > > >>>>>>>>>>>>>>>>>> queued requests can go up to 2x" in the impact
> > > > > >>>>>>>>>>>>>>>>>> section. Normally the requests from the controller
> > > > > >>>>>>>>>>>>>>>>>> to a broker are not high volume, right?
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the control
> > > > > >>>>>>>>>>>>>>>>>>> plane from the data plane makes a lot of sense.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the controller
> > > > > >>>>>>>>>>>>>>>>>>> request queue may have many requests in it. Will
> > > > > >>>>>>>>>>>>>>>>>>> this be a common case? The controller requests
> > > > > >>>>>>>>>>>>>>>>>>> still go through the SocketServer. The
> > > > > >>>>>>>>>>>>>>>>>>> SocketServer will mute the channel once a request
> > > > > >>>>>>>>>>>>>>>>>>> is read and put into the request channel. So
> > > > > >>>>>>>>>>>>>>>>>>> assuming there is only one connection between the
> > > > > >>>>>>>>>>>>>>>>>>> controller and each broker, on the broker side
> > > > > >>>>>>>>>>>>>>>>>>> there should be only one controller request in the
> > > > > >>>>>>>>>>>>>>>>>>> controller request queue at any given time. If
> > > > > >>>>>>>>>>>>>>>>>>> that is the case, do we need a separate controller
> > > > > >>>>>>>>>>>>>>>>>>> request queue capacity config? The default value
> > > > > >>>>>>>>>>>>>>>>>>> of 20 means that we expect 20 controller switches
> > > > > >>>>>>>>>>>>>>>>>>> to happen in a short period of time. I am not sure
> > > > > >>>>>>>>>>>>>>>>>>> whether someone should increase the controller
> > > > > >>>>>>>>>>>>>>>>>>> request queue capacity to handle such a case, as
> > > > > >>>>>>>>>>>>>>>>>>> it seems to indicate something very wrong has
> > > > > >>>>>>>>>>>>>>>>>>> happened.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > > >>>>>>>>>>>> lindong28@gmail.com>
> > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> I think the motivation section is
> > > > > >>> intuitive.
> > > > > >>>> It
> > > > > >>>>>>>> will
> > > > > >>>>>>>>> be
> > > > > >>>>>>>>>>> good
> > > > > >>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>> learn
> > > > > >>>>>>>>>>>>>>>>>>> more
> > > > > >>>>>>>>>>>>>>>>>>>> about the comments from other reviewers.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> > > > > >>> Wang <
> > > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com>
> > > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of
> > > > > >>> the
> > > > > >>>>> KIP
> > > > > >>>>>> by
> > > > > >>>>>>>>>>>> explaining
> > > > > >>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>> cases
> > > > > >>>>>>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>>>>> would have user impacts.
> > > > > >>>>>>>>>>>>>>>>>>>>> Please take a look and let me know your
> > > > > >>>>>>>>>>>>>>>>>>>>> comments.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>>>>>> Lucas
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> > > > > >>> Wang
> > > > > >>>> <
> > > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> The simulation of the disk being slow is merely
> > > > > >>>>>>>>>>>>>>>>>>>>>> for me to easily construct a testing scenario
> > > > > >>>>>>>>>>>>>>>>>>>>>> with a backlog of produce requests. In
> > > > > >>>>>>>>>>>>>>>>>>>>>> production, other than the disk being slow, a
> > > > > >>>>>>>>>>>>>>>>>>>>>> backlog of produce requests may also be caused
> > > > > >>>>>>>>>>>>>>>>>>>>>> by high produce QPS. In that case, we may not
> > > > > >>>>>>>>>>>>>>>>>>>>>> want to kill the broker, and that's when this
> > > > > >>>>>>>>>>>>>>>>>>>>>> KIP can be useful, both for JBOD and non-JBOD
> > > > > >>>>>>>>>>>>>>>>>>>>>> setups.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question about each
> > > > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequest covering 20 partitions that are
> > > > > >>>>>>>>>>>>>>>>>>>>>> randomly distributed: let's say a LeaderAndIsr
> > > > > >>>>>>>>>>>>>>>>>>>>>> request is enqueued that tries to switch the
> > > > > >>>>>>>>>>>>>>>>>>>>>> current broker, say broker0, from leader to
> > > > > >>>>>>>>>>>>>>>>>>>>>> follower *for one of the partitions*, say
> > > > > >>>>>>>>>>>>>>>>>>>>>> *test-0*. For the sake of argument, let's also
> > > > > >>>>>>>>>>>>>>>>>>>>>> assume the other brokers, say broker1, have
> > > > > >>>>>>>>>>>>>>>>>>>>>> *stopped* fetching from the current broker,
> > > > > >>>>>>>>>>>>>>>>>>>>>> i.e. broker0.
> > > > > >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > > >>>>>>>>>>>>>>>>>>>>>>   1.1 Without this KIP, the ProduceRequests ahead of the
> > > > > >>>>>>>>>>>>>>>>>>>>>>   LeaderAndISR will be put into the purgatory, and since
> > > > > >>>>>>>>>>>>>>>>>>>>>>   they'll never be replicated to other brokers (because of
> > > > > >>>>>>>>>>>>>>>>>>>>>>   the assumption made above), they will be completed either
> > > > > >>>>>>>>>>>>>>>>>>>>>>   when the LeaderAndISR request is processed or when the
> > > > > >>>>>>>>>>>>>>>>>>>>>>   timeout happens.
> > > > > >>>>>>>>>>>>>>>>>>>>>>   1.2 With this KIP, broker0 will immediately transition
> > > > > >>>>>>>>>>>>>>>>>>>>>>   the partition test-0 to become a follower; after the
> > > > > >>>>>>>>>>>>>>>>>>>>>>   current broker sees the replication of the remaining 19
> > > > > >>>>>>>>>>>>>>>>>>>>>>   partitions, it can send a response indicating that it's
> > > > > >>>>>>>>>>>>>>>>>>>>>>   no longer the leader for "test-0".
> > > > > >>>>>>>>>>>>>>>>>>>>>>   To see the latency difference between 1.1 and 1.2, let's
> > > > > >>>>>>>>>>>>>>>>>>>>>>   say there are 24K produce requests ahead of the
> > > > > >>>>>>>>>>>>>>>>>>>>>>   LeaderAndISR, and there are 8 io threads, so each io
> > > > > >>>>>>>>>>>>>>>>>>>>>>   thread will process approximately 3000 produce requests.
> > > > > >>>>>>>>>>>>>>>>>>>>>>   Now let's investigate the io thread that finally
> > > > > >>>>>>>>>>>>>>>>>>>>>>   processed the LeaderAndISR. For its 3000 produce
> > > > > >>>>>>>>>>>>>>>>>>>>>>   requests, if we model the times when their remaining 19
> > > > > >>>>>>>>>>>>>>>>>>>>>>   partitions catch up as t0, t1, ..., t2999, then the
> > > > > >>>>>>>>>>>>>>>>>>>>>>   LeaderAndISR request is processed at time t3000.
> > > > > >>>>>>>>>>>>>>>>>>>>>>   Without this KIP, the 1st produce request would have
> > > > > >>>>>>>>>>>>>>>>>>>>>>   waited an extra t3000 - t0 in the purgatory, the 2nd an
> > > > > >>>>>>>>>>>>>>>>>>>>>>   extra t3000 - t1, etc. Roughly speaking, the latency
> > > > > >>>>>>>>>>>>>>>>>>>>>>   difference is bigger for the earlier produce requests
> > > > > >>>>>>>>>>>>>>>>>>>>>>   than for the later ones. For the same reason, the more
> > > > > >>>>>>>>>>>>>>>>>>>>>>   ProduceRequests queued before the LeaderAndISR, the
> > > > > >>>>>>>>>>>>>>>>>>>>>>   bigger the benefit we get (capped by the produce
> > > > > >>>>>>>>>>>>>>>>>>>>>>   timeout).
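Put as a formula (using the t_i defined above), the extra purgatory wait for the i-th produce request without the KIP is:

```latex
% Extra purgatory wait for the i-th produce request (without the KIP):
\Delta_i = t_{3000} - t_i, \qquad i = 0, 1, \ldots, 2999
% Total extra wait across the io thread's backlog, each term capped by the
% produce timeout T:
\sum_{i=0}^{2999} \min\!\left(t_{3000} - t_i,\; T\right)
```

This makes explicit that earlier requests (smaller t_i) pay the larger penalty, and that a longer backlog increases the total benefit of the KIP.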
> > > > > >>>>>>>>>>>>>>>>>>>>>> 2. If the enqueued produce requests have acks=0 or acks=1
> > > > > >>>>>>>>>>>>>>>>>>>>>>   There will be no latency differences in this case, but
> > > > > >>>>>>>>>>>>>>>>>>>>>>   2.1 Without this KIP, the records of partition test-0 in
> > > > > >>>>>>>>>>>>>>>>>>>>>>   the ProduceRequests ahead of the LeaderAndISR will be
> > > > > >>>>>>>>>>>>>>>>>>>>>>   appended to the local log, and eventually be truncated
> > > > > >>>>>>>>>>>>>>>>>>>>>>   after processing the LeaderAndISR. This is what's
> > > > > >>>>>>>>>>>>>>>>>>>>>>   referred to as "some unofficial definition of data loss
> > > > > >>>>>>>>>>>>>>>>>>>>>>   in terms of messages beyond the high watermark".
> > > > > >>>>>>>>>>>>>>>>>>>>>>   2.2 With this KIP, we can mitigate the effect, since if
> > > > > >>>>>>>>>>>>>>>>>>>>>>   the LeaderAndISR is immediately processed, the response
> > > > > >>>>>>>>>>>>>>>>>>>>>>   to producers will have the NotLeaderForPartition error,
> > > > > >>>>>>>>>>>>>>>>>>>>>>   causing producers to retry.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> The explanation above covers the benefit of
> > > > > >>>>>>>>>>>>>>>>>>>>>> reducing the latency of a broker becoming a
> > > > > >>>>>>>>>>>>>>>>>>>>>> follower; closely related is reducing the
> > > > > >>>>>>>>>>>>>>>>>>>>>> latency of a broker becoming the leader.
> > > > > >>>>>>>>>>>>>>>>>>>> leader.
> > > > > >>>>>>>>>>>>>>>>>>>>>> In this case, the benefit is even
> > > > > >> more
> > > > > >>>>>>>> obvious, if
> > > > > >>>>>>>>>>> other
> > > > > >>>>>>>>>>>>>>> brokers
> > > > > >>>>>>>>>>>>>>>>>> have
> > > > > >>>>>>>>>>>>>>>>>>>>>> resigned leadership, and the
> > > > > >>>>>>>>>>>>>>>>>>>>>> current broker should take
> > > > > >> leadership.
> > > > > >>>> Any
> > > > > >>>>>>>> delay
> > > > > >>>>>>>>> in
> > > > > >>>>>>>>>>>>>> processing
> > > > > >>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR will be perceived
> > > > > >>>>>>>>>>>>>>>>>>>>>> by clients as unavailability. In
> > > > > >>> extreme
> > > > > >>>>>> cases,
> > > > > >>>>>>>>> this
> > > > > >>>>>>>>>>> can
> > > > > >>>>>>>>>>>>>> cause
> > > > > >>>>>>>>>>>>>>>>>> failed
> > > > > >>>>>>>>>>>>>>>>>>>>>> produce requests if the retries are
> > > > > >>>>>>>>>>>>>>>>>>>>>> exhausted.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> Another two types of controller
> > > > > >>> requests
> > > > > >>>>> are
> > > > > >>>>>>>>>>>>> UpdateMetadata
> > > > > >>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>>>> StopReplica, which I'll briefly
> > > > > >> discuss
> > > > > >>>> as
> > > > > >>>>>>>>> follows:
> > > > > >>>>>>>>>>>>>>>>>>>>>> For UpdateMetadata requests, delayed
> > > > > >>>>>> processing
> > > > > >>>>>>>>>> means
> > > > > >>>>>>>>>>>>>> clients
> > > > > >>>>>>>>>>>>>>>>>>> receiving
> > > > > >>>>>>>>>>>>>>>>>>>>>> stale metadata, e.g. with the wrong
> > > > > >>>>>> leadership
> > > > > >>>>>>>>> info
> > > > > >>>>>>>>>>>>>>>>>>>>>> for certain partitions, and the
> > > > > >> effect
> > > > > >>> is
> > > > > >>>>>> more
> > > > > >>>>>>>>>> retries
> > > > > >>>>>>>>>>>> or
> > > > > >>>>>>>>>>>>>> even
> > > > > >>>>>>>>>>>>>>>>>> fatal
> > > > > >>>>>>>>>>>>>>>>>>>>>> failure if the retries are exhausted.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> For StopReplica requests, a long
> > > > > >>> queuing
> > > > > >>>>> time
> > > > > >>>>>>>> may
> > > > > >>>>>>>>>>>> degrade
> > > > > >>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>> performance
> > > > > >>>>>>>>>>>>>>>>>>>>>> of topic deletion.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> Regarding your last question of the
> > > > > >>> delay
> > > > > >>>>> for
> > > > > >>>>>>>>>>>>>>>>>> DescribeLogDirsRequest,
> > > > > >>>>>>>>>>>>>>>>>>>> you
> > > > > >>>>>>>>>>>>>>>>>>>>>> are right
> > > > > >>>>>>>>>>>>>>>>>>>>>> that this KIP cannot help with the
> > > > > >>>> latency
> > > > > >>>>> in
> > > > > >>>>>>>>>> getting
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>> log
> > > > > >>>>>>>>>>>>>>>>> dirs
> > > > > >>>>>>>>>>>>>>>>>>>> info,
> > > > > >>>>>>>>>>>>>>>>>>>>>> and it's only relevant
> > > > > >>>>>>>>>>>>>>>>>>>>>> when controller requests are
> > > > > >> involved.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> Regards,
> > > > > >>>>>>>>>>>>>>>>>>>>>> Lucas
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 5:11 PM, Dong
> > > > > >>> Lin
> > > > > >>>> <
> > > > > >>>>>>>>>>>>>> lindong28@gmail.com
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Hey Jun,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks much for the comments. It is
> > > > > >>> good
> > > > > >>>>>>>> point.
> > > > > >>>>>>>>> So
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> feature
> > > > > >>>>>>>>>>>>>>>>> may
> > > > > >>>>>>>>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>>>>>>>>> useful for JBOD use-case. I have one
> > > > > >>>>>> question
> > > > > >>>>>>>>>> below.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Do you think this feature is also
> > > > > >>> useful
> > > > > >>>>> for
> > > > > >>>>>>>>>> non-JBOD
> > > > > >>>>>>>>>>>>> setup
> > > > > >>>>>>>>>>>>>>> or
> > > > > >>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>>>>>>> only
> > > > > >>>>>>>>>>>>>>>>>>>>>>> useful for the JBOD setup? It may be
> > > > > >>>>> useful
> > > > > >>>>>> to
> > > > > >>>>>>>>>>>> understand
> > > > > >>>>>>>>>>>>>>> this.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> When the broker is setup using JBOD,
> > > > > >>> in
> > > > > >>>>>> order
> > > > > >>>>>>>> to
> > > > > >>>>>>>>>> move
> > > > > >>>>>>>>>>>>>> leaders
> > > > > >>>>>>>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>> failed
> > > > > >>>>>>>>>>>>>>>>>>>>>>> disk to other disks, the system
> > > > > >>> operator
> > > > > >>>>>> first
> > > > > >>>>>>>>>> needs
> > > > > >>>>>>>>>>> to
> > > > > >>>>>>>>>>>>> get
> > > > > >>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>> list
> > > > > >>>>>>>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>>>>>>> partitions on the failed disk. This
> > > > > >> is
> > > > > >>>>>>>> currently
> > > > > >>>>>>>>>>>> achieved
> > > > > >>>>>>>>>>>>>>> using
> > > > > >>>>>>>>>>>>>>>>>>>>>>> AdminClient.describeLogDirs(), which
> > > > > >>>> sends
> > > > > >>>>>>>>>>>>>>>>> DescribeLogDirsRequest
> > > > > >>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>> broker. If we only prioritize the
> > > > > >>>>> controller
> > > > > >>>>>>>>>>> requests,
> > > > > >>>>>>>>>>>>> then
> > > > > >>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>> DescribeLogDirsRequest
> > > > > >>>>>>>>>>>>>>>>>>>>>>> may still take a long time to be
> > > > > >>>> processed
> > > > > >>>>>> by
> > > > > >>>>>>>> the
> > > > > >>>>>>>>>>>> broker.
> > > > > >>>>>>>>>>>>>> So
> > > > > >>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>> overall
> > > > > >>>>>>>>>>>>>>>>>>>>>>> time to move leaders away from the
> > > > > >>>> failed
> > > > > >>>>>> disk
> > > > > >>>>>>>>> may
> > > > > >>>>>>>>>>>> still
> > > > > >>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>>> long
> > > > > >>>>>>>>>>>>>>>>>>> even
> > > > > >>>>>>>>>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>>>>>>>> this KIP. What do you think?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Dong
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 4:38 PM,
> > > > > >> Lucas
> > > > > >>>>> Wang <
> > > > > >>>>>>>>>>>>>>>>> lucasatucla@gmail.com
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the insightful comment,
> > > > > >>>> Jun.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> @Dong,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Since both of the two comments in
> > > > > >>> your
> > > > > >>>>>>>> previous
> > > > > >>>>>>>>>>> email
> > > > > >>>>>>>>>>>>> are
> > > > > >>>>>>>>>>>>>>>>> about
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> benefits of this KIP and whether
> > > > > >>> it's
> > > > > >>>>>>>> useful,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> in light of Jun's last comment, do
> > > > > >>> you
> > > > > >>>>>> agree
> > > > > >>>>>>>>> that
> > > > > >>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>> KIP
> > > > > >>>>>>>>>>>>>>>>> can
> > > > > >>>>>>>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> beneficial in the case mentioned
> > > > > >> by
> > > > > >>>> Jun?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Please let me know, thanks!
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Regards,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Lucas
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 2:07 PM,
> > > > > >> Jun
> > > > > >>>> Rao
> > > > > >>>>> <
> > > > > >>>>>>>>>>>>>> jun@confluent.io>
> > > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi, Lucas, Dong,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> If all disks on a broker are
> > > > > >> slow,
> > > > > >>>> one
> > > > > >>>>>>>>> probably
> > > > > >>>>>>>>>>>>> should
> > > > > >>>>>>>>>>>>>>> just
> > > > > >>>>>>>>>>>>>>>>>> kill
> > > > > >>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> broker. In that case, this KIP
> > > > > >> may
> > > > > >>>> not
> > > > > >>>>>>>> help.
> > > > > >>>>>>>>> If
> > > > > >>>>>>>>>>>> only
> > > > > >>>>>>>>>>>>>> one
> > > > > >>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>> disks
> > > > > >>>>>>>>>>>>>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> broker is slow, one may want to
> > > > > >>> fail
> > > > > >>>>>> that
> > > > > >>>>>>>>> disk
> > > > > >>>>>>>>>>> and
> > > > > >>>>>>>>>>>>> move
> > > > > >>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>> leaders
> > > > > >>>>>>>>>>>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> disk to other brokers. In that
> > > > > >>> case,
> > > > > >>>>>> being
> > > > > >>>>>>>>> able
> > > > > >>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>> process
> > > > > >>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsr
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> requests faster will potentially
> > > > > >>>> help
> > > > > >>>>>> the
> > > > > >>>>>>>>>>> producers
> > > > > >>>>>>>>>>>>>>> recover
> > > > > >>>>>>>>>>>>>>>>>>>> quicker.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Jun
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 2, 2018 at 7:56 PM,
> > > > > >>> Dong
> > > > > >>>>>> Lin <
> > > > > >>>>>>>>>>>>>>>>> lindong28@gmail.com
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the reply. Some
> > > > > >>> follow
> > > > > >>>> up
> > > > > >>>>>>>>>> questions
> > > > > >>>>>>>>>>>>> below.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Regarding 1, if each
> > > > > >>>> ProduceRequest
> > > > > >>>>>>>> covers
> > > > > >>>>>>>>> 20
> > > > > >>>>>>>>>>>>>>> partitions
> > > > > >>>>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>>>> are
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> randomly
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> distributed across all
> > > > > >>> partitions,
> > > > > >>>>>> then
> > > > > >>>>>>>>> each
> > > > > >>>>>>>>>>>>>>>>> ProduceRequest
> > > > > >>>>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>>>>>> likely
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> cover some partitions for
> > > > > >> which
> > > > > >>>> the
> > > > > >>>>>>>> broker
> > > > > >>>>>>>>> is
> > > > > >>>>>>>>>>>> still
> > > > > >>>>>>>>>>>>>>>>> leader
> > > > > >>>>>>>>>>>>>>>>>>> after
> > > > > >>>>>>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> quickly
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> processes the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsrRequest. Then
> > > > > >> broker
> > > > > >>>>> will
> > > > > >>>>>>>> still
> > > > > >>>>>>>>>> be
> > > > > >>>>>>>>>>>> slow
> > > > > >>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>>>>> processing
> > > > > >>>>>>>>>>>>>>>>>>>>>>> these
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> ProduceRequest and request
> > > > > >> will
> > > > > >>>>> still
> > > > > >>>>>> be
> > > > > >>>>>>>>> very
> > > > > >>>>>>>>>>>> high
> > > > > >>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>>>> KIP.
> > > > > >>>>>>>>>>>>>>>>>>>>> It
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> seems
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> that most ProduceRequest will
> > > > > >>>> still
> > > > > >>>>>>>> timeout
> > > > > >>>>>>>>>>> after
> > > > > >>>>>>>>>>>>> 30
> > > > > >>>>>>>>>>>>>>>>>> seconds.
> > > > > >>>>>>>>>>>>>>>>>>> Is
> > > > > >>>>>>>>>>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> understanding correct?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Regarding 2, if most
> > > > > >>>> ProduceRequest
> > > > > >>>>>> will
> > > > > >>>>>>>>>> still
> > > > > >>>>>>>>>>>>>> timeout
> > > > > >>>>>>>>>>>>>>>>> after
> > > > > >>>>>>>>>>>>>>>>>>> 30
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> seconds,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> then it is less clear how this
> > > > > >>> KIP
> > > > > >>>>>>>> reduces
> > > > > >>>>>>>>>>>> average
> > > > > >>>>>>>>>>>>>>>>> produce
> > > > > >>>>>>>>>>>>>>>>>>>>> latency.
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Can
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> you
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> clarify what metrics can be
> > > > > >>>> improved
> > > > > >>>>>> by
> > > > > >>>>>>>>> this
> > > > > >>>>>>>>>>> KIP?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Not sure why system operator
> > > > > >>>>> directly
> > > > > >>>>>>>> cares
> > > > > >>>>>>>>>>>> number
> > > > > >>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>>> truncated
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> messages.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Do you mean this KIP can
> > > > > >> improve
> > > > > >>>>>> average
> > > > > >>>>>>>>>>>> throughput
> > > > > >>>>>>>>>>>>>> or
> > > > > >>>>>>>>>>>>>>>>>> reduce
> > > > > >>>>>>>>>>>>>>>>>>>>>>> message
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> duplication? It will be good
> > > > > >> to
> > > > > >>>>>>>> understand
> > > > > >>>>>>>>>>> this.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Dong
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 3 Jul 2018 at 7:12 AM
> > > > > >>>> Lucas
> > > > > >>>>>>>> Wang <
> > > > > >>>>>>>>>>>>>>>>>>> lucasatucla@gmail.com
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your valuable
> > > > > >>>> comments.
> > > > > >>>>>>>> Please
> > > > > >>>>>>>>>> see
> > > > > >>>>>>>>>>>> my
> > > > > >>>>>>>>>>>>>>> reply
> > > > > >>>>>>>>>>>>>>>>>>> below.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> 1. The Google doc showed
> > > > > >> only
> > > > > >>> 1
> > > > > >>>>>>>>> partition.
> > > > > >>>>>>>>>>> Now
> > > > > >>>>>>>>>>>>>> let's
> > > > > >>>>>>>>>>>>>>>>>>> consider
> > > > > >>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>>> more
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> common
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> scenario
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> where broker0 is the leader
> > > > > >> of
> > > > > >>>>> many
> > > > > >>>>>>>>>>> partitions.
> > > > > >>>>>>>>>>>>> And
> > > > > >>>>>>>>>>>>>>>>> let's
> > > > > >>>>>>>>>>>>>>>>>>> say
> > > > > >>>>>>>>>>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> some
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> reason its IO becomes slow.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> The number of leader
> > > > > >>> partitions
> > > > > >>>> on
> > > > > >>>>>>>>> broker0
> > > > > >>>>>>>>>> is
> > > > > >>>>>>>>>>>> so
> > > > > >>>>>>>>>>>>>>> large,
> > > > > >>>>>>>>>>>>>>>>>> say
> > > > > >>>>>>>>>>>>>>>>>>>> 10K,
> > > > > >>>>>>>>>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> cluster is skewed,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> and the operator would like
> > > > > >> to
> > > > > >>>>> shift
> > > > > >>>>>>>> the
> > > > > >>>>>>>>>>>>> leadership
> > > > > >>>>>>>>>>>>>>>>> for a
> > > > > >>>>>>>>>>>>>>>>>>> lot
> > > > > >>>>>>>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> partitions, say 9K, to other
> > > > > >>>>>> brokers,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> either manually or through
> > > > > >>> some
> > > > > >>>>>>>> service
> > > > > >>>>>>>>>> like
> > > > > >>>>>>>>>>>>> cruise
> > > > > >>>>>>>>>>>>>>>>>> control.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> With this KIP, not only will
> > > > > >>> the
> > > > >
> > > >
> > > >
> > > > --
> > > > -Regards,
> > > > Mayuresh R. Gharat
> > > > (862) 250-7125
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Personally I am not fond of the deque approach, simply because it goes
against the basic idea of isolating the controller plane and the data plane.
With a single deque, theoretically speaking, the controller requests can
starve the client requests. I would prefer the approach with a separate
controller request queue and a dedicated controller request handler thread.

Thanks,

Jiangjie (Becket) Qin
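
[For illustration, here is a minimal sketch of the two-queue isolation described
above: a bounded controller queue drained by its own handler thread, so a backlog
of data-plane requests can never delay controller requests. All class names,
capacities, and the Request shape are illustrative assumptions, not Kafka's
actual code.]

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: controller requests get their own bounded queue and a
// dedicated handler thread, fully separate from the data plane.
public class RequestChannels {
    static class Request {
        final boolean fromController;
        final Runnable handler;
        Request(boolean fromController, Runnable handler) {
            this.fromController = fromController;
            this.handler = handler;
        }
    }

    // Capacities are placeholders; a real broker would make these configurable.
    private final BlockingQueue<Request> dataQueue = new ArrayBlockingQueue<>(500);
    private final BlockingQueue<Request> controllerQueue = new ArrayBlockingQueue<>(20);

    // Route each request to its plane; offer() keeps the sketch non-blocking
    // (a real server would apply backpressure instead of dropping).
    public boolean enqueue(Request r) {
        return r.fromController ? controllerQueue.offer(r) : dataQueue.offer(r);
    }

    // A single dedicated thread drains the controller queue, so controller
    // requests are processed in order and never starved by data requests.
    public void startControllerHandler() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    controllerQueue.take().handler.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "controller-request-handler");
        t.setDaemon(true);
        t.start();
    }

    public int controllerQueueSize() { return controllerQueue.size(); }
    public int dataQueueSize() { return dataQueue.size(); }
}
```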

On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <lu...@gmail.com> wrote:

> Sure, I can summarize the usage of the correlation id. But before I do
> that, it seems the same out-of-order processing can also happen to Produce
> requests sent by producers, following the same example you described
> earlier.
> If that's the case, I think this probably deserves a separate doc and
> design independent of this KIP.
>
> Lucas
>
>
>
> On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Hey Lucas,
> >
> > Could you update the KIP if you are confident with the approach that uses
> > the correlation id? The idea around the correlation id is kind of
> > scattered across multiple emails. It will be useful if other reviewers
> > can read the KIP to understand the latest proposal.
> >
> > Thanks,
> > Dong
> >
> > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> > gharatmayuresh15@gmail.com> wrote:
> >
> > > I like the idea of the deque implementation by Lucas. This will help us
> > > avoid an additional queue for controller requests and additional
> > > configs in Kafka.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <be...@gmail.com> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > The usage of the correlation ID might still be useful to address cases
> > > > where the controller epoch and leader epoch checks are not sufficient
> > > > to guarantee correct behavior. For example, if the controller sends a
> > > > LeaderAndIsrRequest followed by a StopReplicaRequest, and the broker
> > > > processes them in the reverse order, the replica may still be wrongly
> > > > recreated, right?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > Hmm, since we already use the controller epoch and leader epoch for
> > > > > properly caching the latest partition state, do we really need the
> > > > > correlation id for ordering the controller requests?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <be...@gmail.com>
> > > > wrote:
> > > > >
> > > > >> Lucas and Mayuresh,
> > > > >>
> > > > >> Good idea. The correlation id should work.
> > > > >>
> > > > >> In the ControllerChannelManager, a request is resent until a
> > > > >> response is received. So if the controller-to-broker connection
> > > > >> drops after the controller sends R1_a, but before the response to
> > > > >> R1_a is received, the disconnection causes the controller to
> > > > >> resend R1 as R1_b; i.e. until R1 is acked, R2 won't be sent by the
> > > > >> controller.
> > > > >> This gives two guarantees:
> > > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > > >> 2. On the broker side, when R2 is seen, R1 must have been
> > > > >> processed at least once.
> > > > >>
> > > > >> So on the broker side, with a single-threaded controller request
> > > > >> handler, the logic should be:
> > > > >> 1. Process whatever request is seen in the controller request queue.
> > > > >> 2. For the given epoch, drop a request if its correlation id is
> > > > >> smaller than that of the last processed request.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jiangjie (Becket) Qin
> > > > >>
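
[The two-step rule above can be sketched as a small gate on the broker side:
per controller epoch, remember the highest correlation id processed so far and
drop anything older. The class and method names below are illustrative
assumptions, not from the Kafka codebase.]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the drop-if-stale check: within a controller epoch, only
// requests with a strictly newer correlation id are processed.
public class ControllerRequestGate {
    // controllerEpoch -> highest correlation id processed in that epoch
    private final Map<Integer, Integer> lastProcessed = new HashMap<>();

    /**
     * Returns true if the request should be processed, false if it is a
     * stale resend (same epoch, correlation id not newer than the last one).
     */
    public synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        Integer last = lastProcessed.get(controllerEpoch);
        if (last != null && correlationId <= last) {
            return false; // e.g. R1_a arriving after R1_b was already handled
        }
        lastProcessed.put(controllerEpoch, correlationId);
        return true;
    }
}
```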
> > > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
> > > > >>
> > > > >>> I agree that there is no strong ordering when there are more than
> > > > >>> one socket connection. Currently, we rely on the controllerEpoch
> > > > >>> and leaderEpoch to ensure that the receiving broker picks up the
> > > > >>> latest state for each partition.
> > > > >>>
> > > > >>> One potential issue with the deque approach is that if the queue
> > > > >>> is full, there is no guarantee that the controller requests will
> > > > >>> be enqueued quickly.
> > > > >>>
> > > > >>> Thanks,
> > > > >>>
> > > > >>> Jun
> > > > >>>
> > > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > > > >>> gharatmayuresh15@gmail.com> wrote:
> > > > >>>
> > > > >>>> Yea, the correlationId is only set to 0 in the NetworkClient
> > > > >>>> constructor. Since we reuse the same NetworkClient between the
> > > > >>>> controller and the broker, a disconnection should not cause it
> > > > >>>> to reset to 0, in which case it can be used to reject obsolete
> > > > >>>> requests.
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>>
> > > > >>>> Mayuresh
> > > > >>>>
> > > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatucla@gmail.com>
> > > > >>> wrote:
> > > > >>>>
> > > > >>>>> @Dong,
> > > > >>>>> Great example and explanation, thanks!
> > > > >>>>>
> > > > >>>>> @All
> > > > >>>>> Regarding the example given by Dong, it seems even if we use a
> > > > >>>>> queue and a dedicated controller request handling thread,
> > > > >>>>> the same result can still happen, because R1_a will be sent on
> > > > >>>>> one connection, and R1_b & R2 will be sent on a different
> > > > >>>>> connection, and there is no ordering between different
> > > > >>>>> connections on the broker side.
> > > > >>>>> I was discussing with Mayuresh offline, and it seems the
> > > > >>>>> correlation id within the same NetworkClient object is
> > > > >>>>> monotonically increasing and never reset, hence a broker can
> > > > >>>>> leverage that to properly reject obsolete requests.
> > > > >>>>> Thoughts?
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Lucas
> > > > >>>>>
> > > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>>> Actually nvm, the correlationId is reset in case of connection
> > > > >>>>>> loss, I think.
> > > > >>>>>>
> > > > >>>>>> Thanks,
> > > > >>>>>>
> > > > >>>>>> Mayuresh
> > > > >>>>>>
> > > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > > >>>>>> gharatmayuresh15@gmail.com>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> I agree with Dong that out-of-order processing can happen with
> > > > >>>>>>> two separate queues as well, and it can even happen today.
> > > > >>>>>>> Can we use the correlationId in the request from the
> > > > >>>>>>> controller to the broker to handle ordering?
> > > > >>>>>>>
> > > > >>>>>>> Thanks,
> > > > >>>>>>>
> > > > >>>>>>> Mayuresh
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Good point, Joel. I agree that a dedicated controller request
> > > > >>>>>>>> handling thread would be a better isolation. It also solves
> > > > >>>>>>>> the reordering issue.
> > > > >>>>>>>>
> > > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jjkoshy.w@gmail.com>
> > > > >>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Good example. I think this scenario can occur in the current
> > > > >>>>>>>>> code as well, but with even lower probability given that
> > > > >>>>>>>>> there are other non-controller requests interleaved. It is
> > > > >>>>>>>>> still sketchy though, and I think a safer approach would be
> > > > >>>>>>>>> separate queues and pinning controller request handling to
> > > > >>>>>>>>> one handler thread.
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindong28@gmail.com>
> > > > >>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Hey Becket,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I think you are right that there may be out-of-order
> > > > >>>>>>>>>> processing. However, it seems that out-of-order processing
> > > > >>>>>>>>>> may also happen even if we use a separate queue.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Here is the example:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> - Controller sends R1 and gets disconnected before
> > > > >>>>>>>>>> receiving the response. Then it reconnects and sends R2.
> > > > >>>>>>>>>> Both requests now stay in the controller request queue in
> > > > >>>>>>>>>> the order they are sent.
> > > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then
> > > > >>>>>>>>>> thread2 takes R2 from the request queue almost at the same
> > > > >>>>>>>>>> time.
> > > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is a
> > > > >>>>>>>>>> chance that R2's processing is completed before R1's.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> If out-of-order processing can happen for both approaches
> > > > >>>>>>>>>> with very low probability, it may not be worthwhile to add
> > > > >>>>>>>>>> the extra queue. What do you think?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Thanks,
> > > > >>>>>>>>>> Dong
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket.qin@gmail.com>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Using the request channel as a deque was brought up some
> > > > >>>>>>>>>>> time ago when we were initially thinking of prioritizing
> > > > >>>>>>>>>>> the requests. The concern was that the controller requests
> > > > >>>>>>>>>>> are supposed to be processed in order. If we can ensure
> > > > >>>>>>>>>>> that there is at most one controller request in the
> > > > >>>>>>>>>>> request channel, the order is not a concern. But in cases
> > > > >>>>>>>>>>> where more than one controller request is inserted into
> > > > >>>>>>>>>>> the queue, the controller request order may change and
> > > > >>>>>>>>>>> cause problems. For example, think about the following
> > > > >>>>>>>>>>> sequence:
> > > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to a broker.
> > > > >>>>>>>>>>> 2. Broker receives R1 and puts the request at the head of
> > > > >>>>>>>>>>> the request queue.
> > > > >>>>>>>>>>> 3. The controller-to-broker connection failed and the
> > > > >>>>>>>>>>> controller reconnected to the broker.
> > > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker.
> > > > >>>>>>>>>>> 5. Broker receives R2 and adds it to the head of the
> > > > >>>>>>>>>>> request queue.
> > > > >>>>>>>>>>> Now on the broker side, R2 will be processed before R1,
> > > > >>>>>>>>>>> which may cause problems.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Jiangjie (Becket) Qin
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkoshy.w@gmail.com>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler,
> > > > >>>>>>>>>>>> less invasive alternative, and it should work.
> > > > >>>>>>>>>>>> Jun/Becket/others, do you see any pitfalls with this
> > > > >>>>>>>>>>>> approach?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > >>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> @Mayuresh,
> > > > >>>>>>>>>>>>> That's a very interesting idea that I hadn't thought of
> > > > >>>>>>>>>>>>> before.
> > > > >>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and
> > > > >>>>>>>>>>>>> also avoids the need to have a new size metric and
> > > > >>>>>>>>>>>>> capacity config for the controller request queue. In
> > > > >>>>>>>>>>>>> fact, if we were to adopt this design, there is no
> > > > >>>>>>>>>>>>> public interface change, and we probably don't need a
> > > > >>>>>>>>>>>>> KIP.
> > > > >>>>>>>>>>>>> Also, implementation wise, it seems the java class
> > > > >>>>>>>>>>>>> LinkedBlockingDeque can readily satisfy the requirement
> > > > >>>>>>>>>>>>> by supporting a capacity, and also allowing insertion
> > > > >>>>>>>>>>>>> at both ends.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> My only concern is that this design is tied to the
> > > > >>>>>>>>>>>>> coincidence that we have two request priorities and
> > > > >>>>>>>>>>>>> there are two ends to a deque. Hence by using the
> > > > >>>>>>>>>>>>> proposed design, the network layer is more tightly
> > > > >>>>>>>>>>>>> coupled with the upper layer logic, e.g. if we were to
> > > > >>>>>>>>>>>>> add an extra priority level in the future for some
> > > > >>>>>>>>>>>>> reason, we would probably need to go back to the design
> > > > >>>>>>>>>>>>> of separate queues, one for each priority level.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward
> > > > >>>>>>>>>>>>> your suggested approach.
> > > > >>>>>>>>>>>>> Let's hear what others think.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> @Becket,
> > > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm
> > > > >>>>>>>>>>>>> answering your question only in the context of the
> > > > >>>>>>>>>>>>> current KIP design: I think your suggestion makes sense,
> > > > >>>>>>>>>>>>> and I'm ok with removing the capacity config and just
> > > > >>>>>>>>>>>>> relying on the default value of 20 being sufficient.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>> Lucas
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Hi Lucas,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> It seems like the main intent here is to prioritize
> > > > >>>>>>>>>>>>>> controller requests over any other requests.
> > > > >>>>>>>>>>>>>> In that case, we can change the request queue to a
> > > > >>>>>>>>>>>>>> deque, where you always insert the normal requests
> > > > >>>>>>>>>>>>>> (produce, consume, ..etc) at the end of the deque, but
> > > > >>>>>>>>>>>>>> if it's a controller request, you insert it at the
> > > > >>>>>>>>>>>>>> head of the queue. This ensures that the controller
> > > > >>>>>>>>>>>>>> request will be given higher priority over other
> > > > >>>>>>>>>>>>>> requests.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Also, since we only read one request from the socket
> > > > >>>>>>>>>>>>>> and mute it, and only unmute it after handling the
> > > > >>>>>>>>>>>>>> request, this would ensure that we don't handle
> > > > >>>>>>>>>>>>>> controller requests out of order.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> With this approach we can avoid the second queue and
> > > > >>>>>>>>>>>>>> the additional config for the size of the queue.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> What do you think?
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Mayuresh
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > > >>>>>>>> becket.qin@gmail.com
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Hey Joel,
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Thank for the detail explanation. I agree the
> > > > >>> current
> > > > >>>>>> design
> > > > >>>>>>>>>> makes
> > > > >>>>>>>>>>>>> sense.
> > > > >>>>>>>>>>>>>>> My confusion is about whether the new config for
> > > > >> the
> > > > >>>>>>>> controller
> > > > >>>>>>>>>>> queue
> > > > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a case in
> > > > >>>> which
> > > > >>>>>>>> users
> > > > >>>>>>>>>>> would
> > > > >>>>>>>>>>>>>> change
> > > > >>>>>>>>>>>>>>> it.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > >>>>>>>>>> becket.qin@gmail.com>
> > > > >>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we
> > > > >>>> expect
> > > > >>>>>>>> user to
> > > > >>>>>>>>>>> ever
> > > > >>>>>>>>>>>>>> change
> > > > >>>>>>>>>>>>>>>> the controller request queue capacity"? If we
> > > > >>> agree
> > > > >>>>> that
> > > > >>>>>>>> 20
> > > > >>>>>>>>> is
> > > > >>>>>>>>>>>>> already
> > > > >>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>> very generous default number and we do not
> > > > >> expect
> > > > >>>> user
> > > > >>>>>> to
> > > > >>>>>>>>>> change
> > > > >>>>>>>>>>>> it,
> > > > >>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>> still necessary to expose this as a config?
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > >>>>>>>>>>> lucasatucla@gmail.com
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> @Becket
> > > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that
> > > > >>>>> normally
> > > > >>>>>>>> there
> > > > >>>>>>>>>>>> should
> > > > >>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>> just
> > > > >>>>>>>>>>>>>>>>> one controller request because of muting,
> > > > >>>>>>>>>>>>>>>>> and I had NOT intended to say there would be
> > > > >> many
> > > > >>>>>>>> enqueued
> > > > >>>>>>>>>>>>> controller
> > > > >>>>>>>>>>>>>>>>> requests.
> > > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not sure
> > > > >>>> which
> > > > >>>>>> part
> > > > >>>>>>>>>>> conveys
> > > > >>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>> info.
> > > > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point it out the
> > > > >>>>> section.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal
> > > > >>>> conditions,
> > > > >>>>>> the
> > > > >>>>>>>>>> current
> > > > >>>>>>>>>>>>>> design
> > > > >>>>>>>>>>>>>>>>> does not preclude multiple controllers running
> > > > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have the
> > > > >>>>> controller
> > > > >>>>>>>>> queue
> > > > >>>>>>>>>>>>> capacity
> > > > >>>>>>>>>>>>>>>>> config and simply make its capacity to be 1,
> > > > >>>>>>>>>>>>>>>>> network threads handling requests from
> > > > >> different
> > > > >>>>>>>> controllers
> > > > >>>>>>>>>>> will
> > > > >>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>> blocked during those troublesome times,
> > > > >>>>>>>>>>>>>>>>> which is probably not what we want. On the
> > > > >> other
> > > > >>>>> hand,
> > > > >>>>>>>>> adding
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> extra
> > > > >>>>>>>>>>>>>>>>> config with a default value, say 20, guards us
> > > > >>> from
> > > > >>>>>>>> issues
> > > > >>>>>>>>> in
> > > > >>>>>>>>>>>> those
> > > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much
> > > > >>>> downside
> > > > >>>>> of
> > > > >>>>>>>>> adding
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> extra
> > > > >>>>>>>>>>>>>>>>> config.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete
> > > > >>> statement
> > > > >>>>>> based
> > > > >>>>>>>> on
> > > > >>>>>>>>> a
> > > > >>>>>>>>>>>>> previous
> > > > >>>>>>>>>>>>>>>>> design. I've revised the wording in the KIP.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>> Lucas
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > > > >>> Gharat <
> > > > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The
> > > > >>>> memory
> > > > >>>>>>>>>>> consumption
> > > > >>>>>>>>>>>>> can
> > > > >>>>>>>>>>>>>>> rise
> > > > >>>>>>>>>>>>>>>>>> given the total number of queued requests can
> > > > >>> go
> > > > >>>> up
> > > > >>>>>> to
> > > > >>>>>>>> 2x"
> > > > >>>>>>>>>> in
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> impact
> > > > >>>>>>>>>>>>>>>>>> section. Normally the requests from
> > > > >> controller
> > > > >>>> to a
> > > > >>>>>>>> Broker
> > > > >>>>>>>>>> are
> > > > >>>>>>>>>>>> not
> > > > >>>>>>>>>>>>>>> high
> > > > >>>>>>>>>>>>>>>>>> volume, right ?
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > > >>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the
> > > > >>>> control
> > > > >>>>>>>> plane
> > > > >>>>>>>>>> from
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> data
> > > > >>>>>>>>>>>>>>>>>> plane
> > > > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the
> > > > >> controller
> > > > >>>>>> request
> > > > >>>>>>>>> queue
> > > > >>>>>>>>>>> may
> > > > >>>>>>>>>>>>>> have
> > > > >>>>>>>>>>>>>>>>> many
> > > > >>>>>>>>>>>>>>>>>>> requests in it. Will this be a common case?
> > > > >>> The
> > > > >>>>>>>>> controller
> > > > >>>>>>>>>>>>>> requests
> > > > >>>>>>>>>>>>>>>>> still
> > > > >>>>>>>>>>>>>>>>>>> goes through the SocketServer. The
> > > > >>> SocketServer
> > > > >>>>>> will
> > > > >>>>>>>>> mute
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> channel
> > > > >>>>>>>>>>>>>>>>>> once
> > > > >>>>>>>>>>>>>>>>>>> a request is read and put into the request
> > > > >>>>> channel.
> > > > >>>>>>>> So
> > > > >>>>>>>>>>>> assuming
> > > > >>>>>>>>>>>>>>> there
> > > > >>>>>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>>>> only one connection between controller and
> > > > >>> each
> > > > >>>>>>>> broker,
> > > > >>>>>>>>> on
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> broker
> > > > >>>>>>>>>>>>>>>>>> side,
> > > > >>>>>>>>>>>>>>>>>>> there should be only one controller request
> > > > >>> in
> > > > >>>>> the
> > > > >>>>>>>>>>> controller
> > > > >>>>>>>>>>>>>>> request
> > > > >>>>>>>>>>>>>>>>>> queue
> > > > >>>>>>>>>>>>>>>>>>> at any given time. If that is the case, do
> > > > >> we
> > > > >>>>> need
> > > > >>>>>> a
> > > > >>>>>>>>>>> separate
> > > > >>>>>>>>>>>>>>>>> controller
> > > > >>>>>>>>>>>>>>>>>>> request queue capacity config? The default
> > > > >>>> value
> > > > >>>>> 20
> > > > >>>>>>>>> means
> > > > >>>>>>>>>>> that
> > > > >>>>>>>>>>>>> we
> > > > >>>>>>>>>>>>>>>>> expect
> > > > >>>>>>>>>>>>>>>>>>> there are 20 controller switches to happen
> > > > >>> in a
> > > > >>>>>> short
> > > > >>>>>>>>>> period
> > > > >>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>> time.
> > > > >>>>>>>>>>>>>>>>> I
> > > > >>>>>>>>>>>>>>>>>> am
> > > > >>>>>>>>>>>>>>>>>>> not sure whether someone should increase
> > > > >> the
> > > > >>>>>>>> controller
> > > > >>>>>>>>>>>> request
> > > > >>>>>>>>>>>>>>> queue
> > > > >>>>>>>>>>>>>>>>>>> capacity to handle such case, as it seems
> > > > >>>>>> indicating
> > > > >>>>>>>>>>> something
> > > > >>>>>>>>>>>>>> very
> > > > >>>>>>>>>>>>>>>>> wrong
> > > > >>>>>>>>>>>>>>>>>>> has happened.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > >>>>>>>>>>>> lindong28@gmail.com>
> > > > >>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> I think the motivation section is
> > > > >>> intuitive.
> > > > >>>> It
> > > > >>>>>>>> will
> > > > >>>>>>>>> be
> > > > >>>>>>>>>>> good
> > > > >>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>> learn
> > > > >>>>>>>>>>>>>>>>>>> more
> > > > >>>>>>>>>>>>>>>>>>>> about the comments from other reviewers.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> > > > >>> Wang <
> > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of
> > > > >>> the
> > > > >>>>> KIP
> > > > >>>>>> by
> > > > >>>>>>>>>>>> explaining
> > > > >>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>> cases
> > > > >>>>>>>>>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>>>>> would have user impacts.
> > > > >>>>>>>>>>>>>>>>>>>>> Please take a look at let me know your
> > > > >>>>>> comments.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>>>>>> Lucas
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> > > > >>> Wang
> > > > >>>> <
> > > > >>>>>>>>>>>>>>> lucasatucla@gmail.com
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> The simulation of disk being slow is
> > > > >>>> merely
> > > > >>>>>>>> for me
> > > > >>>>>>>>>> to
> > > > >>>>>>>>>>>>> easily
> > > > >>>>>>>>>>>>>>>>>>> construct
> > > > >>>>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>>> testing scenario
> > > > >>>>>>>>>>>>>>>>>>>>>> with a backlog of produce requests.
> > > > >> In
> > > > >>>>>>>> production,
> > > > >>>>>>>>>>> other
> > > > >>>>>>>>>>>>>> than
> > > > >>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> disk
> > > > >>>>>>>>>>>>>>>>>>>>>> being slow, a backlog of
> > > > >>>>>>>>>>>>>>>>>>>>>> produce requests may also be caused
> > > > >> by
> > > > >>>> high
> > > > >>>>>>>>> produce
> > > > >>>>>>>>>>> QPS.
> > > > >>>>>>>>>>>>>>>>>>>>>> In that case, we may not want to kill
> > > > >>> the
> > > > >>>>>>>> broker
> > > > >>>>>>>>> and
> > > > >>>>>>>>>>>>> that's
> > > > >>>>>>>>>>>>>>> when
> > > > >>>>>>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>>>>>> KIP
> > > > >>>>>>>>>>>>>>>>>>>>>> can be useful, both for JBOD
> > > > >>>>>>>>>>>>>>>>>>>>>> and non-JBOD setup.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question
> > > > >>>> about
> > > > >>>>>> each
> > > > >>>>>>>>>>>>>> ProduceRequest
> > > > >>>>>>>>>>>>>>>>>>> covering
> > > > >>>>>>>>>>>>>>>>>>>>> 20
> > > > >>>>>>>>>>>>>>>>>>>>>> partitions that are randomly
> > > > >>>>>>>>>>>>>>>>>>>>>> distributed, let's say a LeaderAndIsr
> > > > >>>>> request
> > > > >>>>>>>> is
> > > > >>>>>>>>>>>> enqueued
> > > > >>>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>> tries
> > > > >>>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>>>> switch the current broker, say
> > > > >> broker0,
> > > > >>>>> from
> > > > >>>>>>>>> leader
> > > > >>>>>>>>>> to
> > > > >>>>>>>>>>>>>>> follower
> > > > >>>>>>>>>>>>>>>>>>>>>> *for one of the partitions*, say
> > > > >>>> *test-0*.
> > > > >>>>>> For
> > > > >>>>>>>> the
> > > > >>>>>>>>>>> sake
> > > > >>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>> argument,
> > > > >>>>>>>>>>>>>>>>>>>>>> let's also assume the other brokers,
> > > > >>> say
> > > > >>>>>>>> broker1,
> > > > >>>>>>>>>> have
> > > > >>>>>>>>>>>>>>> *stopped*
> > > > >>>>>>>>>>>>>>>>>>>> fetching
> > > > >>>>>>>>>>>>>>>>>>>>>> from
> > > > >>>>>>>>>>>>>>>>>>>>>> the current broker, i.e. broker0.
> > > > >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests
> > > > >>> have
> > > > >>>>>> acks =
> > > > >>>>>>>>> -1
> > > > >>>>>>>>>>>> (ALL)
> > > > >>>>>>>>>>>>>>>>>>>>>>  1.1 without this KIP, the
> > > > >>>> ProduceRequests
> > > > >>>>>>>> ahead
> > > > >>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>> LeaderAndISR
> > > > >>>>>>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>>>>>>> put into the purgatory,
> > > > >>>>>>>>>>>>>>>>>>>>>>        and since they'll never be
> > > > >>>>> replicated
> > > > >>>>>>>> to
> > > > >>>>>>>>>> other
> > > > >>>>>>>>>>>>>> brokers
> > > > >>>>>>>>>>>>>>>>>>> (because
> > > > >>>>>>>>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>>>>>> the assumption made above), they will
> > > > >>>>>>>>>>>>>>>>>>>>>>        be completed either when the
> > > > >>>>>>>> LeaderAndISR
> > > > >>>>>>>>>>>> request
> > > > >>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>>>> processed
> > > > >>>>>>>>>>>>>>>>>>>> or
> > > > >>>>>>>>>>>>>>>>>>>>>> when the timeout happens.
> > > > >>>>>>>>>>>>>>>>>>>>>>  1.2 With this KIP, broker0 will
> > > > >>>>> immediately
> > > > >>>>>>>>>>> transition
> > > > >>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> partition
> > > > >>>>>>>>>>>>>>>>>>>>>> test-0 to become a follower,
> > > > >>>>>>>>>>>>>>>>>>>>>>        after the current broker sees
> > > > >>> the
> > > > >>>>>>>>>> replication
> > > > >>>>>>>>>>> of
> > > > >>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> remaining
> > > > >>>>>>>>>>>>>>>>>>>> 19
> > > > >>>>>>>>>>>>>>>>>>>>>> partitions, it can send a response
> > > > >>>>> indicating
> > > > >>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>>>>>>        it's no longer the leader for
> > > > >>> the
> > > > >>>>>>>>> "test-0".
> > > > >>>>>>>>>>>>>>>>>>>>>>  To see the latency difference
> > > > >> between
> > > > >>>> 1.1
> > > > >>>>>> and
> > > > >>>>>>>>> 1.2,
> > > > >>>>>>>>>>>> let's
> > > > >>>>>>>>>>>>>> say
> > > > >>>>>>>>>>>>>>>>>> there
> > > > >>>>>>>>>>>>>>>>>>>> are
> > > > >>>>>>>>>>>>>>>>>>>>>> 24K produce requests ahead of the
> > > > >>>>>> LeaderAndISR,
> > > > >>>>>>>>> and
> > > > >>>>>>>>>>>> there
> > > > >>>>>>>>>>>>>> are
> > > > >>>>>>>>>>>>>>> 8
> > > > >>>>>>>>>>>>>>>>> io
> > > > >>>>>>>>>>>>>>>>>>>>> threads,
> > > > >>>>>>>>>>>>>>>>>>>>>>  so each io thread will process
> > > > >>>>>> approximately
> > > > >>>>>>>>> 3000
> > > > >>>>>>>>>>>>> produce
> > > > >>>>>>>>>>>>>>>>>> requests.
> > > > >>>>>>>>>>>>>>>>>>>> Now
> > > > >>>>>>>>>>>>>>>>>>>>>> let's investigate the io thread that
> > > > >>>>> finally
> > > > >>>>>>>>>> processed
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>> LeaderAndISR.
> > > > >>>>>>>>>>>>>>>>>>>>>>  For the 3000 produce requests, if
> > > > >> we
> > > > >>>>> model
> > > > >>>>>>>> the
> > > > >>>>>>>>>> time
> > > > >>>>>>>>>>>> when
> > > > >>>>>>>>>>>>>>> their
> > > > >>>>>>>>>>>>>>>>>>>>> remaining
> > > > >>>>>>>>>>>>>>>>>>>>>> 19 partitions catch up as t0, t1,
> > > > >>>> ...t2999,
> > > > >>>>>> and
> > > > >>>>>>>>> the
> > > > >>>>>>>>>>>>>>> LeaderAndISR
> > > > >>>>>>>>>>>>>>>>>>>> request
> > > > >>>>>>>>>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>>>>>>> processed at time t3000.
> > > > >>>>>>>>>>>>>>>>>>>>>>  Without this KIP, the 1st produce
> > > > >>>> request
> > > > >>>>>>>> would
> > > > >>>>>>>>>> have
> > > > >>>>>>>>>>>>>> waited
> > > > >>>>>>>>>>>>>>> an
> > > > >>>>>>>>>>>>>>>>>>> extra
> > > > >>>>>>>>>>>>>>>>>>>>>> t3000 - t0 time in the purgatory, the
> > > > >>> 2nd
> > > > >>>>> an
> > > > >>>>>>>> extra
> > > > >>>>>>>>>>> time
> > > > >>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>> t3000 -
> > > > >>>>>>>>>>>>>>>>>>> t1,
> > > > >>>>>>>>>>>>>>>>>>>>> etc.
> > > > >>>>>>>>>>>>>>>>>>>>>>  Roughly speaking, the latency
> > > > >>>> difference
> > > > >>>>> is
> > > > >>>>>>>>> bigger
> > > > >>>>>>>>>>> for
> > > > >>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>> earlier
> > > > >>>>>>>>>>>>>>>>>>>>>> produce requests than for the later
> > > > >>> ones.
> > > > >>>>> For
> > > > >>>>>>>> the
> > > > >>>>>>>>>> same
> > > > >>>>>>>>>>>>>> reason,
> > > > >>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> more
> > > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests queued
> > > > >>>>>>>>>>>>>>>>>>>>>>  before the LeaderAndISR, the bigger
> > > > >>>>> benefit
> > > > >>>>>>>> we
> > > > >>>>>>>>> get
> > > > >>>>>>>>>>>>> (capped
> > > > >>>>>>>>>>>>>>> by
> > > > >>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>> produce timeout).
> > > > >>>>>>>>>>>>>>>>>>>>>> 2. If the enqueued produce requests
> > > > >>> have
> > > > >>>>>>>> acks=0 or
> > > > >>>>>>>>>>>> acks=1
> > > > >>>>>>>>>>>>>>>>>>>>>>  There will be no latency
> > > > >> differences
> > > > >>> in
> > > > >>>>>> this
> > > > >>>>>>>>> case,
> > > > >>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>>>>>>>  2.1 without this KIP, the records
> > > > >> of
> > > > >>>>>>>> partition
> > > > >>>>>>>>>>> test-0
> > > > >>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests ahead of the
> > > > >>> LeaderAndISR
> > > > >>>>>> will
> > > > >>>>>>>> be
> > > > >>>>>>>>>>>> appended
> > > > >>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> local
> > > > >>>>>>>>>>>>>>>>>>>>> log,
> > > > >>>>>>>>>>>>>>>>>>>>>>        and eventually be truncated
> > > > >>> after
> > > > >>>>>>>>> processing
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> LeaderAndISR.
> > > > >>>>>>>>>>>>>>>>>>>>>> This is what's referred to as
> > > > >>>>>>>>>>>>>>>>>>>>>>        "some unofficial definition
> > > > >> of
> > > > >>>> data
> > > > >>>>>>>> loss
> > > > >>>>>>>>> in
> > > > >>>>>>>>>>>> terms
> > > > >>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>> messages
> > > > >>>>>>>>>>>>>>>>>>>>>> beyond the high watermark".
> > > > >>>>>>>>>>>>>>>>>>>>>>  2.2 with this KIP, we can mitigate
> > > > >>> the
> > > > >>>>>> effect
> > > > >>>>>>>>>> since
> > > > >>>>>>>>>>> if
> > > > >>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>> LeaderAndISR
> > > > >>>>>>>>>>>>>>>>>>>>>> is immediately processed, the
> > > > >> response
> > > > >>> to
> > > > >>>>>>>>> producers
> > > > >>>>>>>>>>> will
> > > > >>>>>>>>>>>>>> have
> > > > >>>>>>>>>>>>>>>>>>>>>>        the NotLeaderForPartition
> > > > >>> error,
> > > > >>>>>>>> causing
> > > > >>>>>>>>>>>> producers
> > > > >>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>> retry
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> This explanation above is the benefit
> > > > >>> for
> > > > >>>>>>>> reducing
> > > > >>>>>>>>>> the
> > > > >>>>>>>>>>>>>> latency
> > > > >>>>>>>>>>>>>>>>> of a
> > > > >>>>>>>>>>>>>>>>>>>>> broker
> > > > >>>>>>>>>>>>>>>>>>>>>> becoming the follower,
> > > > >>>>>>>>>>>>>>>>>>>>>> closely related is reducing the
> > > > >> latency
> > > > >>>> of
> > > > >>>>> a
> > > > >>>>>>>>> broker
> > > > >>>>>>>>>>>>> becoming
> > > > >>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>> leader.
> > > > >>>>>>>>>>>>>>>>>>>>>> In this case, the benefit is even
> > > > >> more
> > > > >>>>>>>> obvious, if
> > > > >>>>>>>>>>> other
> > > > >>>>>>>>>>>>>>> brokers
> > > > >>>>>>>>>>>>>>>>>> have
> > > > >>>>>>>>>>>>>>>>>>>>>> resigned leadership, and the
> > > > >>>>>>>>>>>>>>>>>>>>>> current broker should take
> > > > >> leadership.
> > > > >>>> Any
> > > > >>>>>>>> delay
> > > > >>>>>>>>> in
> > > > >>>>>>>>>>>>>> processing
> > > > >>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR will be perceived
> > > > >>>>>>>>>>>>>>>>>>>>>> by clients as unavailability. In
> > > > >>> extreme
> > > > >>>>>> cases,
> > > > >>>>>>>>> this
> > > > >>>>>>>>>>> can
> > > > >>>>>>>>>>>>>> cause
> > > > >>>>>>>>>>>>>>>>>> failed
> > > > >>>>>>>>>>>>>>>>>>>>>> produce requests if the retries are
> > > > >>>>>>>>>>>>>>>>>>>>>> exhausted.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Another two types of controller
> > > > >>> requests
> > > > >>>>> are
> > > > >>>>>>>>>>>>> UpdateMetadata
> > > > >>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>>>>>> StopReplica, which I'll briefly
> > > > >> discuss
> > > > >>>> as
> > > > >>>>>>>>> follows:
> > > > >>>>>>>>>>>>>>>>>>>>>> For UpdateMetadata requests, delayed
> > > > >>>>>> processing
> > > > >>>>>>>>>> means
> > > > >>>>>>>>>>>>>> clients
> > > > >>>>>>>>>>>>>>>>>>> receiving
> > > > >>>>>>>>>>>>>>>>>>>>>> stale metadata, e.g. with the wrong
> > > > >>>>>> leadership
> > > > >>>>>>>>> info
> > > > >>>>>>>>>>>>>>>>>>>>>> for certain partitions, and the
> > > > >> effect
> > > > >>> is
> > > > >>>>>> more
> > > > >>>>>>>>>> retries
> > > > >>>>>>>>>>>> or
> > > > >>>>>>>>>>>>>> even
> > > > >>>>>>>>>>>>>>>>>> fatal
> > > > >>>>>>>>>>>>>>>>>>>>>> failure if the retries are exhausted.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> For StopReplica requests, a long
> > > > >>> queuing
> > > > >>>>> time
> > > > >>>>>>>> may
> > > > >>>>>>>>>>>> degrade
> > > > >>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>> performance
> > > > >>>>>>>>>>>>>>>>>>>>>> of topic deletion.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Regarding your last question of the
> > > > >>> delay
> > > > >>>>> for
> > > > >>>>>>>>>>>>>>>>>> DescribeLogDirsRequest,
> > > > >>>>>>>>>>>>>>>>>>>> you
> > > > >>>>>>>>>>>>>>>>>>>>>> are right
> > > > >>>>>>>>>>>>>>>>>>>>>> that this KIP cannot help with the
> > > > >>>> latency
> > > > >>>>> in
> > > > >>>>>>>>>> getting
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> log
> > > > >>>>>>>>>>>>>>>>> dirs
> > > > >>>>>>>>>>>>>>>>>>>> info,
> > > > >>>>>>>>>>>>>>>>>>>>>> and it's only relevant
> > > > >>>>>>>>>>>>>>>>>>>>>> when controller requests are
> > > > >> involved.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Regards,
> > > > >>>>>>>>>>>>>>>>>>>>>> Lucas
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 5:11 PM, Dong
> > > > >>> Lin
> > > > >>>> <
> > > > >>>>>>>>>>>>>> lindong28@gmail.com
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Hey Jun,
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks much for the comments. It is
> > > > >>> good
> > > > >>>>>>>> point.
> > > > >>>>>>>>> So
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> feature
> > > > >>>>>>>>>>>>>>>>> may
> > > > >>>>>>>>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>>>>>>>> useful for JBOD use-case. I have one
> > > > >>>>>> question
> > > > >>>>>>>>>> below.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Do you think this feature is also
> > > > >>> useful
> > > > >>>>> for
> > > > >>>>>>>>>> non-JBOD
> > > > >>>>>>>>>>>>> setup
> > > > >>>>>>>>>>>>>>> or
> > > > >>>>>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>>>>>> only
> > > > >>>>>>>>>>>>>>>>>>>>>>> useful for the JBOD setup? It may be
> > > > >>>>> useful
> > > > >>>>>> to
> > > > >>>>>>>>>>>> understand
> > > > >>>>>>>>>>>>>>> this.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> When the broker is setup using JBOD,
> > > > >>> in
> > > > >>>>>> order
> > > > >>>>>>>> to
> > > > >>>>>>>>>> move
> > > > >>>>>>>>>>>>>> leaders
> > > > >>>>>>>>>>>>>>>>> on
> > > > >>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>> failed
> > > > >>>>>>>>>>>>>>>>>>>>>>> disk to other disks, the system
> > > > >>> operator
> > > > >>>>>> first
> > > > >>>>>>>>>> needs
> > > > >>>>>>>>>>> to
> > > > >>>>>>>>>>>>> get
> > > > >>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>> list
> > > > >>>>>>>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>>>>>>> partitions on the failed disk. This
> > > > >> is
> > > > >>>>>>>> currently
> > > > >>>>>>>>>>>> achieved
> > > > >>>>>>>>>>>>>>> using
> > > > >>>>>>>>>>>>>>>>>>>>>>> AdminClient.describeLogDirs(), which
> > > > >>>> sends
> > > > >>>>>>>>>>>>>>>>> DescribeLogDirsRequest
> > > > >>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>> broker. If we only prioritize the
> > > > >>>>> controller
> > > > >>>>>>>>>>> requests,
> > > > >>>>>>>>>>>>> then
> > > > >>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>> DescribeLogDirsRequest
> > > > >>>>>>>>>>>>>>>>>>>>>>> may still take a long time to be
> > > > >>>> processed
> > > > >>>>>> by
> > > > >>>>>>>> the
> > > > >>>>>>>>>>>> broker.
> > > > >>>>>>>>>>>>>> So
> > > > >>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>> overall
> > > > >>>>>>>>>>>>>>>>>>>>>>> time to move leaders away from the
> > > > >>>> failed
> > > > >>>>>> disk
> > > > >>>>>>>>> may
> > > > >>>>>>>>>>>> still
> > > > >>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>> long
> > > > >>>>>>>>>>>>>>>>>>> even
> > > > >>>>>>>>>>>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>>>>>>>>> this KIP. What do you think?
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>>>>>>>> Dong
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 4:38 PM,
> > > > >> Lucas
> > > > >>>>> Wang <
> > > > >>>>>>>>>>>>>>>>> lucasatucla@gmail.com
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the insightful comment,
> > > > >>>> Jun.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> @Dong,
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Since both of the two comments in
> > > > >>> your
> > > > >>>>>>>> previous
> > > > >>>>>>>>>>> email
> > > > >>>>>>>>>>>>> are
> > > > >>>>>>>>>>>>>>>>> about
> > > > >>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>> benefits of this KIP and whether
> > > > >>> it's
> > > > >>>>>>>> useful,
> > > > >>>>>>>>>>>>>>>>>>>>>>>> in light of Jun's last comment, do
> > > > >>> you
> > > > >>>>>> agree
> > > > >>>>>>>>> that
> > > > >>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>> KIP
> > > > >>>>>>>>>>>>>>>>> can
> > > > >>>>>>>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>>>>>>>>> beneficial in the case mentioned
> > > > >> by
> > > > >>>> Jun?
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Please let me know, thanks!
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Regards,
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Lucas
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 2:07 PM,
> > > > >> Jun
> > > > >>>> Rao
> > > > >>>>> <
> > > > >>>>>>>>>>>>>> jun@confluent.io>
> > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi, Lucas, Dong,
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> If all disks on a broker are
> > > > >> slow,
> > > > >>>> one
> > > > >>>>>>>>> probably
> > > > >>>>>>>>>>>>> should
> > > > >>>>>>>>>>>>>>> just
> > > > >>>>>>>>>>>>>>>>>> kill
> > > > >>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> broker. In that case, this KIP
> > > > >> may
> > > > >>>> not
> > > > >>>>>>>> help.
> > > > >>>>>>>>> If
> > > > >>>>>>>>>>>> only
> > > > >>>>>>>>>>>>>> one
> > > > >>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>> disks
> > > > >>>>>>>>>>>>>>>>>>>>>>> on
> > > > >>>>>>>>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> broker is slow, one may want to
> > > > >>> fail
> > > > >>>>>> that
> > > > >>>>>>>>> disk
> > > > >>>>>>>>>>> and
> > > > >>>>>>>>>>>>> move
> > > > >>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>> leaders
> > > > >>>>>>>>>>>>>>>>>>>>> on
> > > > >>>>>>>>>>>>>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> disk to other brokers. In that
> > > > >>> case,
> > > > >>>>>> being
> > > > >>>>>>>>> able
> > > > >>>>>>>>>>> to
> > > > process the LeaderAndIsr requests faster will potentially help the
> > > > producers recover quicker.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > >
> > > > > Hey Lucas,
> > > > >
> > > > > Thanks for the reply. Some follow up questions below.
> > > > >
> > > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > > > > randomly distributed across all partitions, then each ProduceRequest
> > > > > will likely cover some partitions for which the broker is still leader
> > > > > after it quickly processes the LeaderAndIsrRequest. Then the broker
> > > > > will still be slow in processing these ProduceRequests, and the
> > > > > request latency will still be very high with this KIP. It seems that
> > > > > most ProduceRequests will still timeout after 30 seconds. Is this
> > > > > understanding correct?
> > > > >
> > > > > Regarding 2, if most ProduceRequests will still timeout after 30
> > > > > seconds, then it is less clear how this KIP reduces average produce
> > > > > latency. Can you clarify what metrics can be improved by this KIP?
> > > > >
> > > > > Not sure why the system operator directly cares about the number of
> > > > > truncated messages. Do you mean this KIP can improve average
> > > > > throughput or reduce message duplication? It will be good to
> > > > > understand this.
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >
> > > > > > Hi Dong,
> > > > > >
> > > > > > Thanks for your valuable comments. Please see my reply below.
> > > > > >
> > > > > > 1. The Google doc showed only 1 partition. Now let's consider a
> > > > > > more common scenario where broker0 is the leader of many partitions.
> > > > > > And let's say for some reason its IO becomes slow.
> > > > > > The number of leader partitions on broker0 is so large, say 10K,
> > > > > > that the cluster is skewed,
> > > > > > and the operator would like to shift the leadership for a lot of
> > > > > > partitions, say 9K, to other brokers,
> > > > > > either manually or through some service like cruise control.
> > > > > > With this KIP, not only will the
> > >
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> >
> >

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Sure, I can summarize the usage of correlation id. But before I do that, it
seems
the same out-of-order processing can also happen to Produce requests sent
by producers,
following the same example you described earlier.
If that's the case, I think this probably deserves a separate doc and
design independent of this KIP.

Lucas



On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Lucas,
>
> Could you update the KIP if you are confident with the approach which uses
> correlation id? The idea around correlation id is kind of scattered across
> multiple emails. It will be useful if other reviewers can read the KIP to
> understand the latest proposal.
>
> Thanks,
> Dong
>
> On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com> wrote:
>
> > I like the idea of the deque implementation by Lucas. This will help us
> > avoid an additional queue for the controller and additional configs in Kafka.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <be...@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > The usage of correlation ID might still be useful to address the cases
> > > that the controller epoch and leader epoch checks are not sufficient to
> > > guarantee correct behavior. For example, if the controller sends a
> > > LeaderAndIsrRequest followed by a StopReplicaRequest, and the broker
> > > processes them in the reverse order, the replica may still be wrongly
> > > recreated, right?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > Hmm, since we already use controller epoch and leader epoch for
> > properly
> > > > caching the latest partition state, do we really need correlation id
> > for
> > > > ordering the controller requests?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <be...@gmail.com>
> > > wrote:
> > > >
> > > >> Lucas and Mayuresh,
> > > >>
> > > >> Good idea. The correlation id should work.
> > > >>
> > > >> In the ControllerChannelManager, a request will be resent until a
> > > >> response is received. So if the controller-to-broker connection
> > > >> disconnects after the controller sends R1_a, but before the response
> > > >> of R1_a is received, the disconnection may cause the controller to
> > > >> resend the request as R1_b. I.e., until R1 is acked, R2 won't be sent
> > > >> by the controller.
> > > >> This gives two guarantees:
> > > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > > >> 2. On the broker side, when R2 is seen, R1 must have been processed at
> > > >> least once.
> > > >>
> > > >> So on the broker side, with a single-threaded controller request
> > > >> handler, the logic should be:
> > > >> 1. Process whatever request is seen in the controller request queue.
> > > >> 2. For the given epoch, drop a request if its correlation id is
> > > >> smaller than that of the last processed request.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
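Becket's two broker-side rules can be sketched roughly as follows. This is an illustrative sketch only, not Kafka's implementation; the class and method names (`StaleRequestFilter`, `admit`) are hypothetical, and it assumes correlation ids from a given controller are monotonically increasing, as discussed elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two rules: process requests in queue order, but within a
// controller epoch drop any request whose correlation id is not larger than
// the last one processed.
class StaleRequestFilter {
    // last processed correlation id, tracked per controller epoch
    private final Map<Integer, Integer> lastCorrelationByEpoch = new HashMap<>();

    /** Returns true if the request should be processed, false if it is stale. */
    public boolean admit(int controllerEpoch, int correlationId) {
        Integer last = lastCorrelationByEpoch.get(controllerEpoch);
        if (last != null && correlationId <= last) {
            return false; // an older resend (e.g. R1_a arriving after R1_b): drop it
        }
        lastCorrelationByEpoch.put(controllerEpoch, correlationId);
        return true;
    }
}
```

Keeping the map keyed by controller epoch means a new controller (higher epoch) starts with a fresh correlation-id watermark.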
> > > >>
> > > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
> > > >>
> > > >>> I agree that there is no strong ordering when there is more than one
> > > >>> socket connection. Currently, we rely on controllerEpoch and
> > > >>> leaderEpoch to ensure that the receiving broker picks up the latest
> > > >>> state for each partition.
> > > >>>
> > > >>> One potential issue with the deque approach is that if the queue is
> > > >>> full, there is no guarantee that the controller requests will be
> > > >>> enqueued quickly.
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Jun
> > > >>>
> > > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > > >>> gharatmayuresh15@gmail.com
> > > >>>> wrote:
> > > >>>
> > > >>>> Yea, the correlationId is only set to 0 in the NetworkClient
> > > >>>> constructor. Since we reuse the same NetworkClient between the
> > > >>>> Controller and the broker, a disconnection should not cause it to
> > > >>>> reset to 0, in which case it can be used to reject obsolete requests.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> Mayuresh
> > > >>>>
> > > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lucasatucla@gmail.com
> >
> > > >>> wrote:
> > > >>>>
> > > >>>>> @Dong,
> > > >>>>> Great example and explanation, thanks!
> > > >>>>>
> > > >>>>> @All
> > > >>>>> Regarding the example given by Dong, it seems even if we use a
> > > >>>>> queue, and a dedicated controller request handling thread,
> > > >>>>> the same result can still happen because R1_a will be sent on one
> > > >>>>> connection, and R1_b & R2 will be sent on a different connection,
> > > >>>>> and there is no ordering between different connections on the
> > > >>>>> broker side.
> > > >>>>> I was discussing with Mayuresh offline, and it seems the
> > > >>>>> correlation id within the same NetworkClient object is
> > > >>>>> monotonically increasing and never reset, hence a broker can
> > > >>>>> leverage that to properly reject obsolete requests.
> > > >>>>> Thoughts?
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Lucas
> > > >>>>>
> > > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > >>>>> gharatmayuresh15@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Actually nvm, correlationId is reset in case of connection
> loss, I
> > > >>>> think.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>>
> > > >>>>>> Mayuresh
> > > >>>>>>
> > > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > >>>>>> gharatmayuresh15@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> I agree with Dong that out-of-order processing can happen with
> > > >>>>>>> having 2 separate queues as well, and it can even happen today.
> > > >>>>>>> Can we use the correlationId in the request from the controller
> > > >>>>>>> to the broker to handle ordering?
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>>
> > > >>>>>>> Mayuresh
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <
> becket.qin@gmail.com
> > > >>>
> > > >>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Good point, Joel. I agree that a dedicated controller request
> > > >>>>>>>> handling thread would provide better isolation. It also solves
> > > >>>>>>>> the reordering issue.
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> > > >> jjkoshy.w@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Good example. I think this scenario can occur in the current
> > > >>>>>>>>> code as well, but with even lower probability given that there
> > > >>>>>>>>> are other non-controller requests interleaved. It is still
> > > >>>>>>>>> sketchy though, and I think a safer approach would be separate
> > > >>>>>>>>> queues and pinning controller request handling to one handler
> > > >>>>>>>>> thread.
> > > >>>>>>>>>
> > > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> > > >> lindong28@gmail.com
> > > >>>>
> > > >>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hey Becket,
> > > >>>>>>>>>>
> > > >>>>>>>>>> I think you are right that there may be out-of-order
> > > >>>>>>>>>> processing. However, it seems that out-of-order processing may
> > > >>>>>>>>>> also happen even if we use a separate queue.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Here is the example:
> > > >>>>>>>>>>
> > > >>>>>>>>>> - Controller sends R1 and got disconnected before receiving
> > > >>>>>>>>>> the response. Then it reconnects and sends R2. Both requests
> > > >>>>>>>>>> now stay in the controller request queue in the order they are
> > > >>>>>>>>>> sent.
> > > >>>>>>>>>> - thread1 takes R1_a from the request queue and then thread2
> > > >>>>>>>>>> takes R2 from the request queue almost at the same time.
> > > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is a chance
> > > >>>>>>>>>> that R2's processing is completed before R1.
> > > >>>>>>>>>>
> > > >>>>>>>>>> If out-of-order processing can happen for both approaches with
> > > >>>>>>>>>> very low probability, it may not be worthwhile to add the
> > > >>>>>>>>>> extra queue. What do you think?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks,
> > > >>>>>>>>>> Dong
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > > >>>> becket.qin@gmail.com
> > > >>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi Mayuresh/Joel,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Using the request channel as a deque was brought up some time
> > > >>>>>>>>>>> ago when we were initially thinking of prioritizing the
> > > >>>>>>>>>>> requests. The concern was that the controller requests are
> > > >>>>>>>>>>> supposed to be processed in order. If we can ensure that
> > > >>>>>>>>>>> there is one controller request in the request channel, the
> > > >>>>>>>>>>> order is not a concern. But in cases where there is more than
> > > >>>>>>>>>>> one controller request inserted into the queue, the
> > > >>>>>>>>>>> controller request order may change and cause problems. For
> > > >>>>>>>>>>> example, think about the following sequence:
> > > >>>>>>>>>>> 1. Controller successfully sent a request R1 to the broker.
> > > >>>>>>>>>>> 2. Broker receives R1 and puts the request at the head of the
> > > >>>>>>>>>>> request queue.
> > > >>>>>>>>>>> 3. The controller-to-broker connection failed and the
> > > >>>>>>>>>>> controller reconnected to the broker.
> > > >>>>>>>>>>> 4. Controller sends a request R2 to the broker.
> > > >>>>>>>>>>> 5. Broker receives R2 and adds it to the head of the request
> > > >>>>>>>>>>> queue.
> > > >>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is
> > > >>>>>>>>>>> processed, which may cause problems.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Jiangjie (Becket) Qin
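The five-step sequence above can be reproduced in a few lines of Java. This is a toy illustration only (the deque and request names are hypothetical), showing that two head insertions invert the processing order:

```java
import java.util.concurrent.LinkedBlockingDeque;

// If every controller request is inserted at the HEAD of a shared deque, a
// request sent after a reconnect (R2) lands in front of an earlier,
// still-unprocessed request (R1), so R2 gets handled first.
class HeadInsertReordering {
    public static String processingOrder() {
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>();
        requestQueue.offerFirst("R1"); // step 2: broker puts R1 at the head
        requestQueue.offerFirst("R2"); // step 5: broker puts R2 at the head too
        // The request handler drains from the head, so it sees R2 before R1.
        return requestQueue.pollFirst() + "," + requestQueue.pollFirst();
    }
}
```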
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > > >>>>> jjkoshy.w@gmail.com>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler,
> > > >>>>>>>>>>>> less invasive alternative, and it should work.
> > > >>>>>>>>>>>> Jun/Becket/others, do you see any pitfalls with this
> > > >>>>>>>>>>>> approach?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > > >>>>>>>> lucasatucla@gmail.com>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> @Mayuresh,
> > > >>>>>>>>>>>>> That's a very interesting idea that I haven't thought of
> > > >>>>>>>>>>>>> before. It seems to solve our problem at hand pretty well,
> > > >>>>>>>>>>>>> and also avoids the need to have a new size metric and
> > > >>>>>>>>>>>>> capacity config for the controller request queue. In fact,
> > > >>>>>>>>>>>>> if we were to adopt this design, there is no public
> > > >>>>>>>>>>>>> interface change, and we probably don't need a KIP.
> > > >>>>>>>>>>>>> Also implementation-wise, it seems the java class
> > > >>>>>>>>>>>>> LinkedBlockingDeque can readily satisfy the requirement by
> > > >>>>>>>>>>>>> supporting a capacity, and also allowing inserting at both
> > > >>>>>>>>>>>>> ends.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> My only concern is that this design is tied to the
> > > >>>>>>>>>>>>> coincidence that we have two request priorities and there
> > > >>>>>>>>>>>>> are two ends to a deque. Hence by using the proposed
> > > >>>>>>>>>>>>> design, it seems the network layer is more tightly coupled
> > > >>>>>>>>>>>>> with upper-layer logic, e.g. if we were to add an extra
> > > >>>>>>>>>>>>> priority level in the future for some reason, we would
> > > >>>>>>>>>>>>> probably need to go back to the design of separate queues,
> > > >>>>>>>>>>>>> one for each priority level.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward your
> > > >>>>>>>>>>>>> suggested approach.
> > > >>>>>>>>>>>>> Let's hear what others think.
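The fallback design Lucas mentions here, one queue per priority level, could be sketched as below. The names are illustrative and not from the KIP; the sketch only shows how N levels generalize where a two-ended deque cannot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One FIFO queue per priority level, drained highest-priority-first.
class MultiPriorityChannel {
    private final List<Deque<String>> queues = new ArrayList<>();

    MultiPriorityChannel(int levels) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    /** priority 0 is the highest. */
    public void send(int priority, String request) {
        queues.get(priority).addLast(request);
    }

    /** Returns the next request, scanning from the highest priority down. */
    public String receive() {
        for (Deque<String> q : queues) {
            if (!q.isEmpty()) {
                return q.pollFirst();
            }
        }
        return null; // no request pending
    }
}
```

With exactly two levels this degenerates to the deque proposal: level 0 plays the role of head insertion, level 1 the tail.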
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> @Becket,
> > > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm answering
> > > >>>>>>>>>>>>> your question only in the context of the current KIP
> > > >>>>>>>>>>>>> design: I think your suggestion makes sense, and I'm ok
> > > >>>>>>>>>>>>> with removing the capacity config and just relying on the
> > > >>>>>>>>>>>>> default value of 20 being sufficient.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>> Lucas
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi Lucas,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize the
> > > >>>>>>>>>>>>>> controller requests over any other requests.
> > > >>>>>>>>>>>>>> In that case, we can change the request queue to a deque,
> > > >>>>>>>>>>>>>> where you always insert the normal requests (produce,
> > > >>>>>>>>>>>>>> consume, etc.) at the end of the deque, but if it's a
> > > >>>>>>>>>>>>>> controller request, you insert it at the head of the
> > > >>>>>>>>>>>>>> queue. This ensures that the controller request will be
> > > >>>>>>>>>>>>>> given higher priority over other requests.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Also, since we only read one request from the socket and
> > > >>>>>>>>>>>>>> mute it, and only unmute it after handling the request,
> > > >>>>>>>>>>>>>> this would ensure that we don't handle controller requests
> > > >>>>>>>>>>>>>> out of order.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> With this approach we can avoid the second queue and the
> > > >>>>>>>>>>>>>> additional config for the size of the queue.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> What do you think?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Mayuresh
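A rough sketch of this deque-based channel, using java.util.concurrent.LinkedBlockingDeque. This is not Kafka's actual RequestChannel; the class and method names are illustrative, and the non-blocking offer/poll variants are used here only to keep the sketch free of checked exceptions (a real implementation would likely use the blocking putLast/putFirst/takeFirst):

```java
import java.util.concurrent.LinkedBlockingDeque;

// One bounded deque: data-plane requests join at the tail, controller
// requests jump to the head, so a pending controller request is always the
// next one a handler thread takes.
class PrioritizedRequestChannel {
    private final LinkedBlockingDeque<String> queue;

    PrioritizedRequestChannel(int capacity) {
        // single capacity bound, analogous to queued.max.requests
        this.queue = new LinkedBlockingDeque<>(capacity);
    }

    public boolean sendDataRequest(String request) {
        return queue.offerLast(request);  // produce/fetch go to the tail
    }

    public boolean sendControllerRequest(String request) {
        return queue.offerFirst(request); // controller requests go to the head
    }

    public String receive() {
        return queue.pollFirst();         // handlers always drain the head
    }
}
```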
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > >>>>>>>> becket.qin@gmail.com
> > > >>>>>>>>>>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hey Joel,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the current
> > > >>>>>>>>>>>>>>> design makes sense. My confusion is about whether the new
> > > >>>>>>>>>>>>>>> config for the controller queue capacity is necessary. I
> > > >>>>>>>>>>>>>>> cannot think of a case in which users would change it.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > >>>>>>>>>> becket.qin@gmail.com>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi Lucas,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we expect
> > > >>>>>>>>>>>>>>>> users to ever change the controller request queue
> > > >>>>>>>>>>>>>>>> capacity"? If we agree that 20 is already a very
> > > >>>>>>>>>>>>>>>> generous default number and we do not expect users to
> > > >>>>>>>>>>>>>>>> change it, is it still necessary to expose this as a
> > > >>>>>>>>>>>>>>>> config?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > >>>>>>>>>>> lucasatucla@gmail.com
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> @Becket
> > > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that normally
> > > >>>>>>>>>>>>>>>>> there should be just one controller request because of
> > > >>>>>>>>>>>>>>>>> muting, and I had NOT intended to say there would be
> > > >>>>>>>>>>>>>>>>> many enqueued controller requests.
> > > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not sure which
> > > >>>>>>>>>>>>>>>>> part conveys that info. I'd be happy to revise if you
> > > >>>>>>>>>>>>>>>>> point out the section.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal conditions,
> > > >>>>>>>>>>>>>>>>> the current design does not preclude multiple
> > > >>>>>>>>>>>>>>>>> controllers running at the same time, hence if we don't
> > > >>>>>>>>>>>>>>>>> have the controller queue capacity config and simply
> > > >>>>>>>>>>>>>>>>> make its capacity 1, network threads handling requests
> > > >>>>>>>>>>>>>>>>> from different controllers will be blocked during those
> > > >>>>>>>>>>>>>>>>> troublesome times, which is probably not what we want.
> > > >>>>>>>>>>>>>>>>> On the other hand, adding the extra config with a
> > > >>>>>>>>>>>>>>>>> default value, say 20, guards us from issues in those
> > > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much downside to
> > > >>>>>>>>>>>>>>>>> adding the extra config.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> @Mayuresh
> > > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete statement
> > > >>>>>>>>>>>>>>>>> based on a previous design. I've revised the wording in
> > > >>>>>>>>>>>>>>>>> the KIP.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>> Lucas
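Lucas's point 2 can be made concrete with a capacity-1 queue: once one controller request is queued, a second cannot be accepted, so the network thread that read it would have to block. A small hypothetical demo using offer(), which returns false instead of blocking:

```java
import java.util.concurrent.ArrayBlockingQueue;

// With a capacity-1 controller queue, a second controller request cannot be
// accepted while the first is still queued; put() would block the network
// thread, and offer() makes the rejection observable without blocking.
class CapacityOneQueueDemo {
    public static boolean secondRequestAccepted() {
        ArrayBlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(1);
        boolean first = controllerQueue.offer("request-from-controller-A");
        boolean second = controllerQueue.offer("request-from-controller-B");
        return first && second; // the second offer fails: the queue is full
    }
}
```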
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > > >>> Gharat <
> > > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The memory
> > > >>>>>>>>>>>>>>>>>> consumption can rise given the total number of queued
> > > >>>>>>>>>>>>>>>>>> requests can go up to 2x" in the impact section.
> > > >>>>>>>>>>>>>>>>>> Normally the requests from the controller to a broker
> > > >>>>>>>>>>>>>>>>>> are not high volume, right?
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Mayuresh
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > >>>>>>>>>>>> becket.qin@gmail.com>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the control
> > > >>>>>>>>>>>>>>>>>>> plane from the data plane makes a lot of sense.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the controller request
> > > >>>>>>>>>>>>>>>>>>> queue may have many requests in it. Will this be a
> > > >>>>>>>>>>>>>>>>>>> common case? The controller requests still go through
> > > >>>>>>>>>>>>>>>>>>> the SocketServer. The SocketServer will mute the
> > > >>>>>>>>>>>>>>>>>>> channel once a request is read and put into the
> > > >>>>>>>>>>>>>>>>>>> request channel. So assuming there is only one
> > > >>>>>>>>>>>>>>>>>>> connection between the controller and each broker, on
> > > >>>>>>>>>>>>>>>>>>> the broker side, there should be only one controller
> > > >>>>>>>>>>>>>>>>>>> request in the controller request queue at any given
> > > >>>>>>>>>>>>>>>>>>> time. If that is the case, do we need a separate
> > > >>>>>>>>>>>>>>>>>>> controller request queue capacity config? The default
> > > >>>>>>>>>>>>>>>>>>> value of 20 means that we expect 20 controller
> > > >>>>>>>>>>>>>>>>>>> switches to happen in a short period of time. I am
> > > >>>>>>>>>>>>>>>>>>> not sure whether someone should increase the
> > > >>>>>>>>>>>>>>>>>>> controller request queue capacity to handle such a
> > > >>>>>>>>>>>>>>>>>>> case, as it seems to indicate something very wrong
> > > >>>>>>>>>>>>>>>>>>> has happened.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > >>>>>>>>>>>> lindong28@gmail.com>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Thanks for the update, Lucas.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> I think the motivation section is intuitive. It will
> > > >>>>>>>>>>>>>>>>>>>> be good to learn more about the comments from other
> > > >>>>>>>>>>>>>>>>>>>> reviewers.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> > > >>> Wang <
> > > >>>>>>>>>>>>>>> lucasatucla@gmail.com>
> > > >>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of the KIP by
> > > >>>>>>>>>>>>>>>>>>>>> explaining the cases that would have user impacts.
> > > >>>>>>>>>>>>>>>>>>>>> Please take a look and let me know your comments.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>>>>> Lucas
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> > > >>> Wang
> > > >>>> <
> > > >>>>>>>>>>>>>>> lucasatucla@gmail.com
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> The simulation of the disk being slow is merely
> > > >>>>>>>>>>>>>>>>>>>>>> for me to easily construct a testing scenario with
> > > >>>>>>>>>>>>>>>>>>>>>> a backlog of produce requests. In production,
> > > >>>>>>>>>>>>>>>>>>>>>> other than the disk being slow, a backlog of
> > > >>>>>>>>>>>>>>>>>>>>>> produce requests may also be caused by high
> > > >>>>>>>>>>>>>>>>>>>>>> produce QPS. In that case, we may not want to kill
> > > >>>>>>>>>>>>>>>>>>>>>> the broker, and that's when this KIP can be
> > > >>>>>>>>>>>>>>>>>>>>>> useful, both for JBOD and non-JBOD setups.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question about each
> > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequest covering 20 partitions that are
> > > >>>>>>>>>>>>>>>>>>>>>> randomly distributed, let's say a LeaderAndIsr
> > > >>>>>>>>>>>>>>>>>>>>>> request is enqueued that tries to switch the
> > > >>>>>>>>>>>>>>>>>>>>>> current broker, say broker0, from leader to
> > > >>>>>>>>>>>>>>>>>>>>>> follower *for one of the partitions*, say
> > > >>>>>>>>>>>>>>>>>>>>>> *test-0*. For the sake of argument, let's also
> > > >>>>>>>>>>>>>>>>>>>>>> assume the other brokers, say broker1, have
> > > >>>>>>>>>>>>>>>>>>>>>> *stopped* fetching from the current broker, i.e.
> > > >>>>>>>>>>>>>>>>>>>>>> broker0.
> > > >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests have acks = -1
> > > >>>>>>>>>>>>>>>>>>>>>> (ALL)
> > > >>>>>>>>>>>>>>>>>>>>>>   1.1 Without this KIP, the ProduceRequests ahead
> > > >>>>>>>>>>>>>>>>>>>>>>   of LeaderAndISR will be put into the purgatory,
> > > >>>>>>>>>>>>>>>>>>>>>>   and since they'll never be replicated to other
> > > >>>>>>>>>>>>>>>>>>>>>>   brokers (because of the assumption made above),
> > > >>>>>>>>>>>>>>>>>>>>>>   they will be completed either when the
> > > >>>>>>>>>>>>>>>>>>>>>>   LeaderAndISR request is processed or when the
> > > >>>>>>>>>>>>>>>>>>>>>>   timeout happens.
> > > >>>>>>>>>>>>>>>>>>>>>>   1.2 With this KIP, broker0 will immediately
> > > >>>>>>>>>>>>>>>>>>>>>>   transition the partition test-0 to become a
> > > >>>>>>>>>>>>>>>>>>>>>>   follower, after the current broker sees
> > > >>> the
> > > >>>>>>>>>> replication
> > > >>>>>>>>>>> of
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> remaining
> > > >>>>>>>>>>>>>>>>>>>> 19
> > > >>>>>>>>>>>>>>>>>>>>>> partitions, it can send a response
> > > >>>>> indicating
> > > >>>>>>>> that
> > > >>>>>>>>>>>>>>>>>>>>>>        it's no longer the leader for
> > > >>> the
> > > >>>>>>>>> "test-0".
> > > >>>>>>>>>>>>>>>>>>>>>>  To see the latency difference
> > > >> between
> > > >>>> 1.1
> > > >>>>>> and
> > > >>>>>>>>> 1.2,
> > > >>>>>>>>>>>> let's
> > > >>>>>>>>>>>>>> say
> > > >>>>>>>>>>>>>>>>>> there
> > > >>>>>>>>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>>>>>>>> 24K produce requests ahead of the
> > > >>>>>> LeaderAndISR,
> > > >>>>>>>>> and
> > > >>>>>>>>>>>> there
> > > >>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>> 8
> > > >>>>>>>>>>>>>>>>> io
> > > >>>>>>>>>>>>>>>>>>>>> threads,
> > > >>>>>>>>>>>>>>>>>>>>>>  so each io thread will process
> > > >>>>>> approximately
> > > >>>>>>>>> 3000
> > > >>>>>>>>>>>>> produce
> > > >>>>>>>>>>>>>>>>>> requests.
> > > >>>>>>>>>>>>>>>>>>>> Now
> > > >>>>>>>>>>>>>>>>>>>>>> let's investigate the io thread that
> > > >>>>> finally
> > > >>>>>>>>>> processed
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>> LeaderAndISR.
> > > >>>>>>>>>>>>>>>>>>>>>>  For the 3000 produce requests, if
> > > >> we
> > > >>>>> model
> > > >>>>>>>> the
> > > >>>>>>>>>> time
> > > >>>>>>>>>>>> when
> > > >>>>>>>>>>>>>>> their
> > > >>>>>>>>>>>>>>>>>>>>> remaining
> > > >>>>>>>>>>>>>>>>>>>>>> 19 partitions catch up as t0, t1,
> > > >>>> ...t2999,
> > > >>>>>> and
> > > >>>>>>>>> the
> > > >>>>>>>>>>>>>>> LeaderAndISR
> > > >>>>>>>>>>>>>>>>>>>> request
> > > >>>>>>>>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>>>>> processed at time t3000.
> > > >>>>>>>>>>>>>>>>>>>>>>  Without this KIP, the 1st produce
> > > >>>> request
> > > >>>>>>>> would
> > > >>>>>>>>>> have
> > > >>>>>>>>>>>>>> waited
> > > >>>>>>>>>>>>>>> an
> > > >>>>>>>>>>>>>>>>>>> extra
> > > >>>>>>>>>>>>>>>>>>>>>> t3000 - t0 time in the purgatory, the
> > > >>> 2nd
> > > >>>>> an
> > > >>>>>>>> extra
> > > >>>>>>>>>>> time
> > > >>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>> t3000 -
> > > >>>>>>>>>>>>>>>>>>> t1,
> > > >>>>>>>>>>>>>>>>>>>>> etc.
> > > >>>>>>>>>>>>>>>>>>>>>>  Roughly speaking, the latency
> > > >>>> difference
> > > >>>>> is
> > > >>>>>>>>> bigger
> > > >>>>>>>>>>> for
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> earlier
> > > >>>>>>>>>>>>>>>>>>>>>> produce requests than for the later
> > > >>> ones.
> > > >>>>> For
> > > >>>>>>>> the
> > > >>>>>>>>>> same
> > > >>>>>>>>>>>>>> reason,
> > > >>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> more
> > > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests queued
> > > >>>>>>>>>>>>>>>>>>>>>>  before the LeaderAndISR, the bigger
> > > >>>>> benefit
> > > >>>>>>>> we
> > > >>>>>>>>> get
> > > >>>>>>>>>>>>> (capped
> > > >>>>>>>>>>>>>>> by
> > > >>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>> produce timeout).
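The purgatory-wait arithmetic above can be sketched concretely. The numbers below are illustrative assumptions (evenly spaced catch-up times, a 30-second produce timeout), not measurements:

```java
// Sketch of the extra purgatory wait without the KIP: the i-th produce
// request's remaining partitions catch up at time t_i, and the LeaderAndISR
// request is processed at t_N. All times in milliseconds; the spacing of
// the t_i values is an assumption made purely for illustration.
public class PurgatoryWaitSketch {
    // Extra wait for request i: t_N - t_i, capped by the produce timeout.
    static long extraWaitMs(long tIMs, long tNMs, long produceTimeoutMs) {
        return Math.min(tNMs - tIMs, produceTimeoutMs);
    }

    public static void main(String[] args) {
        int n = 3000;            // produce requests handled by one io thread
        long spacingMs = 10;     // assume t_i = i * 10 ms (hypothetical)
        long tN = n * spacingMs; // LeaderAndISR processed at t3000 = 30000 ms
        long timeoutMs = 30_000; // produce request timeout

        // Earlier requests wait longer than later ones.
        System.out.println("first extra wait = "
                + extraWaitMs(0, tN, timeoutMs) + " ms");                 // 30000
        System.out.println("last extra wait  = "
                + extraWaitMs((n - 1) * spacingMs, tN, timeoutMs) + " ms"); // 10
    }
}
```

This matches the argument in the email: the benefit of processing the LeaderAndISR early grows with the backlog size, up to the produce timeout cap.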
> > >> 2. If the enqueued produce requests have acks=0 or acks=1
> > >>    There will be no latency differences in this case, but
> > >>    2.1 without this KIP, the records of partition test-0 in the
> > >>    ProduceRequests ahead of the LeaderAndISR will be appended to the
> > >>    local log, and eventually be truncated after processing the
> > >>    LeaderAndISR. This is what's referred to as "some unofficial
> > >>    definition of data loss in terms of messages beyond the high
> > >>    watermark".
> > >>    2.2 with this KIP, we can mitigate the effect, since if the
> > >>    LeaderAndISR is immediately processed, the response to producers
> > >>    will have the NotLeaderForPartition error, causing producers to
> > >>    retry.
> > >>
> > >> The explanation above covers the benefit of reducing the latency of a
> > >> broker becoming a follower; closely related is reducing the latency of
> > >> a broker becoming the leader. In this case, the benefit is even more
> > >> obvious: if other brokers have resigned leadership and the current
> > >> broker should take leadership, any delay in processing the LeaderAndISR
> > >> will be perceived by clients as unavailability. In extreme cases, this
> > >> can cause failed produce requests if the retries are exhausted.
> > >>
> > >> The other two types of controller requests are UpdateMetadata and
> > >> StopReplica, which I'll briefly discuss as follows:
> > >> For UpdateMetadata requests, delayed processing means clients receiving
> > >> stale metadata, e.g. with the wrong leadership info for certain
> > >> partitions, and the effect is more retries or even fatal failure if the
> > >> retries are exhausted.
> > >> For StopReplica requests, a long queuing time may degrade the
> > >> performance of topic deletion.
> > >>
> > >> Regarding your last question about the delay for
> > >> DescribeLogDirsRequest, you are right that this KIP cannot help with
> > >> the latency in getting the log dirs info; it's only relevant when
> > >> controller requests are involved.
> > >>
> > >> Regards,
> > >> Lucas
> > >>
> > >> On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:
> > >>
> > >>> Hey Jun,
> > >>>
> > >>> Thanks much for the comments. It is a good point. So the feature may
> > >>> be useful for the JBOD use-case. I have one question below.
> > >>>
> > >>> Hey Lucas,
> > >>>
> > >>> Do you think this feature is also useful for a non-JBOD setup, or is
> > >>> it only useful for the JBOD setup? It may be useful to understand
> > >>> this.
> > >>>
> > >>> When the broker is set up using JBOD, in order to move leaders on the
> > >>> failed disk to other disks, the system operator first needs to get the
> > >>> list of partitions on the failed disk. This is currently achieved
> > >>> using AdminClient.describeLogDirs(), which sends
> > >>> DescribeLogDirsRequest to the broker. If we only prioritize the
> > >>> controller requests, then the DescribeLogDirsRequest may still take a
> > >>> long time to be processed by the broker. So the overall time to move
> > >>> leaders away from the failed disk may still be long even with this
> > >>> KIP. What do you think?
> > >>>
> > >>> Thanks,
> > >>> Dong
> > >>>
> > >>> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > >>>
> > >>>> Thanks for the insightful comment, Jun.
> > >>>>
> > >>>> @Dong,
> > >>>> Since both of the two comments in your previous email are about the
> > >>>> benefits of this KIP and whether it's useful, in light of Jun's last
> > >>>> comment, do you agree that this KIP can be beneficial in the case
> > >>>> mentioned by Jun? Please let me know, thanks!
> > >>>>
> > >>>> Regards,
> > >>>> Lucas
> > >>>>
> > >>>> On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:
> > >>>>
> > >>>>> Hi, Lucas, Dong,
> > >>>>>
> > >>>>> If all disks on a broker are slow, one probably should just kill
> > >>>>> the broker. In that case, this KIP may not help. If only one of the
> > >>>>> disks on a broker is slow, one may want to fail that disk and move
> > >>>>> the leaders on that disk to other brokers. In that case, being able
> > >>>>> to process the LeaderAndIsr requests faster will potentially help
> > >>>>> the producers recover quicker.
> > >>>>>
> > >>>>> Thanks,
> > >>>>>
> > >>>>> Jun
> > >>>>>
> > >>>>> On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hey Lucas,
> > >>>>>>
> > >>>>>> Thanks for the reply. Some follow-up questions below.
> > >>>>>>
> > >>>>>> Regarding 1, if each ProduceRequest covers 20 partitions that are
> > >>>>>> randomly distributed across all partitions, then each
> > >>>>>> ProduceRequest will likely cover some partitions for which the
> > >>>>>> broker is still leader after it quickly processes the
> > >>>>>> LeaderAndIsrRequest. Then the broker will still be slow in
> > >>>>>> processing these ProduceRequests, and the request latency will
> > >>>>>> still be very high with this KIP. It seems that most
> > >>>>>> ProduceRequests will still time out after 30 seconds. Is this
> > >>>>>> understanding correct?
> > >>>>>>
> > >>>>>> Regarding 2, if most ProduceRequests will still time out after 30
> > >>>>>> seconds, then it is less clear how this KIP reduces average
> > >>>>>> produce latency. Can you clarify what metrics can be improved by
> > >>>>>> this KIP?
> > >>>>>>
> > >>>>>> Not sure why a system operator directly cares about the number of
> > >>>>>> truncated messages. Do you mean this KIP can improve average
> > >>>>>> throughput or reduce message duplication? It will be good to
> > >>>>>> understand this.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Dong
> > >>>>>>
> > >>>>>> On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Hi Dong,
> > >>>>>>>
> > >>>>>>> Thanks for your valuable comments. Please see my reply below.
> > >>>>>>>
> > >>>>>>> 1. The Google doc showed only 1 partition. Now let's consider a
> > >>>>>>> more common scenario where broker0 is the leader of many
> > >>>>>>> partitions, and let's say for some reason its IO becomes slow.
> > >>>>>>> The number of leader partitions on broker0 is so large, say 10K,
> > >>>>>>> that the cluster is skewed, and the operator would like to shift
> > >>>>>>> the leadership for a lot of partitions, say 9K, to other brokers,
> > >>>>>>> either manually or through some service like cruise control.
> > >>>>>>> With this KIP, not only will the
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
> 

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Lucas,

Could you update the KIP if you are confident in the approach that uses the
correlation id? The idea around the correlation id is kind of scattered across
multiple emails. It will be useful if other reviewers can read the KIP to
understand the latest proposal.

Thanks,
Dong

On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
gharatmayuresh15@gmail.com> wrote:

> I like the idea of the deque implementation by Lucas. This will help us
> avoid an additional queue for the controller and additional configs in Kafka.
>
> Thanks,
>
> Mayuresh
>
> On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <be...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > Using the correlation ID might still be useful to address cases where
> > the controller epoch and leader epoch checks are not sufficient to
> > guarantee correct behavior. For example, if the controller sends a
> > LeaderAndIsrRequest followed by a StopReplicaRequest, and the broker
> > processes them in the reverse order, the replica may still be wrongly
> > recreated, right?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io> wrote:
> > >
> > > Hmm, since we already use the controller epoch and leader epoch for
> > > properly caching the latest partition state, do we really need the
> > > correlation id for ordering the controller requests?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <be...@gmail.com>
> > wrote:
> > >
> > >> Lucas and Mayuresh,
> > >>
> > >> Good idea. The correlation id should work.
> > >>
> > >> In the ControllerChannelManager, a request will be resent until a
> > >> response is received. So if the controller-to-broker connection
> > >> disconnects after the controller sends R1_a, but before the response
> > >> to R1_a is received, the controller will resend the request as R1_b;
> > >> i.e. until R1 is acked, R2 won't be sent by the controller.
> > >> This gives two guarantees:
> > >> 1. Correlation id wise: R1_a < R1_b < R2.
> > >> 2. On the broker side, when R2 is seen, R1 must have been processed at
> > >> least once.
> > >>
> > >> So on the broker side, with a single-threaded controller request
> > >> handler, the logic should be:
> > >> 1. Process whatever request is seen in the controller request queue.
> > >> 2. For the given epoch, drop a request if its correlation id is
> > >> smaller than that of the last processed request.
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
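The broker-side drop rule described above can be sketched as follows. The class and field names are hypothetical stand-ins, not Kafka's actual code:

```java
// Sketch of the proposed rule: within a controller epoch, drop any
// controller request whose correlation id is not larger than that of the
// last processed request. Requests from an older controller epoch are
// dropped outright. Names here are illustrative only.
public class ControllerRequestOrdering {
    private int lastControllerEpoch = -1;
    private int lastCorrelationId = -1;

    // Returns true if the request should be processed, false if it is stale.
    synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch < lastControllerEpoch) {
            return false; // request from an older controller generation
        }
        if (controllerEpoch == lastControllerEpoch
                && correlationId <= lastCorrelationId) {
            return false; // resent or reordered request, already superseded
        }
        lastControllerEpoch = controllerEpoch;
        lastCorrelationId = correlationId;
        return true;
    }
}
```

In the R1_a / R1_b / R2 scenario above, if R2 happens to be processed first, the later-arriving resend R1_b carries a smaller correlation id and is dropped rather than applied out of order.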
> > >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
> > >>
> > >>> I agree that there is no strong ordering when there is more than one
> > >>> socket connection. Currently, we rely on controllerEpoch and
> > >>> leaderEpoch to ensure that the receiving broker picks up the latest
> > >>> state for each partition.
> > >>>
> > >>> One potential issue with the deque approach is that if the queue is
> > >>> full, there is no guarantee that the controller requests will be
> > >>> enqueued quickly.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Jun
> > >>>
> > >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > >>> gharatmayuresh15@gmail.com
> > >>>> wrote:
> > >>>
> > >>>> Yea, the correlationId is only set to 0 in the NetworkClient
> > >>>> constructor. Since we reuse the same NetworkClient between the
> > >>>> controller and the broker, a disconnection should not cause it to
> > >>>> reset to 0, in which case it can be used to reject obsolete
> > >>>> requests.
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Mayuresh
> > >>>>
> > >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com>
> > >>> wrote:
> > >>>>
> > >>>>> @Dong,
> > >>>>> Great example and explanation, thanks!
> > >>>>>
> > >>>>> @All
> > >>>>> Regarding the example given by Dong, it seems even if we use a
> > >>>>> queue and a dedicated controller request handling thread, the same
> > >>>>> result can still happen, because R1_a will be sent on one
> > >>>>> connection, and R1_b & R2 will be sent on a different connection,
> > >>>>> and there is no ordering between different connections on the
> > >>>>> broker side.
> > >>>>> I was discussing with Mayuresh offline, and it seems the
> > >>>>> correlation id within the same NetworkClient object is
> > >>>>> monotonically increasing and never reset, hence a broker can
> > >>>>> leverage that to properly reject obsolete requests. Thoughts?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Lucas
> > >>>>>
> > >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > >>>>> gharatmayuresh15@gmail.com> wrote:
> > >>>>>
> > >>>>>> Actually nvm, correlationId is reset in case of connection loss,
> > >>>>>> I think.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Mayuresh
> > >>>>>>
> > >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > >>>>>> gharatmayuresh15@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> I agree with Dong that out-of-order processing can happen with
> > >>>>>>> having 2 separate queues as well, and it can even happen today.
> > >>>>>>> Can we use the correlationId in the request from the controller
> > >>>>>>> to the broker to handle ordering?
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>>
> > >>>>>>> Mayuresh
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com
> > >>>
> > >>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Good point, Joel. I agree that a dedicated controller request
> > >>>>>>>> handling thread would provide better isolation. It also solves
> > >>>>>>>> the reordering issue.
> > >>>>>>>>
> > >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> > >> jjkoshy.w@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Good example. I think this scenario can occur in the current
> > >>>>>>>>> code as well, but with even lower probability given that there
> > >>>>>>>>> are other non-controller requests interleaved. It is still
> > >>>>>>>>> sketchy though, and I think a safer approach would be separate
> > >>>>>>>>> queues and pinning controller request handling to one handler
> > >>>>>>>>> thread.
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> > >> lindong28@gmail.com
> > >>>>
> > >>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hey Becket,
> > >>>>>>>>>>
> > >>>>>>>>>> I think you are right that there may be out-of-order
> > >>>>>>>>>> processing. However, it seems that out-of-order processing may
> > >>>>>>>>>> also happen even if we use a separate queue.
> > >>>>>>>>>>
> > >>>>>>>>>> Here is the example:
> > >>>>>>>>>>
> > >>>>>>>>>> - Controller sends R1 and gets disconnected before receiving
> > >>>>>>>>>> the response. Then it reconnects and sends R2. Both requests
> > >>>>>>>>>> now stay in the controller request queue in the order they are
> > >>>>>>>>>> sent.
> > >>>>>>>>>> - thread1 takes R1_a from the request queue and then thread2
> > >>>>>>>>>> takes R2 from the request queue almost at the same time.
> > >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is a chance
> > >>>>>>>>>> that R2's processing is completed before R1's.
> > >>>>>>>>>>
> > >>>>>>>>>> If out-of-order processing can happen for both approaches with
> > >>>>>>>>>> very low probability, it may not be worthwhile to add the
> > >>>>>>>>>> extra queue. What do you think?
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Dong
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > >>>> becket.qin@gmail.com
> > >>>>>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi Mayuresh/Joel,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Using the request channel as a deque was brought up some time
> > >>>>>>>>>>> ago when we were initially thinking of prioritizing the
> > >>>>>>>>>>> requests. The concern was that the controller requests are
> > >>>>>>>>>>> supposed to be processed in order. If we can ensure that
> > >>>>>>>>>>> there is only one controller request in the request channel,
> > >>>>>>>>>>> the order is not a concern. But in cases where more than one
> > >>>>>>>>>>> controller request is inserted into the queue, the controller
> > >>>>>>>>>>> request order may change and cause problems. For example,
> > >>>>>>>>>>> think about the following sequence:
> > >>>>>>>>>>> 1. Controller successfully sends a request R1 to the broker.
> > >>>>>>>>>>> 2. Broker receives R1 and puts the request at the head of the
> > >>>>>>>>>>> request queue.
> > >>>>>>>>>>> 3. The controller-to-broker connection fails and the
> > >>>>>>>>>>> controller reconnects to the broker.
> > >>>>>>>>>>> 4. Controller sends a request R2 to the broker.
> > >>>>>>>>>>> 5. Broker receives R2 and adds it to the head of the request
> > >>>>>>>>>>> queue.
> > >>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is
> > >>>>>>>>>>> processed, which may cause problems.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Jiangjie (Becket) Qin
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > >>>>> jjkoshy.w@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler,
> > >>>>>>>>>>>> less invasive alternative, and it should work.
> > >>>>>>>>>>>> Jun/Becket/others, do you see any pitfalls with this
> > >>>>>>>>>>>> approach?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > >>>>>>>> lucasatucla@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> @Mayuresh,
> > >>>>>>>>>>>>> That's a very interesting idea that I hadn't thought of
> > >>>>>>>>>>>>> before.
> > >>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and
> > >>>> also
> > >>>>>>>>>>>>> avoids the need to have a new size metric and capacity
> > >>>>> config
> > >>>>>>>>>>>>> for the controller request queue. In fact, if we were
> > >> to
> > >>>>> adopt
> > >>>>>>>>>>>>> this design, there is no public interface change, and
> > >> we
> > >>>>>>>>>>>>> probably don't need a KIP.
> > >>>>>>>>>>>>> Also, implementation-wise, it seems
> > >>>>>>>>>>>>> the Java class LinkedBlockingDeque can readily satisfy the
> > >>>>>>>>>>>>> requirement by supporting a capacity and also allowing
> > >>>>>>>>>>>>> insertion at both ends.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> My only concern is that this design is tied to the
> > >>>>> coincidence
> > >>>>>>>> that
> > >>>>>>>>>>>>> we have two request priorities and there are two ends
> > >>> to a
> > >>>>>>>> deque.
> > >>>>>>>>>>>>> Hence by using the proposed design, it seems the
> > >> network
> > >>>>> layer
> > >>>>>>>> is
> > >>>>>>>>>>>>> more tightly coupled with upper layer logic, e.g. if
> > >> we
> > >>>> were
> > >>>>>> to
> > >>>>>>>> add
> > >>>>>>>>>>>>> an extra priority level in the future for some reason,
> > >>> we
> > >>>>>> would
> > >>>>>>>>>>> probably
> > >>>>>>>>>>>>> need to go back to the design of separate queues, one
> > >>> for
> > >>>>> each
> > >>>>>>>>>> priority
> > >>>>>>>>>>>>> level.
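The separate-queues alternative mentioned above (one queue per priority level) could be sketched roughly as follows. This is purely illustrative and not what the KIP proposes; the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class MultiPriorityQueuesSketch {
    // One bounded queue per priority level; level 0 is the highest priority.
    private final List<BlockingQueue<String>> queues = new ArrayList<>();

    MultiPriorityQueuesSketch(int levels, int capacityPerLevel) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ArrayBlockingQueue<>(capacityPerLevel));
        }
    }

    boolean enqueue(int priority, String request) {
        return queues.get(priority).offer(request);
    }

    // A request handler drains higher-priority queues before lower ones.
    String pollNext() {
        for (BlockingQueue<String> queue : queues) {
            String request = queue.poll();
            if (request != null) {
                return request;
            }
        }
        return null; // a real handler would block/park here instead
    }
}
```

Unlike the deque approach, this generalizes to any number of priority levels, at the cost of a per-level capacity config and a slightly more involved handler loop.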
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward
> > >>> your
> > >>>>>>>> suggested
> > >>>>>>>>>>>>> approach.
> > >>>>>>>>>>>>> Let's hear what others think.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> @Becket,
> > >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm
> > >>> answering
> > >>>>>> your
> > >>>>>>>>>>> question
> > >>>>>>>>>>>>> only in the context
> > >>>>>>>>>>>>> of the current KIP design: I think your suggestion
> > >> makes
> > >>>>>> sense,
> > >>>>>>>> and
> > >>>>>>>>>> I'm
> > >>>>>>>>>>>> ok
> > >>>>>>>>>>>>> with removing the capacity config and
> > >>>>>>>>>>>>> just relying on the default value of 20 being
> > >> sufficient
> > >>>>>> enough.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> Lucas
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > >>>>>>>>>>>>> gharatmayuresh15@gmail.com
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Lucas,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Seems like the main intent here is to prioritize the
> > >>>>>>>> controller
> > >>>>>>>>>>> request
> > >>>>>>>>>>>>>> over any other requests.
> > >>>>>>>>>>>>>> In that case, we can change the request queue to a
> > >>>>>>>>>>>>>> deque, where you always insert the normal requests
> > >>>>>>>>>>>>>> (produce, consume, etc.) at the tail of the deque, but
> > >>>>>>>>>>>>>> if it's a controller request, you insert it at the head
> > >>>>>>>>>>>>>> of the queue. This ensures that the controller request
> > >>>>>>>>>>>>>> will be given higher priority over other requests.
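Mayuresh's deque proposal could be sketched like this (a hypothetical illustration using `LinkedBlockingDeque`; the broker's request channel is not actually structured this way today, and the names are made up):

```java
import java.util.concurrent.LinkedBlockingDeque;

class RequestChannelDequeSketch {
    private final LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>(500);

    // Controller requests jump to the head of the deque; data-plane
    // requests (produce, fetch, ...) append to the tail, so their
    // relative arrival order is preserved.
    boolean sendRequest(String request, boolean fromController) {
        return fromController ? requestQueue.offerFirst(request)
                              : requestQueue.offerLast(request);
    }

    // Request handler threads always take from the head.
    String receiveRequest() {
        return requestQueue.pollFirst();
    }
}
```

With this shape, a controller request enqueued behind thousands of produce requests is still the next one a handler thread picks up.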
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Also since we only read one request from the socket
> > >>> and
> > >>>>> mute
> > >>>>>>>> it
> > >>>>>>>>> and
> > >>>>>>>>>>>> only
> > >>>>>>>>>>>>>> unmute it after handling the request, this would
> > >>> ensure
> > >>>>> that
> > >>>>>>>> we
> > >>>>>>>>>> don't
> > >>>>>>>>>>>>>> handle controller requests out of order.
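The mute/unmute behavior referred to above can be illustrated with a toy model (purely illustrative; the real SocketServer and Selector logic is considerably more involved). At most one request per connection is in flight, so a second controller request stays on the socket until the first one's response is sent:

```java
import java.util.ArrayDeque;
import java.util.Queue;

class MutedConnectionSketch {
    private final Queue<String> pendingOnSocket = new ArrayDeque<>();
    private boolean muted = false;

    void arrive(String request) {
        pendingOnSocket.add(request);
    }

    // The selector reads at most one request, then mutes the connection,
    // so later requests from the same controller wait on the socket.
    String readOne() {
        if (muted || pendingOnSocket.isEmpty()) {
            return null;
        }
        muted = true;
        return pendingOnSocket.poll();
    }

    // Only after the response is sent is the connection unmuted.
    void responseSent() {
        muted = false;
    }
}
```

This is why, under normal conditions, the ordering hazard from multiple queued controller requests does not arise.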
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> With this approach we can avoid the second queue and
> > >>> the
> > >>>>>>>>> additional
> > >>>>>>>>>>>>> config
> > >>>>>>>>>>>>>> for the size of the queue.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> What do you think ?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Mayuresh
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > >>>>>>>> becket.qin@gmail.com
> > >>>>>>>>>>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hey Joel,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the
> > >>> current
> > >>>>>> design
> > >>>>>>>>>> makes
> > >>>>>>>>>>>>> sense.
> > >>>>>>>>>>>>>>> My confusion is about whether the new config for
> > >> the
> > >>>>>>>> controller
> > >>>>>>>>>>> queue
> > >>>>>>>>>>>>>>> capacity is necessary. I cannot think of a case in
> > >>>> which
> > >>>>>>>> users
> > >>>>>>>>>>> would
> > >>>>>>>>>>>>>> change
> > >>>>>>>>>>>>>>> it.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > >>>>>>>>>> becket.qin@gmail.com>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi Lucas,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we
> > >>>> expect
> > >>>>>>>> user to
> > >>>>>>>>>>> ever
> > >>>>>>>>>>>>>> change
> > >>>>>>>>>>>>>>>> the controller request queue capacity"? If we
> > >>> agree
> > >>>>> that
> > >>>>>>>> 20
> > >>>>>>>>> is
> > >>>>>>>>>>>>> already
> > >>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>> very generous default number and we do not
> > >> expect
> > >>>> user
> > >>>>>> to
> > >>>>>>>>>> change
> > >>>>>>>>>>>> it,
> > >>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>> still necessary to expose this as a config?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > >>>>>>>>>>> lucasatucla@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> @Becket
> > >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that
> > >>>>> normally
> > >>>>>>>> there
> > >>>>>>>>>>>> should
> > >>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>>> one controller request because of muting,
> > >>>>>>>>>>>>>>>>> and I had NOT intended to say there would be
> > >> many
> > >>>>>>>> enqueued
> > >>>>>>>>>>>>> controller
> > >>>>>>>>>>>>>>>>> requests.
> > >>>>>>>>>>>>>>>>> I went through the KIP again, and I'm not sure
> > >>>> which
> > >>>>>> part
> > >>>>>>>>>>> conveys
> > >>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>> info.
> > >>>>>>>>>>>>>>>>> I'd be happy to revise if you point out the
> > >>>>>>>>>>>>>>>>> section.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal
> > >>>> conditions,
> > >>>>>> the
> > >>>>>>>>>> current
> > >>>>>>>>>>>>>> design
> > >>>>>>>>>>>>>>>>> does not preclude multiple controllers running
> > >>>>>>>>>>>>>>>>> at the same time, hence if we don't have the
> > >>>>> controller
> > >>>>>>>>> queue
> > >>>>>>>>>>>>> capacity
> > >>>>>>>>>>>>>>>>> config and simply make its capacity 1,
> > >>>>>>>>>>>>>>>>> network threads handling requests from
> > >> different
> > >>>>>>>> controllers
> > >>>>>>>>>>> will
> > >>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>> blocked during those troublesome times,
> > >>>>>>>>>>>>>>>>> which is probably not what we want. On the
> > >> other
> > >>>>> hand,
> > >>>>>>>>> adding
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>> extra
> > >>>>>>>>>>>>>>>>> config with a default value, say 20, guards us
> > >>> from
> > >>>>>>>> issues
> > >>>>>>>>> in
> > >>>>>>>>>>>> those
> > >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much
> > >>>> downside
> > >>>>> of
> > >>>>>>>>> adding
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>> extra
> > >>>>>>>>>>>>>>>>> config.
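The capacity argument above can be illustrated with a toy sketch (hypothetical; `offer()` is used here for brevity where a broker network thread would actually block on `put()`):

```java
import java.util.concurrent.LinkedBlockingQueue;

class ControllerQueueCapacitySketch {
    // Returns whether a controller queue of the given capacity can accept
    // one more request after 'inFlight' requests are already queued.
    static boolean acceptsOneMore(int capacity, int inFlight) {
        LinkedBlockingQueue<String> controllerQueue = new LinkedBlockingQueue<>(capacity);
        for (int i = 0; i < inFlight; i++) {
            controllerQueue.offer("request-" + i);
        }
        return controllerQueue.offer("one-more");
    }

    public static void main(String[] args) {
        // With capacity 1, a request from a second (e.g. zombie) controller
        // cannot be enqueued, so its network thread would block.
        System.out.println(acceptsOneMore(1, 1));  // false
        // A default capacity of 20 absorbs requests from multiple
        // controllers without blocking network threads.
        System.out.println(acceptsOneMore(20, 1)); // true
    }
}
```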
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> @Mayuresh
> > >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete
> > >>> statement
> > >>>>>> based
> > >>>>>>>> on
> > >>>>>>>>> a
> > >>>>>>>>>>>>> previous
> > >>>>>>>>>>>>>>>>> design. I've revised the wording in the KIP.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>> Lucas
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > >>> Gharat <
> > >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hi Lucas,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The
> > >>>> memory
> > >>>>>>>>>>> consumption
> > >>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>> rise
> > >>>>>>>>>>>>>>>>>> given the total number of queued requests can
> > >>> go
> > >>>> up
> > >>>>>> to
> > >>>>>>>> 2x"
> > >>>>>>>>>> in
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> impact
> > >>>>>>>>>>>>>>>>>> section. Normally the requests from
> > >> controller
> > >>>> to a
> > >>>>>>>> Broker
> > >>>>>>>>>> are
> > >>>>>>>>>>>> not
> > >>>>>>>>>>>>>>> high
> > >>>>>>>>>>>>>>>>>> volume, right ?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Mayuresh
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > >>>>>>>>>>>> becket.qin@gmail.com>
> > >>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the
> > >>>> control
> > >>>>>>>> plane
> > >>>>>>>>>> from
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> data
> > >>>>>>>>>>>>>>>>>> plane
> > >>>>>>>>>>>>>>>>>>> makes a lot of sense.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the
> > >> controller
> > >>>>>> request
> > >>>>>>>>> queue
> > >>>>>>>>>>> may
> > >>>>>>>>>>>>>> have
> > >>>>>>>>>>>>>>>>> many
> > >>>>>>>>>>>>>>>>>>> requests in it. Will this be a common case?
> > >>> The
> > >>>>>>>>> controller
> > >>>>>>>>>>>>>> requests
> > >>>>>>>>>>>>>>>>> still
> > >>>>>>>>>>>>>>>>>>> go through the SocketServer. The
> > >>> SocketServer
> > >>>>>> will
> > >>>>>>>>> mute
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>>> channel
> > >>>>>>>>>>>>>>>>>> once
> > >>>>>>>>>>>>>>>>>>> a request is read and put into the request
> > >>>>> channel.
> > >>>>>>>> So
> > >>>>>>>>>>>> assuming
> > >>>>>>>>>>>>>>> there
> > >>>>>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>> only one connection between controller and
> > >>> each
> > >>>>>>>> broker,
> > >>>>>>>>> on
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>>> broker
> > >>>>>>>>>>>>>>>>>> side,
> > >>>>>>>>>>>>>>>>>>> there should be only one controller request
> > >>> in
> > >>>>> the
> > >>>>>>>>>>> controller
> > >>>>>>>>>>>>>>> request
> > >>>>>>>>>>>>>>>>>> queue
> > >>>>>>>>>>>>>>>>>>> at any given time. If that is the case, do
> > >> we
> > >>>>> need
> > >>>>>> a
> > >>>>>>>>>>> separate
> > >>>>>>>>>>>>>>>>> controller
> > >>>>>>>>>>>>>>>>>>> request queue capacity config? The default
> > >>>> value
> > >>>>> 20
> > >>>>>>>>> means
> > >>>>>>>>>>> that
> > >>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>> expect
> > >>>>>>>>>>>>>>>>>>> there are 20 controller switches to happen
> > >>> in a
> > >>>>>> short
> > >>>>>>>>>> period
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> time.
> > >>>>>>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>>>> am
> > >>>>>>>>>>>>>>>>>>> not sure whether someone should increase
> > >> the
> > >>>>>>>> controller
> > >>>>>>>>>>>> request
> > >>>>>>>>>>>>>>> queue
> > >>>>>>>>>>>>>>>>>>> capacity to handle such case, as it seems
> > >>>>>> indicating
> > >>>>>>>>>>> something
> > >>>>>>>>>>>>>> very
> > >>>>>>>>>>>>>>>>> wrong
> > >>>>>>>>>>>>>>>>>>> has happened.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > >>>>>>>>>>>> lindong28@gmail.com>
> > >>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I think the motivation section is
> > >>> intuitive.
> > >>>> It
> > >>>>>>>> will
> > >>>>>>>>> be
> > >>>>>>>>>>> good
> > >>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>> learn
> > >>>>>>>>>>>>>>>>>>> more
> > >>>>>>>>>>>>>>>>>>>> about the comments from other reviewers.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> > >>> Wang <
> > >>>>>>>>>>>>>>> lucasatucla@gmail.com>
> > >>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of
> > >>> the
> > >>>>> KIP
> > >>>>>> by
> > >>>>>>>>>>>> explaining
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> cases
> > >>>>>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>>> would have user impacts.
> > >>>>>>>>>>>>>>>>>>>>> Please take a look and let me know your
> > >>>>>> comments.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>>> Lucas
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> > >>> Wang
> > >>>> <
> > >>>>>>>>>>>>>>> lucasatucla@gmail.com
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> The simulation of disk being slow is
> > >>>> merely
> > >>>>>>>> for me
> > >>>>>>>>>> to
> > >>>>>>>>>>>>> easily
> > >>>>>>>>>>>>>>>>>>> construct
> > >>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>> testing scenario
> > >>>>>>>>>>>>>>>>>>>>>> with a backlog of produce requests.
> > >> In
> > >>>>>>>> production,
> > >>>>>>>>>>> other
> > >>>>>>>>>>>>>> than
> > >>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> disk
> > >>>>>>>>>>>>>>>>>>>>>> being slow, a backlog of
> > >>>>>>>>>>>>>>>>>>>>>> produce requests may also be caused
> > >> by
> > >>>> high
> > >>>>>>>>> produce
> > >>>>>>>>>>> QPS.
> > >>>>>>>>>>>>>>>>>>>>>> In that case, we may not want to kill
> > >>> the
> > >>>>>>>> broker
> > >>>>>>>>> and
> > >>>>>>>>>>>>> that's
> > >>>>>>>>>>>>>>> when
> > >>>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>> KIP
> > >>>>>>>>>>>>>>>>>>>>>> can be useful, both for JBOD
> > >>>>>>>>>>>>>>>>>>>>>> and non-JBOD setup.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question
> > >>>> about
> > >>>>>> each
> > >>>>>>>>>>>>>> ProduceRequest
> > >>>>>>>>>>>>>>>>>>> covering
> > >>>>>>>>>>>>>>>>>>>>> 20
> > >>>>>>>>>>>>>>>>>>>>>> partitions that are randomly
> > >>>>>>>>>>>>>>>>>>>>>> distributed, let's say a LeaderAndIsr
> > >>>>> request
> > >>>>>>>> is
> > >>>>>>>>>>>> enqueued
> > >>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>> tries
> > >>>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>> switch the current broker, say
> > >> broker0,
> > >>>>> from
> > >>>>>>>>> leader
> > >>>>>>>>>> to
> > >>>>>>>>>>>>>>> follower
> > >>>>>>>>>>>>>>>>>>>>>> *for one of the partitions*, say
> > >>>> *test-0*.
> > >>>>>> For
> > >>>>>>>> the
> > >>>>>>>>>>> sake
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>> argument,
> > >>>>>>>>>>>>>>>>>>>>>> let's also assume the other brokers,
> > >>> say
> > >>>>>>>> broker1,
> > >>>>>>>>>> have
> > >>>>>>>>>>>>>>> *stopped*
> > >>>>>>>>>>>>>>>>>>>> fetching
> > >>>>>>>>>>>>>>>>>>>>>> from
> > >>>>>>>>>>>>>>>>>>>>>> the current broker, i.e. broker0.
> > >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests
> > >>> have
> > >>>>>> acks =
> > >>>>>>>>> -1
> > >>>>>>>>>>>> (ALL)
> > >>>>>>>>>>>>>>>>>>>>>>  1.1 without this KIP, the
> > >>>> ProduceRequests
> > >>>>>>>> ahead
> > >>>>>>>>> of
> > >>>>>>>>>>>>>>>>> LeaderAndISR
> > >>>>>>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>>> put into the purgatory,
> > >>>>>>>>>>>>>>>>>>>>>>        and since they'll never be
> > >>>>> replicated
> > >>>>>>>> to
> > >>>>>>>>>> other
> > >>>>>>>>>>>>>> brokers
> > >>>>>>>>>>>>>>>>>>> (because
> > >>>>>>>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>> the assumption made above), they will
> > >>>>>>>>>>>>>>>>>>>>>>        be completed either when the
> > >>>>>>>> LeaderAndISR
> > >>>>>>>>>>>> request
> > >>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>> processed
> > >>>>>>>>>>>>>>>>>>>> or
> > >>>>>>>>>>>>>>>>>>>>>> when the timeout happens.
> > >>>>>>>>>>>>>>>>>>>>>>  1.2 With this KIP, broker0 will
> > >>>>> immediately
> > >>>>>>>>>>> transition
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> partition
> > >>>>>>>>>>>>>>>>>>>>>> test-0 to become a follower,
> > >>>>>>>>>>>>>>>>>>>>>>        after the current broker sees
> > >>> the
> > >>>>>>>>>> replication
> > >>>>>>>>>>> of
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> remaining
> > >>>>>>>>>>>>>>>>>>>> 19
> > >>>>>>>>>>>>>>>>>>>>>> partitions, it can send a response
> > >>>>> indicating
> > >>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>>>>        it's no longer the leader for
> > >>> the
> > >>>>>>>>> "test-0".
> > >>>>>>>>>>>>>>>>>>>>>>  To see the latency difference
> > >> between
> > >>>> 1.1
> > >>>>>> and
> > >>>>>>>>> 1.2,
> > >>>>>>>>>>>> let's
> > >>>>>>>>>>>>>> say
> > >>>>>>>>>>>>>>>>>> there
> > >>>>>>>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>>>>> 24K produce requests ahead of the
> > >>>>>> LeaderAndISR,
> > >>>>>>>>> and
> > >>>>>>>>>>>> there
> > >>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>> 8
> > >>>>>>>>>>>>>>>>> io
> > >>>>>>>>>>>>>>>>>>>>> threads,
> > >>>>>>>>>>>>>>>>>>>>>>  so each io thread will process
> > >>>>>> approximately
> > >>>>>>>>> 3000
> > >>>>>>>>>>>>> produce
> > >>>>>>>>>>>>>>>>>> requests.
> > >>>>>>>>>>>>>>>>>>>> Now
> > >>>>>>>>>>>>>>>>>>>>>> let's investigate the io thread that
> > >>>>> finally
> > >>>>>>>>>> processed
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>> LeaderAndISR.
> > >>>>>>>>>>>>>>>>>>>>>>  For the 3000 produce requests, if
> > >> we
> > >>>>> model
> > >>>>>>>> the
> > >>>>>>>>>> time
> > >>>>>>>>>>>> when
> > >>>>>>>>>>>>>>> their
> > >>>>>>>>>>>>>>>>>>>>> remaining
> > >>>>>>>>>>>>>>>>>>>>>> 19 partitions catch up as t0, t1,
> > >>>> ...t2999,
> > >>>>>> and
> > >>>>>>>>> the
> > >>>>>>>>>>>>>>> LeaderAndISR
> > >>>>>>>>>>>>>>>>>>>> request
> > >>>>>>>>>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>>>> processed at time t3000.
> > >>>>>>>>>>>>>>>>>>>>>>  Without this KIP, the 1st produce
> > >>>> request
> > >>>>>>>> would
> > >>>>>>>>>> have
> > >>>>>>>>>>>>>> waited
> > >>>>>>>>>>>>>>> an
> > >>>>>>>>>>>>>>>>>>> extra
> > >>>>>>>>>>>>>>>>>>>>>> t3000 - t0 time in the purgatory, the
> > >>> 2nd
> > >>>>> an
> > >>>>>>>> extra
> > >>>>>>>>>>> time
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>> t3000 -
> > >>>>>>>>>>>>>>>>>>> t1,
> > >>>>>>>>>>>>>>>>>>>>> etc.
> > >>>>>>>>>>>>>>>>>>>>>>  Roughly speaking, the latency
> > >>>> difference
> > >>>>> is
> > >>>>>>>>> bigger
> > >>>>>>>>>>> for
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> earlier
> > >>>>>>>>>>>>>>>>>>>>>> produce requests than for the later
> > >>> ones.
> > >>>>> For
> > >>>>>>>> the
> > >>>>>>>>>> same
> > >>>>>>>>>>>>>> reason,
> > >>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> more
> > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests queued
> > >>>>>>>>>>>>>>>>>>>>>>  before the LeaderAndISR, the bigger
> > >>>>> benefit
> > >>>>>>>> we
> > >>>>>>>>> get
> > >>>>>>>>>>>>> (capped
> > >>>>>>>>>>>>>>> by
> > >>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>> produce timeout).
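The extra-wait arithmetic above can be written out as a small sketch. The uniform per-request time (10 ms here) is an assumed value purely for illustration; the thread only defines the completion times t0..t2999 and the LeaderAndISR processing time t3000:

```java
class ExtraWaitSketch {
    // Assume each of the queued ProduceRequests takes the same time
    // 'perRequestMs' for its remaining partitions to catch up, so request i
    // completes at t_i = i * perRequestMs and the LeaderAndISR is processed
    // at t_queued. Without the KIP, request i waits an extra (t_queued - t_i).
    static double averageExtraWaitMs(int queued, double perRequestMs) {
        double tLeaderAndIsr = queued * perRequestMs;
        double total = 0;
        for (int i = 0; i < queued; i++) {
            total += tLeaderAndIsr - i * perRequestMs;
        }
        return total / queued;
    }

    public static void main(String[] args) {
        // With 3000 requests at 10 ms each, the earliest request waits an
        // extra ~30 s and the average extra wait is ~15 s, consistent with
        // the observation that earlier requests benefit the most (capped in
        // practice by the produce timeout).
        System.out.println(averageExtraWaitMs(3000, 10.0));
    }
}
```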
> > >>>>>>>>>>>>>>>>>>>>>> 2. If the enqueued produce requests
> > >>> have
> > >>>>>>>> acks=0 or
> > >>>>>>>>>>>> acks=1
> > >>>>>>>>>>>>>>>>>>>>>>  There will be no latency
> > >> differences
> > >>> in
> > >>>>>> this
> > >>>>>>>>> case,
> > >>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>>>>>>>  2.1 without this KIP, the records
> > >> of
> > >>>>>>>> partition
> > >>>>>>>>>>> test-0
> > >>>>>>>>>>>> in
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>> ProduceRequests ahead of the
> > >>> LeaderAndISR
> > >>>>>> will
> > >>>>>>>> be
> > >>>>>>>>>>>> appended
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> local
> > >>>>>>>>>>>>>>>>>>>>> log,
> > >>>>>>>>>>>>>>>>>>>>>>        and eventually be truncated
> > >>> after
> > >>>>>>>>> processing
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> LeaderAndISR.
> > >>>>>>>>>>>>>>>>>>>>>> This is what's referred to as
> > >>>>>>>>>>>>>>>>>>>>>>        "some unofficial definition
> > >> of
> > >>>> data
> > >>>>>>>> loss
> > >>>>>>>>> in
> > >>>>>>>>>>>> terms
> > >>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>> messages
> > >>>>>>>>>>>>>>>>>>>>>> beyond the high watermark".
> > >>>>>>>>>>>>>>>>>>>>>>  2.2 with this KIP, we can mitigate
> > >>> the
> > >>>>>> effect
> > >>>>>>>>>> since
> > >>>>>>>>>>> if
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>> LeaderAndISR
> > >>>>>>>>>>>>>>>>>>>>>> is immediately processed, the
> > >> response
> > >>> to
> > >>>>>>>>> producers
> > >>>>>>>>>>> will
> > >>>>>>>>>>>>>> have
> > >>>>>>>>>>>>>>>>>>>>>>        the NotLeaderForPartition
> > >>> error,
> > >>>>>>>> causing
> > >>>>>>>>>>>> producers
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>> retry
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> This explanation above is the benefit
> > >>> for
> > >>>>>>>> reducing
> > >>>>>>>>>> the
> > >>>>>>>>>>>>>> latency
> > >>>>>>>>>>>>>>>>> of a
> > >>>>>>>>>>>>>>>>>>>>> broker
> > >>>>>>>>>>>>>>>>>>>>>> becoming the follower,
> > >>>>>>>>>>>>>>>>>>>>>> closely related is reducing the
> > >> latency
> > >>>> of
> > >>>>> a
> > >>>>>>>>> broker
> > >>>>>>>>>>>>> becoming
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>> leader.
> > >>>>>>>>>>>>>>>>>>>>>> In this case, the benefit is even
> > >> more
> > >>>>>>>> obvious, if
> > >>>>>>>>>>> other
> > >>>>>>>>>>>>>>> brokers
> > >>>>>>>>>>>>>>>>>> have
> > >>>>>>>>>>>>>>>>>>>>>> resigned leadership, and the
> > >>>>>>>>>>>>>>>>>>>>>> current broker should take
> > >> leadership.
> > >>>> Any
> > >>>>>>>> delay
> > >>>>>>>>> in
> > >>>>>>>>>>>>>> processing
> > >>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR will be perceived
> > >>>>>>>>>>>>>>>>>>>>>> by clients as unavailability. In
> > >>> extreme
> > >>>>>> cases,
> > >>>>>>>>> this
> > >>>>>>>>>>> can
> > >>>>>>>>>>>>>> cause
> > >>>>>>>>>>>>>>>>>> failed
> > >>>>>>>>>>>>>>>>>>>>>> produce requests if the retries are
> > >>>>>>>>>>>>>>>>>>>>>> exhausted.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Another two types of controller
> > >>> requests
> > >>>>> are
> > >>>>>>>>>>>>> UpdateMetadata
> > >>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>> StopReplica, which I'll briefly
> > >> discuss
> > >>>> as
> > >>>>>>>>> follows:
> > >>>>>>>>>>>>>>>>>>>>>> For UpdateMetadata requests, delayed
> > >>>>>> processing
> > >>>>>>>>>> means
> > >>>>>>>>>>>>>> clients
> > >>>>>>>>>>>>>>>>>>> receiving
> > >>>>>>>>>>>>>>>>>>>>>> stale metadata, e.g. with the wrong
> > >>>>>> leadership
> > >>>>>>>>> info
> > >>>>>>>>>>>>>>>>>>>>>> for certain partitions, and the
> > >> effect
> > >>> is
> > >>>>>> more
> > >>>>>>>>>> retries
> > >>>>>>>>>>>> or
> > >>>>>>>>>>>>>> even
> > >>>>>>>>>>>>>>>>>> fatal
> > >>>>>>>>>>>>>>>>>>>>>> failure if the retries are exhausted.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> For StopReplica requests, a long
> > >>> queuing
> > >>>>> time
> > >>>>>>>> may
> > >>>>>>>>>>>> degrade
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>> performance
> > >>>>>>>>>>>>>>>>>>>>>> of topic deletion.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Regarding your last question of the
> > >>> delay
> > >>>>> for
> > >>>>>>>>>>>>>>>>>> DescribeLogDirsRequest,
> > >>>>>>>>>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>>>>>>>>> are right
> > >>>>>>>>>>>>>>>>>>>>>> that this KIP cannot help with the
> > >>>> latency
> > >>>>> in
> > >>>>>>>>>> getting
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>> log
> > >>>>>>>>>>>>>>>>> dirs
> > >>>>>>>>>>>>>>>>>>>> info,
> > >>>>>>>>>>>>>>>>>>>>>> and it's only relevant
> > >>>>>>>>>>>>>>>>>>>>>> when controller requests are
> > >> involved.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>>>>>> Lucas
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 5:11 PM, Dong
> > >>> Lin
> > >>>> <
> > >>>>>>>>>>>>>> lindong28@gmail.com
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Hey Jun,
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Thanks much for the comments. It is
> > >>> good
> > >>>>>>>> point.
> > >>>>>>>>> So
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>>> feature
> > >>>>>>>>>>>>>>>>> may
> > >>>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>>>> useful for JBOD use-case. I have one
> > >>>>>> question
> > >>>>>>>>>> below.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Do you think this feature is also
> > >>> useful
> > >>>>> for
> > >>>>>>>>>> non-JBOD
> > >>>>>>>>>>>>> setup
> > >>>>>>>>>>>>>>> or
> > >>>>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>>> only
> > >>>>>>>>>>>>>>>>>>>>>>> useful for the JBOD setup? It may be
> > >>>>> useful
> > >>>>>> to
> > >>>>>>>>>>>> understand
> > >>>>>>>>>>>>>>> this.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> When the broker is setup using JBOD,
> > >>> in
> > >>>>>> order
> > >>>>>>>> to
> > >>>>>>>>>> move
> > >>>>>>>>>>>>>> leaders
> > >>>>>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>> failed
> > >>>>>>>>>>>>>>>>>>>>>>> disk to other disks, the system
> > >>> operator
> > >>>>>> first
> > >>>>>>>>>> needs
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>> get
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> list
> > >>>>>>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>>> partitions on the failed disk. This
> > >> is
> > >>>>>>>> currently
> > >>>>>>>>>>>> achieved
> > >>>>>>>>>>>>>>> using
> > >>>>>>>>>>>>>>>>>>>>>>> AdminClient.describeLogDirs(), which
> > >>>> sends
> > >>>>>>>>>>>>>>>>> DescribeLogDirsRequest
> > >>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>> broker. If we only prioritize the
> > >>>>> controller
> > >>>>>>>>>>> requests,
> > >>>>>>>>>>>>> then
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>> DescribeLogDirsRequest
> > >>>>>>>>>>>>>>>>>>>>>>> may still take a long time to be
> > >>>> processed
> > >>>>>> by
> > >>>>>>>> the
> > >>>>>>>>>>>> broker.
> > >>>>>>>>>>>>>> So
> > >>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>> overall
> > >>>>>>>>>>>>>>>>>>>>>>> time to move leaders away from the
> > >>>> failed
> > >>>>>> disk
> > >>>>>>>>> may
> > >>>>>>>>>>>> still
> > >>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>> long
> > >>>>>>>>>>>>>>>>>>> even
> > >>>>>>>>>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>>>>>>> this KIP. What do you think?
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>>>>> Dong
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 4:38 PM,
> > >> Lucas
> > >>>>> Wang <
> > >>>>>>>>>>>>>>>>> lucasatucla@gmail.com
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the insightful comment,
> > >>>> Jun.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> @Dong,
> > >>>>>>>>>>>>>>>>>>>>>>>> Since both of the two comments in
> > >>> your
> > >>>>>>>> previous
> > >>>>>>>>>>> email
> > >>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>> about
> > >>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>> benefits of this KIP and whether
> > >>> it's
> > >>>>>>>> useful,
> > >>>>>>>>>>>>>>>>>>>>>>>> in light of Jun's last comment, do
> > >>> you
> > >>>>>> agree
> > >>>>>>>>> that
> > >>>>>>>>>>>> this
> > >>>>>>>>>>>>>> KIP
> > >>>>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>>>>> beneficial in the case mentioned
> > >> by
> > >>>> Jun?
> > >>>>>>>>>>>>>>>>>>>>>>>> Please let me know, thanks!
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>>>>>>>> Lucas
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 2:07 PM,
> > >> Jun
> > >>>> Rao
> > >>>>> <
> > >>>>>>>>>>>>>> jun@confluent.io>
> > >>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Hi, Lucas, Dong,
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> If all disks on a broker are
> > >> slow,
> > >>>> one
> > >>>>>>>>> probably
> > >>>>>>>>>>>>> should
> > >>>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>>>> kill
> > >>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>> broker. In that case, this KIP
> > >> may
> > >>>> not
> > >>>>>>>> help.
> > >>>>>>>>> If
> > >>>>>>>>>>>> only
> > >>>>>>>>>>>>>> one
> > >>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>> disks
> > >>>>>>>>>>>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>>>>> broker is slow, one may want to
> > >>> fail
> > >>>>>> that
> > >>>>>>>>> disk
> > >>>>>>>>>>> and
> > >>>>>>>>>>>>> move
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>> leaders
> > >>>>>>>>>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>>>>>>> disk to other brokers. In that
> > >>> case,
> > >>>>>> being
> > >>>>>>>>> able
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>>>> process
> > >>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsr
> > >>>>>>>>>>>>>>>>>>>>>>>>> requests faster will potentially
> > >>>> help
> > >>>>>> the
> > >>>>>>>>>>> producers
> > >>>>>>>>>>>>>>> recover
> > >>>>>>>>>>>>>>>>>>>> quicker.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Jun
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> > >
> > >> Hey Lucas,
> > >>
> > >> Thanks for the reply. Some follow up questions below.
> > >>
> > >> Regarding 1, if each ProduceRequest covers 20 partitions that are
> > >> randomly distributed across all partitions, then each ProduceRequest
> > >> will likely cover some partitions for which the broker is still leader
> > >> after it quickly processes the LeaderAndIsrRequest. Then broker will
> > >> still be slow in processing these ProduceRequest and request will
> > >> still be very high with this KIP. It seems that most ProduceRequest
> > >> will still timeout after 30 seconds. Is this understanding correct?
> > >>
> > >> Regarding 2, if most ProduceRequest will still timeout after 30
> > >> seconds, then it is less clear how this KIP reduces average produce
> > >> latency. Can you clarify what metrics can be improved by this KIP?
> > >>
> > >> Not sure why system operator directly cares number of truncated
> > >> messages. Do you mean this KIP can improve average throughput or
> > >> reduce message duplication? It will be good to understand this.
> > >>
> > >> Thanks,
> > >> Dong
> > >>
> > >> On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > >>
> > >>> Hi Dong,
> > >>>
> > >>> Thanks for your valuable comments. Please see my reply below.
> > >>>
> > >>> 1. The Google doc showed only 1 partition. Now let's consider a more
> > >>> common scenario where broker0 is the leader of many partitions. And
> > >>> let's say for some reason its IO becomes slow. The number of leader
> > >>> partitions on broker0 is so large, say 10K, that the cluster is
> > >>> skewed, and the operator would like to shift the leadership for a
> > >>> lot of partitions, say 9K, to other brokers, either manually or
> > >>> through some service like cruise control.
> > >>> With this KIP, not only will the
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Mayuresh Gharat <gh...@gmail.com>.
I like the idea of the deque implementation by Lucas. This will help us
avoid an additional queue for controller requests and additional configs in Kafka.

Thanks,

Mayuresh
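A minimal sketch of this deque idea, assuming a bounded java.util.concurrent.LinkedBlockingDeque stands in for the broker's request channel (the class and method names below are illustrative only, not Kafka's actual code):

```java
import java.util.concurrent.LinkedBlockingDeque;

public class RequestDequeSketch {
    // Bounded deque standing in for the single request channel.
    private final LinkedBlockingDeque<String> queue;

    public RequestDequeSketch(int capacity) {
        this.queue = new LinkedBlockingDeque<>(capacity);
    }

    // Data-plane requests (produce, fetch, ...) go to the tail.
    public void sendDataRequest(String request) throws InterruptedException {
        queue.putLast(request);
    }

    // Controller requests jump to the head so a handler picks them up next.
    public void sendControllerRequest(String request) throws InterruptedException {
        queue.putFirst(request);
    }

    // Handler threads always take from the head.
    public String nextRequest() throws InterruptedException {
        return queue.takeFirst();
    }

    public static void main(String[] args) throws InterruptedException {
        RequestDequeSketch channel = new RequestDequeSketch(20);
        channel.sendDataRequest("ProduceRequest-1");
        channel.sendDataRequest("FetchRequest-1");
        channel.sendControllerRequest("LeaderAndIsrRequest");
        // The controller request is dequeued before the earlier data requests.
        System.out.println(channel.nextRequest()); // LeaderAndIsrRequest
        System.out.println(channel.nextRequest()); // ProduceRequest-1
    }
}
```

Since handlers keep calling takeFirst, a controller request enqueued at the head is the next one processed, while the bounded capacity still applies back-pressure to new requests.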

On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <be...@gmail.com> wrote:

> Hi Jun,
>
> The usage of correlation ID might still be useful to address the cases
> that the controller epoch and leader epoch check are not sufficient to
> guarantee correct behavior. For example, if the controller sends a
> LeaderAndIsrRequest followed by a StopReplicaRequest, and the broker
> processes it in the reverse order, the replica may still be wrongly
> recreated, right?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> > On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io> wrote:
> >
> > Hmm, since we already use controller epoch and leader epoch for properly
> > caching the latest partition state, do we really need correlation id for
> > ordering the controller requests?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <be...@gmail.com>
> wrote:
> >
> >> Lucas and Mayuresh,
> >>
> >> Good idea. The correlation id should work.
> >>
> >> In the ControllerChannelManager, a request will be resent until a
> response
> >> is received. So if the controller to broker connection disconnects after
> >> controller sends R1_a, but before the response of R1_a is received, a
> >> disconnection may cause the controller to resend R1_b. i.e. until R1 is
> >> acked, R2 won't be sent by the controller.
> >> This gives two guarantees:
> >> 1. Correlation id wise: R1_a < R1_b < R2.
> >> 2. On the broker side, when R2 is seen, R1 must have been processed at
> >> least once.
> >>
> >> So on the broker side, with a single thread controller request handler,
> the
> >> logic should be:
> >> 1. Process what ever request seen in the controller request queue
> >> 2. For the given epoch, drop request if its correlation id is smaller
> than
> >> that of the last processed request.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
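The two rules Becket describes could be sketched as follows (a toy model; the per-broker epoch/correlation-id bookkeeping shown here is illustrative, not the actual broker code):

```java
public class ControllerRequestFilter {
    private int lastEpoch = -1;
    private int lastCorrelationId = -1;

    // Returns true if the request should be processed, false if it is an
    // obsolete request that must be dropped. Per rule 2 above, within the
    // same controller epoch a request is dropped only when its correlation
    // id is strictly smaller than that of the last processed request.
    public synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch < lastEpoch) {
            return false; // request from an older controller generation
        }
        if (controllerEpoch == lastEpoch && correlationId < lastCorrelationId) {
            return false; // a newer request in this epoch was already processed
        }
        lastEpoch = controllerEpoch;
        lastCorrelationId = correlationId;
        return true;
    }
}
```

For example, if a stale redelivery of R1_a shows up after R1_b has been processed, the redelivery is rejected because R1_b already advanced the last-seen correlation id.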
> >>
> >> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
> >>
> >>> I agree that there is no strong ordering when there are more than one
> >>> socket connections. Currently, we rely on controllerEpoch and
> leaderEpoch
> >>> to ensure that the receiving broker picks up the latest state for each
> >>> partition.
> >>>
> >>> One potential issue with the dequeue approach is that if the queue is
> >> full,
> >>> there is no guarantee that the controller requests will be enqueued
> >>> quickly.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
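Jun's concern can be seen in a tiny demo, assuming the request channel were a bounded LinkedBlockingDeque: a head insert gives the controller request priority in ordering, but not in admission — when the deque is already full of data requests, the non-blocking insert fails and the blocking variant would stall the network thread (names here are illustrative):

```java
import java.util.concurrent.LinkedBlockingDeque;

public class FullDequeDemo {
    // Fill a bounded deque with `preloaded` data requests, then try to
    // enqueue one controller request at the head without blocking.
    static boolean headInsertSucceeds(int capacity, int preloaded) {
        LinkedBlockingDeque<String> deque = new LinkedBlockingDeque<>(capacity);
        for (int i = 0; i < preloaded; i++) {
            deque.offerLast("ProduceRequest-" + i);
        }
        // offerFirst returns false when the deque is at capacity;
        // the blocking variant putFirst would wait for a free slot instead.
        return deque.offerFirst("LeaderAndIsrRequest");
    }

    public static void main(String[] args) {
        System.out.println(headInsertSucceeds(2, 1)); // true: room left, head insert works
        System.out.println(headInsertSucceeds(2, 2)); // false: full, controller request rejected too
    }
}
```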
> >>>
> >>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> >>> gharatmayuresh15@gmail.com
> >>>> wrote:
> >>>
> >>>> Yea, the correlationId is only set to 0 in the NetworkClient
> >> constructor.
> >>>> Since we reuse the same NetworkClient between Controller and the
> >> broker,
> >>> a
> >>>> disconnection should not cause it to reset to 0, in which case it can
> >> be
> >>>> used to reject obsolete requests.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Mayuresh
> >>>>
> >>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com>
> >>> wrote:
> >>>>
> >>>>> @Dong,
> >>>>> Great example and explanation, thanks!
> >>>>>
> >>>>> @All
> >>>>> Regarding the example given by Dong, it seems even if we use a queue,
> >>>> and a
> >>>>> dedicated controller request handling thread,
> >>>>> the same result can still happen because R1_a will be sent on one
> >>>>> connection, and R1_b & R2 will be sent on a different connection,
> >>>>> and there is no ordering between different connections on the broker
> >>>> side.
> >>>>> I was discussing with Mayuresh offline, and it seems correlation id
> >>>> within
> >>>>> the same NetworkClient object is monotonically increasing and never
> >>>> reset,
> >>>>> hence a broker can leverage that to properly reject obsolete
> >> requests.
> >>>>> Thoughts?
> >>>>>
> >>>>> Thanks,
> >>>>> Lucas
> >>>>>
> >>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> >>>>> gharatmayuresh15@gmail.com> wrote:
> >>>>>
> >>>>>> Actually nvm, correlationId is reset in case of connection loss, I
> >>>> think.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Mayuresh
> >>>>>>
> >>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> >>>>>> gharatmayuresh15@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I agree with Dong that out-of-order processing can happen with
> >>>> having 2
> >>>>>>> separate queues as well and it can even happen today.
> >>>>>>> Can we use the correlationId in the request from the controller
> >> to
> >>>> the
> >>>>>>> broker to handle ordering ?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Mayuresh
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com
> >>>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> Good point, Joel. I agree that a dedicated controller request
> >>>> handling
> >>>>>>>> thread would be a better isolation. It also solves the
> >> reordering
> >>>>> issue.
> >>>>>>>>
> >>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> >> jjkoshy.w@gmail.com>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Good example. I think this scenario can occur in the current
> >>> code
> >>>> as
> >>>>>>>> well
> >>>>>>>>> but with even lower probability given that there are other
> >>>>>>>> non-controller
> >>>>>>>>> requests interleaved. It is still sketchy though and I think a
> >>>> safer
> >>>>>>>>> approach would be separate queues and pinning controller
> >> request
> >>>>>>>> handling
> >>>>>>>>> to one handler thread.
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> >> lindong28@gmail.com
> >>>>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hey Becket,
> >>>>>>>>>>
> >>>>>>>>>> I think you are right that there may be out-of-order
> >>> processing.
> >>>>>>>> However,
> >>>>>>>>>> it seems that out-of-order processing may also happen even
> >> if
> >>> we
> >>>>>> use a
> >>>>>>>>>> separate queue.
> >>>>>>>>>>
> >>>>>>>>>> Here is the example:
> >>>>>>>>>>
> >>>>>>>>>> - Controller sends R1 and got disconnected before receiving
> >>>>>> response.
> >>>>>>>>> Then
> >>>>>>>>>> it reconnects and sends R2. Both requests now stay in the
> >>>>> controller
> >>>>>>>>>> request queue in the order they are sent.
> >>>>>>>>>> - thread1 takes R1_a from the request queue and then thread2
> >>>> takes
> >>>>>> R2
> >>>>>>>>> from
> >>>>>>>>>> the request queue almost at the same time.
> >>>>>>>>>> - So R1_a and R2 are processed in parallel. There is chance
> >>> that
> >>>>>> R2's
> >>>>>>>>>> processing is completed before R1.
> >>>>>>>>>>
> >>>>>>>>>> If out-of-order processing can happen for both approaches
> >> with
> >>>>> very
> >>>>>>>> low
> >>>>>>>>>> probability, it may not be worthwhile to add the extra
> >> queue.
> >>>> What
> >>>>>> do
> >>>>>>>> you
> >>>>>>>>>> think?
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Dong
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> >>>> becket.qin@gmail.com
> >>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Mayuresh/Joel,
> >>>>>>>>>>>
> >>>>>>>>>>> Using the request channel as a deque was brought up some time
> >>>>>>>>>>> ago when we were initially thinking of prioritizing the
> >>>>>>>>>>> requests. The concern was that the
> >>>>>>>>>>> controller requests are supposed to be processed in order.
> >>> If
> >>>> we
> >>>>>> can
> >>>>>>>>>> ensure
> >>>>>>>>>>> that there is one controller request in the request
> >> channel,
> >>>> the
> >>>>>>>> order
> >>>>>>>>> is
> >>>>>>>>>>> not a concern. But in cases that there are more than one
> >>>>>> controller
> >>>>>>>>>> request
> >>>>>>>>>>> inserted into the queue, the controller request order may
> >>>> change
> >>>>>> and
> >>>>>>>>>> cause
> >>>>>>>>>>> problem. For example, think about the following sequence:
> >>>>>>>>>>> 1. Controller successfully sent a request R1 to broker
> >>>>>>>>>>> 2. Broker receives R1 and put the request to the head of
> >> the
> >>>>>> request
> >>>>>>>>>> queue.
> >>>>>>>>>>> 3. Controller to broker connection failed and the
> >> controller
> >>>>>>>>> reconnected
> >>>>>>>>>> to
> >>>>>>>>>>> the broker.
> >>>>>>>>>>> 4. Controller sends a request R2 to the broker
> >>>>>>>>>>> 5. Broker receives R2 and add it to the head of the
> >> request
> >>>>> queue.
> >>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is
> >>>>>> processed,
> >>>>>>>>>> which
> >>>>>>>>>>> may cause problem.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>>
> >>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> >>>>> jjkoshy.w@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler
> >>>> less
> >>>>>>>>> invasive
> >>>>>>>>>>>> alternative and it should work. Jun/Becket/others, do
> >> you
> >>>> see
> >>>>>> any
> >>>>>>>>>>> pitfalls
> >>>>>>>>>>>> with this approach?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatucla@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> @Mayuresh,
> >>>>>>>>>>>>> That's a very interesting idea that I hadn't thought of before.
> >>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and also
> >>>>>>>>>>>>> avoids the need to have a new size metric and capacity config
> >>>>>>>>>>>>> for the controller request queue. In fact, if we were to adopt
> >>>>>>>>>>>>> this design, there is no public interface change, and we
> >>>>>>>>>>>>> probably don't need a KIP. Also implementation wise, it seems
> >>>>>>>>>>>>> the java class LinkedBlockingDeque can readily satisfy the
> >>>>>>>>>>>>> requirement by supporting a capacity, and also allowing
> >>>>>>>>>>>>> inserting at both ends.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My only concern is that this design is tied to the coincidence
> >>>>>>>>>>>>> that we have two request priorities and there are two ends to a
> >>>>>>>>>>>>> deque. Hence by using the proposed design, it seems the network
> >>>>>>>>>>>>> layer is more tightly coupled with upper layer logic, e.g. if we
> >>>>>>>>>>>>> were to add an extra priority level in the future for some
> >>>>>>>>>>>>> reason, we would probably need to go back to the design of
> >>>>>>>>>>>>> separate queues, one for each priority level.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward your
> >>>>>>>>>>>>> suggested approach.
> >>>>>>>>>>>>> Let's hear what others think.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Becket,
> >>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm answering your
> >>>>>>>>>>>>> question only in the context of the current KIP design: I think
> >>>>>>>>>>>>> your suggestion makes sense, and I'm ok with removing the
> >>>>>>>>>>>>> capacity config and just relying on the default value of 20
> >>>>>>>>>>>>> being sufficient enough.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Lucas
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> >>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Lucas,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Seems like the main intent here is to prioritize the controller
> >>>>>>>>>>>>>> request over any other requests. In that case, we can change
> >>>>>>>>>>>>>> the request queue to a deque, where you always insert the
> >>>>>>>>>>>>>> normal requests (produce, consume,..etc) to the end of the
> >>>>>>>>>>>>>> deque, but if it's a controller request, you insert it to the
> >>>>>>>>>>>>>> head of the queue. This ensures that the controller request
> >>>>>>>>>>>>>> will be given higher priority over other requests.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Also since we only read one request from the socket and mute it
> >>>>>>>>>>>>>> and only unmute it after handling the request, this would
> >>>>>>>>>>>>>> ensure that we don't handle controller requests out of order.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> With this approach we can avoid the second queue and the
> >>>>>>>>>>>>>> additional config for the size of the queue.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you think ?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Mayuresh
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hey Joel,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the current
> >>>>>>>>>>>>>>> design makes sense. My confusion is about whether the new
> >>>>>>>>>>>>>>> config for the controller queue capacity is necessary. I
> >>>>>>>>>>>>>>> cannot think of a case in which users would change it.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Lucas,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we expect user to
> >>>>>>>>>>>>>>>> ever change the controller request queue capacity"? If we
> >>>>>>>>>>>>>>>> agree that 20 is already a very generous default number and we
> >>>>>>>>>>>>>>>> do not expect user to change it, is it still necessary to
> >>>>>>>>>>>>>>>> expose this as a config?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> @Becket
> >>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that normally there
> >>>>>>>>>>>>>>>>> should be just one controller request because of muting, and
> >>>>>>>>>>>>>>>>> I had NOT intended to say there would be many enqueued
> >>>>>>>>>>>>>>>>> controller requests. I went through the KIP again, and I'm
> >>>>>>>>>>>>>>>>> not sure which part conveys that info. I'd be happy to revise
> >>>>>>>>>>>>>>>>> if you point out the section.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2. Though it should not happen in normal conditions, the
> >>>>>>>>>>>>>>>>> current design does not preclude multiple controllers running
> >>>>>>>>>>>>>>>>> at the same time, hence if we don't have the controller queue
> >>>>>>>>>>>>>>>>> capacity config and simply make its capacity to be 1, network
> >>>>>>>>>>>>>>>>> threads handling requests from different controllers will be
> >>>>>>>>>>>>>>>>> blocked during those troublesome times, which is probably not
> >>>>>>>>>>>>>>>>> what we want. On the other hand, adding the extra config with
> >>>>>>>>>>>>>>>>> a default value, say 20, guards us from issues in those
> >>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much downside of
> >>>>>>>>>>>>>>>>> adding the extra config.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> @Mayuresh
> >>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete statement based on a
> >>>>>>>>>>>>>>>>> previous design. I've revised the wording in the KIP.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>> Lucas
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> >>>>>>>>>>>>>>>>> gharatmayuresh15@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Lucas,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>>>>>>> I am trying to understand why you think "The memory
> >>>>>>>>>>>>>>>>>> consumption can rise given the total number of queued
> >>>>>>>>>>>>>>>>>> requests can go up to 2x" in the impact section. Normally
> >>>>>>>>>>>>>>>>>> the requests from controller to a Broker are not high
> >>>>>>>>>>>>>>>>>> volume, right ?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Mayuresh
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the control plane
> >>>>>>>>>>>>>>>>>>> from the data plane makes a lot of sense.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the controller request queue
> >>>>>>>>>>>>>>>>>>> may have many requests in it. Will this be a common case?
> >>>>>>>>>>>>>>>>>>> The controller requests still go through the SocketServer.
> >>>>>>>>>>>>>>>>>>> The SocketServer will mute the channel once a request is
> >>>>>>>>>>>>>>>>>>> read and put into the request channel. So assuming there is
> >>>>>>>>>>>>>>>>>>> only one connection between controller and each broker, on
> >>>>>>>>>>>>>>>>>>> the broker side, there should be only one controller
> >>>>>>>>>>>>>>>>>>> request in the controller request queue at any given time.
> >>>>>>>>>>>>>>>>>>> If that is the case, do we need a separate controller
> >>>>>>>>>>>>>>>>>>> request queue capacity config? The default value 20 means
> >>>>>>>>>>>>>>>>>>> that we expect there are 20 controller switches to happen
> >>>>>>>>>>>>>>>>>>> in a short period of time. I am not sure whether someone
> >>>>>>>>>>>>>>>>>>> should increase the controller request queue capacity to
> >>>>>>>>>>>>>>>>>>> handle such case, as it seems indicating something very
> >>>>>>>>>>>>>>>>>>> wrong has happened.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the update Lucas.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I think the motivation section is intuitive. It will be
> >>>>>>>>>>>>>>>>>>>> good to learn more about the comments from other
> >>>>>>>>>>>>>>>>>>>> reviewers.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Dong,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I've updated the motivation section of the KIP by
> >>>>>>>>>>>>>>>>>>>>> explaining the cases that would have user impacts.
> >>>>>>>>>>>>>>>>>>>>> Please take a look and let me know your comments.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>> Lucas
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The simulation of disk being slow is merely for me to
> >>>>>>>>>>>>>>>>>>>>>> easily construct a testing scenario with a backlog of
> >>>>>>>>>>>>>>>>>>>>>> produce requests. In production, other than the disk
> >>>>>>>>>>>>>>>>>>>>>> being slow, a backlog of produce requests may also be
> >>>>>>>>>>>>>>>>>>>>>> caused by high produce QPS. In that case, we may not
> >>>>>>>>>>>>>>>>>>>>>> want to kill the broker and that's when this KIP can be
> >>>>>>>>>>>>>>>>>>>>>> useful, both for JBOD and non-JBOD setup.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Going back to your previous question about each
> >>>>>>>>>>>>>>>>>>>>>> ProduceRequest covering 20 partitions that are randomly
> >>>>>>>>>>>>>>>>>>>>>> distributed, let's say a LeaderAndIsr request is
> >>>>>>>>>>>>>>>>>>>>>> enqueued that tries to switch the current broker, say
> >>>>>>>>>>>>>>>>>>>>>> broker0, from leader to follower *for one of the
> >>>>>>>>>>>>>>>>>>>>>> partitions*, say *test-0*. For the sake of argument,
> >>>>>>>>>>>>>>>>>>>>>> let's also assume the other brokers, say broker1, have
> >>>>>>>>>>>>>>>>>>>>>> *stopped* fetching from the current broker, i.e.
> >>>>>>>>>>>>>>>>>>>>>> broker0.
> >>>>>>>>>>>>>>>>>>>>>> 1. If the enqueued produce requests have acks = -1 (ALL)
> >>>>>>>>>>>>>>>>>>>>>>   1.1 without this KIP, the ProduceRequests ahead of
> >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR will be put into the purgatory, and since
> >>>>>>>>>>>>>>>>>>>>>> they'll never be replicated to other brokers (because
> >>>>>>>>>>>>>>>>>>>>>> of the assumption made above), they will be completed
> >>>>>>>>>>>>>>>>>>>>>> either when the LeaderAndISR request is processed or
> >>>>>>>>>>>>>>>>>>>>>> when the timeout happens.
> >>>>>>>>>>>>>>>>>>>>>>   1.2 With this KIP, broker0 will immediately
> >>>>>>>>>>>>>>>>>>>>>> transition the partition test-0 to become a follower;
> >>>>>>>>>>>>>>>>>>>>>> after the current broker sees the replication of the
> >>>>>>>>>>>>>>>>>>>>>> remaining 19 partitions, it can send a response
> >>>>>>>>>>>>>>>>>>>>>> indicating that it's no longer the leader for the
> >>>>>>>>>>>>>>>>>>>>>> "test-0".
> >>>>>>>>>>>>>>>>>>>>>>   To see the latency difference between 1.1 and 1.2,
> >>>>>>>>>>>>>>>>>>>>>> let's say there are 24K produce requests ahead of the
> >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR, and there are 8 io threads, so each io
> >>>>>>>>>>>>>>>>>>>>>> thread will process approximately 3000 produce
> >>>>>>>>>>>>>>>>>>>>>> requests. Now let's investigate the io thread that
> >>>>>>>>>>>>>>>>>>>>>> finally processed the LeaderAndISR. For the 3000
> >>>>>>>>>>>>>>>>>>>>>> produce requests, we model the time when their
> >>>>>>>>>>>>>>>>>>>>>> remaining 19 partitions catch up as t0, t1, ...t2999,
> >>>>>>>>>>>>>>>>>>>>>> and the LeaderAndISR request is processed at time
> >>>>>>>>>>>>>>>>>>>>>> t3000. Without this KIP, the 1st produce request would
> >>>>>>>>>>>>>>>>>>>>>> have waited an extra t3000 - t0 time in the purgatory,
> >>>>>>>>>>>>>>>>>>>>>> the 2nd an extra time of t3000 - t1, etc. Roughly
> >>>>>>>>>>>>>>>>>>>>>> speaking, the latency difference is bigger for the
> >>>>>>>>>>>>>>>>>>>>>> earlier produce requests than for the later ones. For
> >>>>>>>>>>>>>>>>>>>>>> the same reason, the more ProduceRequests queued before
> >>>>>>>>>>>>>>>>>>>>>> the LeaderAndISR, the bigger benefit we get (capped by
> >>>>>>>>>>>>>>>>>>>>>> the produce timeout).
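The back-of-envelope math in point 1.2 can be made concrete with a toy model (all numbers illustrative, using the example's 3000 produce requests per io thread and an assumed even spacing between catch-up times; this is not a measurement of real broker latency):

```java
public class PurgatoryWaitModel {
    // Extra purgatory wait, without the KIP, for the i-th produce request
    // ahead of the LeaderAndIsr: t3000 - t_i. With evenly spaced catch-up
    // times t_i = i * step, this equals (n - i) * step.
    static long extraWaitMs(int i, int n, long stepMs) {
        return (long) (n - i) * stepMs;
    }

    public static void main(String[] args) {
        int n = 3000;    // produce requests ahead of the LeaderAndIsr
        long step = 10;  // assumed ms between successive catch-up times
        System.out.println(extraWaitMs(0, n, step));    // earliest request: 30000 ms extra
        System.out.println(extraWaitMs(2999, n, step)); // latest request: 10 ms extra
    }
}
```

This mirrors the observation above: the earlier a produce request sits in the backlog, the larger its extra wait, up to the produce timeout.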
> >>>>>>>>>>>>>>>>>>>>>> 2. If the enqueued produce requests have acks=0 or
> >>>>>>>>>>>>>>>>>>>>>> acks=1
> >>>>>>>>>>>>>>>>>>>>>>   There will be no latency differences in this case,
> >>>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>   2.1 without this KIP, the records of partition
> >>>>>>>>>>>>>>>>>>>>>> test-0 in the ProduceRequests ahead of the
> >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR will be appended to the local log, and
> >>>>>>>>>>>>>>>>>>>>>> eventually be truncated after processing the
> >>>>>>>>>>>>>>>>>>>>>> LeaderAndISR. This is what's referred to as "some
> >>>>>>>>>>>>>>>>>>>>>> unofficial definition of data loss in terms of
> >>>>>>>>>>>>>>>>>>>>>> messages beyond the high watermark".
> >>>>>>>>>>>>>>>>>>>>>>   2.2 with this KIP, we can mitigate the effect since
> >>>>>>>>>>>>>>>>>>>>>> if the LeaderAndISR is immediately processed, the
> >>>>>>>>>>>>>>>>>>>>>> response to producers will have the
> >>>>>>>>>>>>>>>>>>>>>> NotLeaderForPartition error, causing producers to
> >>>>>>>>>>>>>>>>>>>>>> retry
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> This explanation above is the benefit for reducing the
> >>>>>>>>>>>>>>>>>>>>>> latency of a broker becoming the follower; closely
> >>>>>>>>>>>>>>>>>>>>>> related is reducing the latency of a broker becoming
> >>>>>>>>>>>>>>>>>>>>>> the leader. In this case, the benefit is even more
> >>>>>>>>>>>>>>>>>>>>>> obvious, if other brokers have resigned leadership,
> >>>>>>>>>>>>>>>>>>>>>> and the current broker should take leadership. Any
> >>>>>>>>>>>>>>>>>>>>>> delay in processing the LeaderAndISR will be perceived
> >>>>>>>>>>>>>>>>>>>>>> by clients as unavailability. In extreme cases, this
> >>>>>>>>>>>>>>>>>>>>>> can cause failed produce requests if the retries are
> >>>>>>>>>>>>>>>>>>>>>> exhausted.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Another two types of controller requests are
> >>>>>>>>>>>>>>>>>>>>>> UpdateMetadata and StopReplica, which I'll briefly
> >>>>>>>>>>>>>>>>>>>>>> discuss
> >>>> as
> >>>>>>>>> follows:
> >>>>>>>>>>>>>>>>>>>>>> For UpdateMetadata requests, delayed
> >>>>>> processing
> >>>>>>>>>> means
> >>>>>>>>>>>>>> clients
> >>>>>>>>>>>>>>>>>>> receiving
> >>>>>>>>>>>>>>>>>>>>>> stale metadata, e.g. with the wrong
> >>>>>> leadership
> >>>>>>>>> info
> >>>>>>>>>>>>>>>>>>>>>> for certain partitions, and the
> >> effect
> >>> is
> >>>>>> more
> >>>>>>>>>> retries
> >>>>>>>>>>>> or
> >>>>>>>>>>>>>> even
> >>>>>>>>>>>>>>>>>> fatal
> >>>>>>>>>>>>>>>>>>>>>> failure if the retries are exhausted.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> For StopReplica requests, a long
> >>> queuing
> >>>>> time
> >>>>>>>> may
> >>>>>>>>>>>> degrade
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> performance
> >>>>>>>>>>>>>>>>>>>>>> of topic deletion.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Regarding your last question of the
> >>> delay
> >>>>> for
> >>>>>>>>>>>>>>>>>> DescribeLogDirsRequest,
> >>>>>>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>>> are right
> >>>>>>>>>>>>>>>>>>>>>> that this KIP cannot help with the
> >>>> latency
> >>>>> in
> >>>>>>>>>> getting
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> log
> >>>>>>>>>>>>>>>>> dirs
> >>>>>>>>>>>>>>>>>>>> info,
> >>>>>>>>>>>>>>>>>>>>>> and it's only relevant
> >>>>>>>>>>>>>>>>>>>>>> when controller requests are
> >> involved.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>>>> Lucas
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 5:11 PM, Dong
> >>> Lin
> >>>> <
> >>>>>>>>>>>>>> lindong28@gmail.com
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hey Jun,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks much for the comments. It is
> >>> good
> >>>>>>>> point.
> >>>>>>>>> So
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> feature
> >>>>>>>>>>>>>>>>> may
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>> useful for JBOD use-case. I have one
> >>>>>> question
> >>>>>>>>>> below.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Do you think this feature is also
> >>> useful
> >>>>> for
> >>>>>>>>>> non-JBOD
> >>>>>>>>>>>>> setup
> >>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>>>>>>> useful for the JBOD setup? It may be
> >>>>> useful
> >>>>>> to
> >>>>>>>>>>>> understand
> >>>>>>>>>>>>>>> this.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> When the broker is setup using JBOD,
> >>> in
> >>>>>> order
> >>>>>>>> to
> >>>>>>>>>> move
> >>>>>>>>>>>>>> leaders
> >>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> failed
> >>>>>>>>>>>>>>>>>>>>>>> disk to other disks, the system
> >>> operator
> >>>>>> first
> >>>>>>>>>> needs
> >>>>>>>>>>> to
> >>>>>>>>>>>>> get
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> list
> >>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>> partitions on the failed disk. This
> >> is
> >>>>>>>> currently
> >>>>>>>>>>>> achieved
> >>>>>>>>>>>>>>> using
> >>>>>>>>>>>>>>>>>>>>>>> AdminClient.describeLogDirs(), which
> >>>> sends
> >>>>>>>>>>>>>>>>> DescribeLogDirsRequest
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> broker. If we only prioritize the
> >>>>> controller
> >>>>>>>>>>> requests,
> >>>>>>>>>>>>> then
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> DescribeLogDirsRequest
> >>>>>>>>>>>>>>>>>>>>>>> may still take a long time to be
> >>>> processed
> >>>>>> by
> >>>>>>>> the
> >>>>>>>>>>>> broker.
> >>>>>>>>>>>>>> So
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> overall
> >>>>>>>>>>>>>>>>>>>>>>> time to move leaders away from the
> >>>> failed
> >>>>>> disk
> >>>>>>>>> may
> >>>>>>>>>>>> still
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>> long
> >>>>>>>>>>>>>>>>>>> even
> >>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>> this KIP. What do you think?
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>> Dong
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 4:38 PM,
> >> Lucas
> >>>>> Wang <
> >>>>>>>>>>>>>>>>> lucasatucla@gmail.com
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the insightful comment,
> >>>> Jun.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> @Dong,
> >>>>>>>>>>>>>>>>>>>>>>>> Since both of the two comments in
> >>> your
> >>>>>>>> previous
> >>>>>>>>>>> email
> >>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> benefits of this KIP and whether
> >>> it's
> >>>>>>>> useful,
> >>>>>>>>>>>>>>>>>>>>>>>> in light of Jun's last comment, do
> >>> you
> >>>>>> agree
> >>>>>>>>> that
> >>>>>>>>>>>> this
> >>>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>> beneficial in the case mentioned
> >> by
> >>>> Jun?
> >>>>>>>>>>>>>>>>>>>>>>>> Please let me know, thanks!
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>>>>>> Lucas
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 3, 2018 at 2:07 PM,
> >> Jun
> >>>> Rao
> >>>>> <
> >>>>>>>>>>>>>> jun@confluent.io>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi, Lucas, Dong,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> If all disks on a broker are
> >> slow,
> >>>> one
> >>>>>>>>> probably
> >>>>>>>>>>>>> should
> >>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>> kill
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>> broker. In that case, this KIP
> >> may
> >>>> not
> >>>>>>>> help.
> >>>>>>>>> If
> >>>>>>>>>>>> only
> >>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> disks
> >>>>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>>>> broker is slow, one may want to
> >>> fail
> >>>>>> that
> >>>>>>>>> disk
> >>>>>>>>>>> and
> >>>>>>>>>>>>> move
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> leaders
> >>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>> disk to other brokers. In that
> >>> case,
> >>>>>> being
> >>>>>>>>> able
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>> process
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsr
> >>>>>>>>>>>>>>>>>>>>>>>>> requests faster will potentially
> >>>> help
> >>>>>> the
> >>>>>>>>>>> producers
> >>>>>>>>>>>>>>> recover
> >>>>>>>>>>>>>>>>>>>> quicker.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Jun
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 2, 2018 at 7:56 PM,
> >>> Dong
> >>>>>> Lin <
> >>>>>>>>>>>>>>>>> lindong28@gmail.com
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Hey Lucas,
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the reply. Some
> >>> follow
> >>>> up
> >>>>>>>>>> questions
> >>>>>>>>>>>>> below.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Regarding 1, if each
> >>>> ProduceRequest
> >>>>>>>> covers
> >>>>>>>>> 20
> >>>>>>>>>>>>>>> partitions
> >>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>>>>> randomly
> >>>>>>>>>>>>>>>>>>>>>>>>>> distributed across all
> >>> partitions,
> >>>>>> then
> >>>>>>>>> each
> >>>>>>>>>>>>>>>>> ProduceRequest
> >>>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>>>>>> likely
> >>>>>>>>>>>>>>>>>>>>>>>>>> cover some partitions for
> >> which
> >>>> the
> >>>>>>>> broker
> >>>>>>>>> is
> >>>>>>>>>>>> still
> >>>>>>>>>>>>>>>>> leader
> >>>>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>>> quickly
> >>>>>>>>>>>>>>>>>>>>>>>>>> processes the
> >>>>>>>>>>>>>>>>>>>>>>>>>> LeaderAndIsrRequest. Then
> >> broker
> >>>>> will
> >>>>>>>> still
> >>>>>>>>>> be
> >>>>>>>>>>>> slow
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>> processing
> >>>>>>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>>>>>>> ProduceRequest and request
> >> will
> >>>>> still
> >>>>>> be
> >>>>>>>>> very
> >>>>>>>>>>>> high
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>> KIP.
> >>>>>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>>>>>>> seems
> >>>>>>>>>>>>>>>>>>>>>>>>>> that most ProduceRequest will
> >>>> still
> >>>>>>>> timeout
> >>>>>>>>>>> after
> >>>>>>>>>>>>> 30
> >>>>>>>>>>>>>>>>>> seconds.
> >>>>>>>>>>>>>>>>>>> Is
> >>>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>> understanding correct?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Regarding 2, if most
> >>>> ProduceRequest
> >>>>>> will
> >>>>>>>>>> still
> >>>>>>>>>>>>>> timeout
> >>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>> 30
> >>>>>>>>>>>>>>>>>>>>>>>> seconds,
> >>>>>>>>>>>>>>>>>>>>>>>>>> then it is less clear how this
> >>> KIP
> >>>>>>>> reduces
> >>>>>>>>>>>> average
> >>>>>>>>>>>>>>>>> produce
> >>>>>>>>>>>>>>>>>>>>> latency.
> >>>>>>>>>>>>>>>>>>>>>>> Can
> >>>>>>>>>>>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>>>>>>> clarify what metrics can be
> >>>> improved
> >>>>>> by
> >>>>>>>>> this
> >>>>>>>>>>> KIP?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Not sure why system operator
> >>>>> directly
> >>>>>>>> cares
> >>>>>>>>>>>> number
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> truncated
> >>>>>>>>>>>>>>>>>>>>>>>> messages.
> >>>>>>>>>>>>>>>>>>>>>>>>>> Do you mean this KIP can
> >> improve
> >>>>>> average
> >>>>>>>>>>>> throughput
> >>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>> reduce
> >>>>>>>>>>>>>>>>>>>>>>> message
> >>>>>>>>>>>>>>>>>>>>>>>>>> duplication? It will be good
> >> to
> >>>>>>>> understand
> >>>>>>>>>>> this.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>>>>> Dong
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 3 Jul 2018 at 7:12 AM
> >>>> Lucas
> >>>>>>>> Wang <
> >>>>>>>>>>>>>>>>>>> lucasatucla@gmail.com
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dong,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your valuable
> >>>> comments.
> >>>>>>>> Please
> >>>>>>>>>> see
> >>>>>>>>>>>> my
> >>>>>>>>>>>>>>> reply
> >>>>>>>>>>>>>>>>>>> below.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 1. The Google doc showed
> >> only
> >>> 1
> >>>>>>>>> partition.
> >>>>>>>>>>> Now
> >>>>>>>>>>>>>> let's
> >>>>>>>>>>>>>>>>>>> consider
> >>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>>>>>>>>> common
> >>>>>>>>>>>>>>>>>>>>>>>>>>> scenario
> >>>>>>>>>>>>>>>>>>>>>>>>>>> where broker0 is the leader
> >> of
> >>>>> many
> >>>>>>>>>>> partitions.
> >>>>>>>>>>>>> And
> >>>>>>>>>>>>>>>>> let's
> >>>>>>>>>>>>>>>>>>> say
> >>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>>>>>>>>> reason its IO becomes slow.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> The number of leader
> >>> partitions
> >>>> on
> >>>>>>>>> broker0
> >>>>>>>>>> is
> >>>>>>>>>>>> so
> >>>>>>>>>>>>>>> large,
> >>>>>>>>>>>>>>>>>> say
> >>>>>>>>>>>>>>>>>>>> 10K,
> >>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>> cluster is skewed,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> and the operator would like
> >> to
> >>>>> shift
> >>>>>>>> the
> >>>>>>>>>>>>> leadership
> >>>>>>>>>>>>>>>>> for a
> >>>>>>>>>>>>>>>>>>> lot
> >>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>> partitions, say 9K, to other
> >>>>>> brokers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> either manually or through
> >>> some
> >>>>>>>> service
> >>>>>>>>>> like
> >>>>>>>>>>>>> cruise
> >>>>>>>>>>>>>>>>>> control.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> With this KIP, not only will
> >>> the
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Jun,

The usage of correlation ID might still be useful to address cases where the controller epoch and leader epoch checks are not sufficient to guarantee correct behavior. For example, if the controller sends a LeaderAndIsrRequest followed by a StopReplicaRequest, and the broker processes them in the reverse order, the replica may still be wrongly recreated, right?

Thanks,

Jiangjie (Becket) Qin

> On Jul 22, 2018, at 11:47 AM, Jun Rao <ju...@confluent.io> wrote:
> 
> Hmm, since we already use controller epoch and leader epoch for properly
> caching the latest partition state, do we really need correlation id for
> ordering the controller requests?
> 
> Thanks,
> 
> Jun
> 
> On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <be...@gmail.com> wrote:
> 
>> Lucas and Mayuresh,
>> 
>> Good idea. The correlation id should work.
>> 
>> In the ControllerChannelManager, a request will be resent until a response
>> is received. So if the controller-to-broker connection disconnects after
>> the controller sends R1_a, but before the response to R1_a is received,
>> the disconnection will cause the controller to resend R1 as R1_b; i.e.
>> until R1 is acked, R2 won't be sent by the controller.
>> This gives two guarantees:
>> 1. Correlation id wise: R1_a < R1_b < R2.
>> 2. On the broker side, when R2 is seen, R1 must have been processed at
>> least once.
>> 
>> So on the broker side, with a single-threaded controller request handler,
>> the logic should be:
>> 1. Process whatever request is seen in the controller request queue.
>> 2. For the given epoch, drop a request if its correlation id is smaller
>> than that of the last processed request.
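For illustration only, the broker-side check described above could be sketched as follows; the class and method names are hypothetical stand-ins, not Kafka's actual internals:

```java
// Hypothetical sketch of the broker-side staleness check described above:
// within one controller epoch, drop any controller request whose correlation
// id is not larger than that of the last processed request.
public class ControllerRequestFilter {
    private int lastSeenEpoch = -1;
    private int lastCorrelationId = -1;

    /** Returns true if the request should be processed, false if it is obsolete. */
    public synchronized boolean shouldProcess(int controllerEpoch, int correlationId) {
        if (controllerEpoch > lastSeenEpoch) {
            // A new controller generation resets the correlation id watermark.
            lastSeenEpoch = controllerEpoch;
            lastCorrelationId = correlationId;
            return true;
        }
        if (controllerEpoch < lastSeenEpoch) {
            return false; // request from an older controller
        }
        if (correlationId > lastCorrelationId) {
            lastCorrelationId = correlationId;
            return true;
        }
        return false; // resent or reordered request (e.g. R1_a arriving after R1_b)
    }
}
```

Because the controller keeps resending R1 (as R1_b) until it is acked before sending R2, a late-arriving R1_a carries a smaller correlation id than R1_b for the same epoch and is dropped.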
>> 
>> Thanks,
>> 
>> Jiangjie (Becket) Qin
>> 
>> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
>> 
>>> I agree that there is no strong ordering when there are more than one
>>> socket connections. Currently, we rely on controllerEpoch and leaderEpoch
>>> to ensure that the receiving broker picks up the latest state for each
>>> partition.
>>> 
>>> One potential issue with the dequeue approach is that if the queue is
>> full,
>>> there is no guarantee that the controller requests will be enqueued
>>> quickly.
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
>>> gharatmayuresh15@gmail.com
>>>> wrote:
>>> 
>>>> Yea, the correlationId is only set to 0 in the NetworkClient constructor.
>>>> Since we reuse the same NetworkClient between Controller and the broker,
>>>> a disconnection should not cause it to reset to 0, in which case it can
>>>> be used to reject obsolete requests.
>>>> 
>>>> Thanks,
>>>> 
>>>> Mayuresh
>>>> 
>>>> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com>
>>> wrote:
>>>> 
>>>>> @Dong,
>>>>> Great example and explanation, thanks!
>>>>> 
>>>>> @All
>>>>> Regarding the example given by Dong, it seems even if we use a queue,
>>>>> and a dedicated controller request handling thread, the same result can
>>>>> still happen because R1_a will be sent on one connection, and R1_b & R2
>>>>> will be sent on a different connection, and there is no ordering between
>>>>> different connections on the broker side.
>>>>> I was discussing with Mayuresh offline, and it seems the correlation id
>>>>> within the same NetworkClient object is monotonically increasing and
>>>>> never reset, hence a broker can leverage that to properly reject
>>>>> obsolete requests.
>>>>> Thoughts?
>>>>> 
>>>>> Thanks,
>>>>> Lucas
>>>>> 
>>>>> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
>>>>> gharatmayuresh15@gmail.com> wrote:
>>>>> 
>>>>>> Actually nvm, correlationId is reset in case of connection loss, I think.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Mayuresh
>>>>>> 
>>>>>> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
>>>>>> gharatmayuresh15@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> I agree with Dong that out-of-order processing can happen with having
>>>>>>> 2 separate queues as well, and it can even happen today.
>>>>>>> Can we use the correlationId in the request from the controller to the
>>>>>>> broker to handle ordering?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Mayuresh
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com
>>> 
>>>>> wrote:
>>>>>>> 
>>>>>>>> Good point, Joel. I agree that a dedicated controller request
>>>>>>>> handling thread would be a better isolation. It also solves the
>>>>>>>> reordering issue.
>>>>>>>> 
>>>>>>>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
>> jjkoshy.w@gmail.com>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Good example. I think this scenario can occur in the current code as
>>>>>>>>> well, but with even lower probability given that there are other
>>>>>>>>> non-controller requests interleaved. It is still sketchy though, and
>>>>>>>>> I think a safer approach would be separate queues and pinning
>>>>>>>>> controller request handling to one handler thread.
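As a rough sketch of the "separate queue pinned to one handler thread" idea (names here are illustrative, not Kafka's actual request-handling code): a single dedicated thread drains the controller queue, so controller requests can never be processed out of order relative to each other.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: all controller requests go through one bounded queue
// drained by exactly one thread, preserving their arrival order. Data-plane
// requests would continue to fan out to the regular handler pool.
public class ControlPlaneHandler {
    private final BlockingQueue<Runnable> controllerQueue = new LinkedBlockingQueue<>(20);
    private final Thread handler;

    public ControlPlaneHandler() {
        handler = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    controllerQueue.take().run(); // strictly one at a time, in arrival order
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "controller-request-handler");
        handler.start();
    }

    public void submit(Runnable controllerRequest) throws InterruptedException {
        controllerQueue.put(controllerRequest);
    }

    public void shutdown() {
        handler.interrupt();
    }
}
```

With a pool of handler threads, two controller requests taken from the same queue can still finish in either order; pinning to one thread removes that race by construction.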
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
>> lindong28@gmail.com
>>>> 
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hey Becket,
>>>>>>>>>> 
>>>>>>>>>> I think you are right that there may be out-of-order processing.
>>>>>>>>>> However, it seems that out-of-order processing may also happen even
>>>>>>>>>> if we use a separate queue.
>>>>>>>>>> 
>>>>>>>>>> Here is the example:
>>>>>>>>>> 
>>>>>>>>>> - Controller sends R1 and got disconnected before receiving the
>>>>>>>>>> response. Then it reconnects and sends R2. Both requests now stay in
>>>>>>>>>> the controller request queue in the order they are sent.
>>>>>>>>>> - thread1 takes R1_a from the request queue and then thread2 takes
>>>>>>>>>> R2 from the request queue almost at the same time.
>>>>>>>>>> - So R1_a and R2 are processed in parallel. There is a chance that
>>>>>>>>>> R2's processing is completed before R1's.
>>>>>>>>>> 
>>>>>>>>>> If out-of-order processing can happen for both approaches with very
>>>>>>>>>> low probability, it may not be worthwhile to add the extra queue.
>>>>>>>>>> What do you think?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Dong
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket.qin@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Mayuresh/Joel,
>>>>>>>>>>> 
>>>>>>>>>>> Using the request channel as a deque was brought up some time ago
>>>>>>>>>>> when we were initially thinking of prioritizing the requests. The
>>>>>>>>>>> concern was that the controller requests are supposed to be
>>>>>>>>>>> processed in order. If we can ensure that there is only one
>>>>>>>>>>> controller request in the request channel, the order is not a
>>>>>>>>>>> concern. But in cases where more than one controller request is
>>>>>>>>>>> inserted into the queue, the controller request order may change
>>>>>>>>>>> and cause problems. For example, think about the following sequence:
>>>>>>>>>>> 1. Controller successfully sent a request R1 to the broker.
>>>>>>>>>>> 2. Broker receives R1 and puts the request at the head of the
>>>>>>>>>>> request queue.
>>>>>>>>>>> 3. The controller-to-broker connection failed and the controller
>>>>>>>>>>> reconnected to the broker.
>>>>>>>>>>> 4. Controller sends a request R2 to the broker.
>>>>>>>>>>> 5. Broker receives R2 and adds it to the head of the request queue.
>>>>>>>>>>> Now on the broker side, R2 will be processed before R1 is
>>>>>>>>>>> processed, which may cause problems.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> @Mayuresh - I like your idea. It appears to be a simpler, less
>>>>>>>>>>>> invasive alternative and it should work. Jun/Becket/others, do you
>>>>>>>>>>>> see any pitfalls with this approach?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> @Mayuresh,
>>>>>>>>>>>>> That's a very interesting idea that I haven't thought of before.
>>>>>>>>>>>>> It seems to solve our problem at hand pretty well, and also
>>>>>>>>>>>>> avoids the need to have a new size metric and capacity config for
>>>>>>>>>>>>> the controller request queue. In fact, if we were to adopt this
>>>>>>>>>>>>> design, there is no public interface change, and we probably
>>>>>>>>>>>>> don't need a KIP. Also, implementation wise, it seems the java
>>>>>>>>>>>>> class LinkedBlockingDeque can readily satisfy the requirement by
>>>>>>>>>>>>> supporting a capacity, and also allowing inserting at both ends.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> My only concern is that this design is tied to the coincidence
>>>>>>>>>>>>> that we have two request priorities and there are two ends to a
>>>>>>>>>>>>> deque. Hence by using the proposed design, the network layer is
>>>>>>>>>>>>> more tightly coupled with upper layer logic, e.g. if we were to
>>>>>>>>>>>>> add an extra priority level in the future for some reason, we
>>>>>>>>>>>>> would probably need to go back to the design of separate queues,
>>>>>>>>>>>>> one for each priority level.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In summary, I'm ok with both designs and lean toward your
>>>>>>>>>>>>> suggested approach. Let's hear what others think.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> @Becket,
>>>>>>>>>>>>> In light of Mayuresh's suggested new design, I'm answering your
>>>>>>>>>>>>> question only in the context of the current KIP design: I think
>>>>>>>>>>>>> your suggestion makes sense, and I'm ok with removing the
>>>>>>>>>>>>> capacity config and just relying on the default value of 20
>>>>>>>>>>>>> being sufficient.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Lucas,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Seems like the main intent here is to prioritize the controller
>>>>>>>>>>>>>> requests over any other requests.
>>>>>>>>>>>>>> In that case, we can change the request queue to a deque, where
>>>>>>>>>>>>>> you always insert the normal requests (produce, consume, ..etc)
>>>>>>>>>>>>>> at the end of the deque, but if it's a controller request, you
>>>>>>>>>>>>>> insert it at the head of the queue. This ensures that the
>>>>>>>>>>>>>> controller request will be given higher priority over other
>>>>>>>>>>>>>> requests.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Also, since we only read one request from the socket and mute
>>>>>>>>>>>>>> it, and only unmute it after handling the request, this would
>>>>>>>>>>>>>> ensure that we don't handle controller requests out of order.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With this approach we can avoid the second queue and the
>>>>>>>>>>>>>> additional config for the size of the queue.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What do you think ?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Mayuresh
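A minimal sketch of this deque idea, using java.util.concurrent.LinkedBlockingDeque (the request type is simplified to String here; this is not Kafka's actual RequestChannel code):

```java
import java.util.concurrent.LinkedBlockingDeque;

// Illustrative sketch: one bounded deque where controller requests jump to
// the head and data-plane requests (produce, fetch, ...) go to the tail.
public class PrioritizedRequestQueue {
    private final LinkedBlockingDeque<String> deque;

    public PrioritizedRequestQueue(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    public void enqueue(String request, boolean isControllerRequest) throws InterruptedException {
        if (isControllerRequest) {
            deque.putFirst(request);  // controller requests are handled first
        } else {
            deque.putLast(request);   // data requests keep FIFO order at the tail
        }
    }

    public String dequeue() throws InterruptedException {
        return deque.takeFirst();
    }
}
```

Note that putFirst blocks while the deque is at capacity, so a completely full queue would still delay the enqueueing of a controller request.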
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hey Joel,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for the detailed explanation. I agree the current design
>>>>>>>>>>>>>>> makes sense. My confusion is about whether the new config for
>>>>>>>>>>>>>>> the controller queue capacity is necessary. I cannot think of a
>>>>>>>>>>>>>>> case in which users would change it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Lucas,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I guess my question can be rephrased to "do we expect users to
>>>>>>>>>>>>>>>> ever change the controller request queue capacity"? If we
>>>>>>>>>>>>>>>> agree that 20 is already a very generous default number and we
>>>>>>>>>>>>>>>> do not expect users to change it, is it still necessary to
>>>>>>>>>>>>>>>> expose this as a config?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> @Becket
>>>>>>>>>>>>>>>>> 1. Thanks for the comment. You are right that normally there
>>>>>>>>>>>>>>>>> should be just one controller request because of muting, and
>>>>>>>>>>>>>>>>> I had NOT intended to say there would be many enqueued
>>>>>>>>>>>>>>>>> controller requests. I went through the KIP again, and I'm
>>>>>>>>>>>>>>>>> not sure which part conveys that info. I'd be happy to revise
>>>>>>>>>>>>>>>>> if you point out the section.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2. Though it should not happen in normal conditions, the
>>>>>>>>>>>>>>>>> current design does not preclude multiple controllers running
>>>>>>>>>>>>>>>>> at the same time. Hence if we don't have the controller queue
>>>>>>>>>>>>>>>>> capacity config and simply make its capacity 1, network
>>>>>>>>>>>>>>>>> threads handling requests from different controllers will be
>>>>>>>>>>>>>>>>> blocked during those troublesome times, which is probably not
>>>>>>>>>>>>>>>>> what we want. On the other hand, adding the extra config with
>>>>>>>>>>>>>>>>> a default value, say 20, guards us from issues in those
>>>>>>>>>>>>>>>>> troublesome times, and IMO there isn't much downside of
>>>>>>>>>>>>>>>>> adding the extra config.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> @Mayuresh
>>>>>>>>>>>>>>>>> Good catch, this sentence is an obsolete statement based on a
>>>>>>>>>>>>>>>>> previous design. I've revised the wording in the KIP.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Lucas
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Lucas,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>> I am trying to understand why you think "The memory
>>>>>>>>>>>>>>>>>> consumption can rise given the total number of queued
>>>>>>>>>>>>>>>>>> requests can go up to 2x" in the impact section. Normally
>>>>>>>>>>>>>>>>>> the requests from the controller to a broker are not high
>>>>>>>>>>>>>>>>>> volume, right?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Mayuresh
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks for the KIP, Lucas. Separating the control plane
>>>>>>>>>>>>>>>>>>> from the data plane makes a lot of sense.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> In the KIP you mentioned that the controller request queue
>>>>>>>>>>>>>>>>>>> may have many requests in it. Will this be a common case?
>>>>>>>>>>>>>>>>>>> The controller requests still go through the SocketServer.
>>>>>>>>>>>>>>>>>>> The SocketServer will mute the channel once a request is
>>>>>>>>>>>>>>>>>>> read and put into the request channel. So assuming there is
>>>>>>>>>>>>>>>>>>> only one connection between the controller and each broker,
>>>>>>>>>>>>>>>>>>> on the broker side there should be only one controller
>>>>>>>>>>>>>>>>>>> request in the controller request queue at any given time.
>>>>>>>>>>>>>>>>>>> If that is the case, do we need a separate controller
>>>>>>>>>>>>>>>>>>> request queue capacity config? The default value 20 means
>>>>>>>>>>>>>>>>>>> that we expect 20 controller switches to happen in a short
>>>>>>>>>>>>>>>>>>> period of time. I am not sure whether someone should
>>>>>>>>>>>>>>>>>>> increase the controller request queue capacity to handle
>>>>>>>>>>>>>>>>>>> such a case, as it seems to indicate something very wrong
>>>>>>>>>>>>>>>>>>> has happened.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
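Becket's muting argument can be illustrated with a toy model; the `Connection` class and `max_queued` loop below are illustrative stand-ins, not Kafka's actual SocketServer code. If the server mutes a connection after reading one request and unmutes it only once the response is sent, then a single controller-to-broker connection contributes at most one request to the request queue at any time.

```python
# Toy model of SocketServer channel muting (illustrative, not Kafka code):
# a muted connection cannot have another request read from it until the
# response for its in-flight request has been sent.
from collections import deque

class Connection:
    def __init__(self, pending):
        self.pending = deque(pending)  # requests the client still wants to send
        self.muted = False

def max_queued(conn):
    request_queue = deque()
    peak = 0
    while conn.pending or request_queue:
        if conn.pending and not conn.muted:
            request_queue.append(conn.pending.popleft())
            conn.muted = True          # mute until the response goes out
        peak = max(peak, len(request_queue))
        if request_queue:
            request_queue.popleft()    # handler processes the request ...
            conn.muted = False         # ... and sending the response unmutes
    return peak

controller = Connection(["LeaderAndIsr"] * 5)
print(max_queued(controller))  # 1: never more than one queued at a time
```

This is the basis of Becket's question about whether a capacity config larger than one-per-controller-connection is ever needed.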
On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com> wrote:

Thanks for the update Lucas.

I think the motivation section is intuitive. It will be good to learn more about the comments from other reviewers.

On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

Hi Dong,

I've updated the motivation section of the KIP by explaining the cases that would have user impacts. Please take a look and let me know your comments.

Thanks,
Lucas

On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

Hi Dong,

The simulation of the disk being slow is merely for me to easily construct a testing scenario with a backlog of produce requests. In production, other than the disk being slow, a backlog of produce requests may also be caused by high produce QPS. In that case, we may not want to kill the broker, and that's when this KIP can be useful, both for JBOD and non-JBOD setups.

Going back to your previous question about each ProduceRequest covering 20 partitions that are randomly distributed: let's say a LeaderAndIsr request is enqueued that tries to switch the current broker, say broker0, from leader to follower *for one of the partitions*, say *test-0*. For the sake of argument, let's also assume the other brokers, say broker1, have *stopped* fetching from the current broker, i.e. broker0.

1. If the enqueued produce requests have acks = -1 (ALL):
  1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR will be put into the purgatory, and since they'll never be replicated to other brokers (because of the assumption made above), they will be completed either when the LeaderAndISR request is processed or when the timeout happens.
  1.2 With this KIP, broker0 will immediately transition the partition test-0 to become a follower; after the current broker sees the replication of the remaining 19 partitions, it can send a response indicating that it's no longer the leader for "test-0".
  To see the latency difference between 1.1 and 1.2, let's say there are 24K produce requests ahead of the LeaderAndISR, and there are 8 io threads, so each io thread will process approximately 3000 produce requests. Now let's investigate the io thread that finally processed the LeaderAndISR. For the 3000 produce requests, if we model the times when their remaining 19 partitions catch up as t0, t1, ... t2999, and say the LeaderAndISR request is processed at time t3000, then without this KIP the 1st produce request would have waited an extra t3000 - t0 time in the purgatory, the 2nd an extra time of t3000 - t1, etc. Roughly speaking, the latency difference is bigger for the earlier produce requests than for the later ones. For the same reason, the more ProduceRequests queued before the LeaderAndISR, the bigger benefit we get (capped by the produce timeout).

2. If the enqueued produce requests have acks = 0 or acks = 1, there will be no latency differences, but:
  2.1 Without this KIP, the records of partition test-0 in the ProduceRequests ahead of the LeaderAndISR will be appended to the local log, and eventually be truncated after processing the LeaderAndISR. This is what's referred to as "some unofficial definition of data loss in terms of messages beyond the high watermark".
  2.2 With this KIP, we can mitigate the effect, since if the LeaderAndISR is immediately processed, the response to producers will have the NotLeaderForPartition error, causing producers to retry.

The explanation above is the benefit of reducing the latency of a broker becoming a follower; closely related is reducing the latency of a broker becoming the leader. In this case the benefit is even more obvious: if other brokers have resigned leadership and the current broker should take leadership, any delay in processing the LeaderAndISR will be perceived by clients as unavailability. In extreme cases, this can cause failed produce requests if the retries are exhausted.

Another two types of controller requests are UpdateMetadata and StopReplica, which I'll briefly discuss as follows. For UpdateMetadata requests, delayed processing means clients receiving stale metadata, e.g. with the wrong leadership info for certain partitions; the effect is more retries or even fatal failure if the retries are exhausted. For StopReplica requests, a long queuing time may degrade the performance of topic deletion.

Regarding your last question about the delay for DescribeLogDirsRequest, you are right that this KIP cannot help with the latency in getting the log dirs info; it's only relevant when controller requests are involved.

Regards,
Lucas

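The purgatory backlog Lucas describes can be put into a rough numeric sketch. The per-request catch-up interval below is an assumed number for illustration, not a measurement from the Google Doc.

```python
# Model: one io thread has 3000 produce requests queued ahead of the
# LeaderAndIsr request. The i-th request's followers catch up at time t_i;
# without the KIP it still sits in purgatory until the LeaderAndIsr request
# is processed at t_3000, so it waits an extra (t_3000 - t_i).
def extra_purgatory_wait(num_requests=3000, per_request_ms=5):
    # assume catch-up times are evenly spaced: t_i = i * per_request_ms
    t = [i * per_request_ms for i in range(num_requests + 1)]
    t_leader_and_isr = t[num_requests]
    extras = [t_leader_and_isr - t[i] for i in range(num_requests)]
    return sum(extras) / len(extras)  # average extra purgatory wait, in ms

print(extra_purgatory_wait())  # 7502.5 ms under these assumed numbers
```

As the message notes, the earliest request waits the full t3000 - t0, so the benefit grows with the size of the backlog, capped by the produce timeout.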
On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:

Hey Jun,

Thanks much for the comments. It is a good point. So the feature may be useful for the JBOD use-case. I have one question below.

Hey Lucas,

Do you think this feature is also useful for a non-JBOD setup, or is it only useful for the JBOD setup? It may be useful to understand this.

When the broker is set up using JBOD, in order to move leaders on the failed disk to other disks, the system operator first needs to get the list of partitions on the failed disk. This is currently achieved using AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the broker. If we only prioritize the controller requests, then the DescribeLogDirsRequest may still take a long time to be processed by the broker. So the overall time to move leaders away from the failed disk may still be long even with this KIP. What do you think?

Thanks,
Dong

On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

Thanks for the insightful comment, Jun.

@Dong,
Since both of the comments in your previous email are about the benefits of this KIP and whether it's useful, in light of Jun's last comment, do you agree that this KIP can be beneficial in the case mentioned by Jun? Please let me know, thanks!

Regards,
Lucas

On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:

Hi, Lucas, Dong,

If all disks on a broker are slow, one probably should just kill the broker. In that case, this KIP may not help. If only one of the disks on a broker is slow, one may want to fail that disk and move the leaders on that disk to other brokers. In that case, being able to process the LeaderAndIsr requests faster will potentially help the producers recover quicker.

Thanks,

Jun

On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:

Hey Lucas,

Thanks for the reply. Some follow up questions below.

Regarding 1, if each ProduceRequest covers 20 partitions that are randomly distributed across all partitions, then each ProduceRequest will likely cover some partitions for which the broker is still leader after it quickly processes the LeaderAndIsrRequest. Then the broker will still be slow in processing these ProduceRequests, and request latency will still be very high with this KIP. It seems that most ProduceRequests will still time out after 30 seconds. Is this understanding correct?

Regarding 2, if most ProduceRequests will still time out after 30 seconds, then it is less clear how this KIP reduces average produce latency. Can you clarify what metrics can be improved by this KIP?

Not sure why a system operator directly cares about the number of truncated messages. Do you mean this KIP can improve average throughput or reduce message duplication? It will be good to understand this.

Thanks,
Dong

On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:

Hi Dong,

Thanks for your valuable comments. Please see my reply below.

1. The Google doc showed only 1 partition. Now let's consider a more common scenario where broker0 is the leader of many partitions, and let's say for some reason its IO becomes slow. The number of leader partitions on broker0 is so large, say 10K, that the cluster is skewed, and the operator would like to shift the leadership for a lot of partitions, say 9K, to other brokers, either manually or through some service like cruise control. With this KIP, not only will the leadership transitions finish more quickly, helping the cluster itself become more balanced, but all existing producers corresponding to the 9K partitions will get the errors relatively quickly, rather than relying on their timeout, thanks to the batched async ZK operations. To me it's a useful feature to have during such troublesome times.

2. The experiments in the Google Doc have shown that with this KIP many producers receive an explicit NotLeaderForPartition error, based on which they retry immediately. Therefore the latency (~14 seconds + quick retry) for their single message is much smaller compared with the case of timing out without the KIP (30 seconds for timing out + quick retry). One might argue that reducing the timeout on the producer side can achieve the same result, yet reducing the timeout has its own drawbacks [1].

Also *IF* there were a metric to show the number of truncated messages on brokers, with the experiments done in the Google Doc it should be easy to see that a lot fewer messages need to be truncated on broker0, since the up-to-date metadata avoids appending of messages in subsequent PRODUCE requests. If we talk to a system operator and ask whether they prefer fewer wasteful IOs, I bet most likely the answer is yes.

3. To answer your question, I think it might be helpful to construct some formulas. To simplify the modeling, I'm going back to the case where there is only ONE partition involved. Following the experiments in the Google Doc, let's say broker0 becomes the follower at time t0, and after t0 there were still N produce requests in its request queue. With the up-to-date metadata brought by this KIP, broker0 can reply with a NotLeaderForPartition exception; let's use M1 to denote the average processing time of replying with such an error message. Without this KIP, the broker will need to append messages to segments, which may trigger a flush to disk; let's use M2 to denote the average processing time for such logic. Then the average extra latency incurred without this KIP is N * (M2 - M1) / 2.

In practice, M2 should always be larger than M1, which means as long as N is positive, we would see improvements on the average latency. There does not need to be a significant backlog of requests in the request queue, or severe degradation of disk performance, to have the improvement.

Regards,
Lucas

[1] For instance, reducing the timeout on the producer side can trigger unnecessary duplicate requests when the corresponding leader broker is overloaded, exacerbating the situation.

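Lucas's N * (M2 - M1) / 2 estimate can be sanity-checked numerically. The values of N, M1 and M2 below are made-up for illustration, not measurements: the request at position i waits behind i earlier requests, each costing an extra (M2 - M1), and averaging i over 1..N gives roughly N / 2.

```python
def avg_extra_latency(n, m1, m2):
    # Extra wait of the request at position i: the i requests ahead of it
    # each take M2 (append + possible flush) instead of M1 (quick error reply).
    extras = [i * (m2 - m1) for i in range(1, n + 1)]
    return sum(extras) / n

n, m1, m2 = 1000, 0.1, 2.0  # assumed per-request processing times in ms
print(avg_extra_latency(n, m1, m2))  # ~950.95 ms, the exact average
print(n * (m2 - m1) / 2)             # 950.0 ms, the formula's estimate
```

The exact average is (N + 1) * (M2 - M1) / 2, which the formula approximates for large N.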
On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:

Hey Lucas,

Thanks much for the detailed documentation of the experiment.

Initially I also think having a separate queue for controller requests is useful because, as you mentioned in the summary section of the Google doc, controller requests are generally more

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125



Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hmm, since we already use controller epoch and leader epoch for properly
caching the latest partition state, do we really need correlation id for
ordering the controller requests?

Thanks,

Jun
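The epoch-based caching Jun refers to can be sketched roughly like this; the function and the lexicographic comparison below are a simplification for illustration, not Kafka's actual caching code. A broker keeps the latest (controller epoch, leader epoch) per partition and drops state that is not strictly newer.

```python
# Illustrative staleness check (not Kafka's actual implementation):
# newer controller epochs win; within a controller epoch, newer leader
# epochs win; everything else is treated as stale and dropped.
cache = {}  # partition -> (controller_epoch, leader_epoch)

def maybe_apply(partition, controller_epoch, leader_epoch):
    """Apply incoming partition state unless an equal-or-newer epoch is cached."""
    current = cache.get(partition)
    if current is not None and (controller_epoch, leader_epoch) <= current:
        return False  # stale or duplicate state: drop it
    cache[partition] = (controller_epoch, leader_epoch)
    return True

print(maybe_apply("test-0", 5, 10))  # True: first state is applied
print(maybe_apply("test-0", 5, 9))   # False: older leader epoch, dropped
print(maybe_apply("test-0", 6, 11))  # True: newer controller epoch
```

This is the mechanism Jun suggests may already make a correlation-id check redundant.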

On Fri, Jul 20, 2018 at 2:18 PM, Becket Qin <be...@gmail.com> wrote:

> Lucas and Mayuresh,
>
> Good idea. The correlation id should work.
>
> In the ControllerChannelManager, a request will be resent until a response
> is received. So if the controller-to-broker connection disconnects after the
> controller sends R1_a, but before the response of R1_a is received, the
> controller will resend the request as R1_b. I.e. until R1 is
> acked, R2 won't be sent by the controller.
> This gives two guarantees:
> 1. Correlation id wise: R1_a < R1_b < R2.
> 2. On the broker side, when R2 is seen, R1 must have been processed at
> least once.
>
> So on the broker side, with a single thread controller request handler, the
> logic should be:
> 1. Process whatever request is seen in the controller request queue
> 2. For the given epoch, drop a request if its correlation id is smaller than
> that of the last processed request.
>
> Thanks,
>
> Jiangjie (Becket) Qin
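Becket's two-step handler logic can be sketched as follows; the names are illustrative, and, per the rule as stated, only requests with a strictly smaller correlation id than the last processed one are dropped.

```python
# Sketch of a single-threaded controller request handler that drops
# obsolete requests by correlation id, tracked per controller epoch.
def make_handler():
    last = {}  # controller epoch -> correlation id of the last processed request

    def handle(controller_epoch, correlation_id):
        if correlation_id < last.get(controller_epoch, -1):
            return "dropped"  # obsolete: a later request was already processed
        last[controller_epoch] = correlation_id
        return "processed"

    return handle

handle = make_handler()
print(handle(3, 100))  # processed (R1_a)
print(handle(3, 101))  # processed (R1_b, resend of R1 with a fresh id)
print(handle(3, 100))  # dropped   (a late duplicate of R1_a)
print(handle(3, 102))  # processed (R2)
```

This matches the two guarantees above: resends carry larger correlation ids, and R2 is only sent once R1 is acked.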
>
> On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:
>
> > I agree that there is no strong ordering when there are more than one
> > socket connections. Currently, we rely on controllerEpoch and leaderEpoch
> > to ensure that the receiving broker picks up the latest state for each
> > partition.
> >
> > One potential issue with the dequeue approach is that if the queue is
> full,
> > there is no guarantee that the controller requests will be enqueued
> > quickly.
> >
> > Thanks,
> >
> > Jun
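Jun's concern can be seen with a small sketch of the two-queue idea; the queue sizes and request names below are illustrative. A bounded data queue can be full while a separate controller queue still accepts requests immediately, which is the guarantee a single shared bounded queue cannot give.

```python
import queue

data_queue = queue.Queue(maxsize=2)  # bounded, standing in for queued.max.requests
controller_queue = queue.Queue()     # separate queue for controller requests

def enqueue(request, is_controller):
    q = controller_queue if is_controller else data_queue
    try:
        q.put_nowait(request)  # non-blocking: raises queue.Full if bounded + full
        return True
    except queue.Full:
        return False

enqueue("Produce-1", False)
enqueue("Produce-2", False)
print(enqueue("Produce-3", False))    # False: data queue is full
print(enqueue("LeaderAndIsr", True))  # True: controller request not blocked
```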
> >
> > On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <
> > gharatmayuresh15@gmail.com
> > > wrote:
> >
> > > Yea, the correlationId is only set to 0 in the NetworkClient constructor.
> > > Since we reuse the same NetworkClient between the controller and the
> > > broker, a disconnection should not cause it to reset to 0, in which case
> > > it can be used to reject obsolete requests.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > @Dong,
> > > > Great example and explanation, thanks!
> > > >
> > > > @All
> > > > Regarding the example given by Dong, it seems that even if we use a
> > > > queue and a dedicated controller request handling thread,
> > > > the same result can still happen, because R1_a will be sent on one
> > > > connection, and R1_b & R2 will be sent on a different connection,
> > > > and there is no ordering between different connections on the broker
> > > > side.
> > > > I was discussing with Mayuresh offline, and it seems the correlation id
> > > > within the same NetworkClient object is monotonically increasing and
> > > > never reset, hence a broker can leverage that to properly reject
> > > > obsolete requests. Thoughts?
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > > gharatmayuresh15@gmail.com> wrote:
> > > >
> > > > > Actually nvm, correlationId is reset in case of connection loss, I
> > > think.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > > On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > > > gharatmayuresh15@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I agree with Dong that out-of-order processing can happen with
> > > having 2
> > > > > > separate queues as well and it can even happen today.
> > > > > > Can we use the correlationId in the request from the controller
> to
> > > the
> > > > > > broker to handle ordering ?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > >
> > > > > > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <becket.qin@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > >> Good point, Joel. I agree that a dedicated controller request
> > > > > >> handling thread would provide better isolation. It also solves the
> > > > > >> reordering issue.
> > > > > >>
> > > > > >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <
> jjkoshy.w@gmail.com>
> > > > > wrote:
> > > > > >>
> > > > > >> > Good example. I think this scenario can occur in the current code
> > > > > >> > as well, but with even lower probability given that there are other
> > > > > >> > non-controller requests interleaved. It is still sketchy though, and
> > > > > >> > I think a safer approach would be separate queues and pinning
> > > > > >> > controller request handling to one handler thread.
> > > > > >> >
> > > > > >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <
> lindong28@gmail.com
> > >
> > > > > wrote:
> > > > > >> >
> > > > > >> > > Hey Becket,
> > > > > >> > >
> > > > > >> > > I think you are right that there may be out-of-order processing.
> > > > > >> > > However, it seems that out-of-order processing may also happen
> > > > > >> > > even if we use a separate queue.
> > > > > >> > >
> > > > > >> > > Here is the example:
> > > > > >> > >
> > > > > >> > > - Controller sends R1 and gets disconnected before receiving the
> > > > > >> > > response. Then it reconnects and sends R2. Both requests now stay
> > > > > >> > > in the controller request queue in the order they were sent.
> > > > > >> > > - thread1 takes R1_a from the request queue, and then thread2
> > > > > >> > > takes R2 from the request queue almost at the same time.
> > > > > >> > > - So R1_a and R2 are processed in parallel. There is a chance that
> > > > > >> > > R2's processing is completed before R1's.
> > > > > >> > >
> > > > > >> > > If out-of-order processing can happen for both approaches with
> > > > > >> > > very low probability, it may not be worthwhile to add the extra
> > > > > >> > > queue. What do you think?
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Dong
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > > becket.qin@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > > > Hi Mayuresh/Joel,
> > > > > >> > > >
> > > > > >> > > > Using the request channel as a deque was brought up some time ago
> > > > > >> > > > when we were initially thinking about prioritizing requests. The
> > > > > >> > > > concern was that the controller requests are supposed to be
> > > > > >> > > > processed in order. If we can ensure that there is at most one
> > > > > >> > > > controller request in the request channel, the order is not a
> > > > > >> > > > concern. But in cases where more than one controller request is
> > > > > >> > > > inserted into the queue, the controller request order may change
> > > > > >> > > > and cause problems. For example, think about the following sequence:
> > > > > >> > > > 1. Controller successfully sends a request R1 to the broker.
> > > > > >> > > > 2. Broker receives R1 and puts the request at the head of the
> > > > > >> > > > request queue.
> > > > > >> > > > 3. The controller-to-broker connection fails and the controller
> > > > > >> > > > reconnects to the broker.
> > > > > >> > > > 4. Controller sends a request R2 to the broker.
> > > > > >> > > > 5. Broker receives R2 and adds it to the head of the request queue.
> > > > > >> > > > Now on the broker side, R2 will be processed before R1 is
> > > > > >> > > > processed, which may cause problems.
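The reordering hazard in this sequence can be reproduced with a tiny deque model; the class below is illustrative only, not Kafka code:

```java
import java.util.ArrayDeque;

// Toy reproduction (illustrative only) of the hazard above: if every
// controller request is inserted at the head of the deque, a later request
// (R2) ends up ahead of an earlier, redelivered one (R1).
class HeadInsertReorder {
    static String[] drainOrder() {
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.addFirst("R1"); // step 2: R1 put at the head
        queue.addFirst("R2"); // step 5: R2 also put at the head
        return new String[] { queue.poll(), queue.poll() };
    }
}
```

Draining the deque yields R2 before R1, which is exactly the out-of-order processing being discussed.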
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > >
> > > > > >> > > > Jiangjie (Becket) Qin
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > > > jjkoshy.w@gmail.com>
> > > > > >> > wrote:
> > > > > >> > > >
> > > > > >> > > > > @Mayuresh - I like your idea. It appears to be a simpler, less
> > > > > >> > > > > invasive alternative and it should work. Jun/Becket/others, do
> > > > > >> > > > > you see any pitfalls with this approach?
> > > > > >> > > > >
> > > > > >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > > > > >> lucasatucla@gmail.com>
> > > > > >> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > @Mayuresh,
> > > > > >> > > > > > That's a very interesting idea that I haven't thought of before.
> > > > > >> > > > > > It seems to solve our problem at hand pretty well, and also
> > > > > >> > > > > > avoids the need to have a new size metric and capacity config
> > > > > >> > > > > > for the controller request queue. In fact, if we were to adopt
> > > > > >> > > > > > this design, there is no public interface change, and we
> > > > > >> > > > > > probably don't need a KIP.
> > > > > >> > > > > > Also, implementation-wise, it seems the java class
> > > > > >> > > > > > LinkedBlockingDeque can readily satisfy the requirement by
> > > > > >> > > > > > supporting a capacity and also allowing inserting at both ends.
> > > > > >> > > > > >
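For reference, a minimal sketch of the deque approach using the JDK's java.util.concurrent.LinkedBlockingDeque; the wrapper class and the String request type are simplified placeholders, not Kafka's actual request channel:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Minimal sketch of the bounded-deque request channel: data-plane requests
// go to the tail, controller requests jump to the head, and one capacity
// bound covers both.
class RequestDeque {
    private final LinkedBlockingDeque<String> deque;

    RequestDeque(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    void sendDataRequest(String request) throws InterruptedException {
        deque.putLast(request);   // blocks while the deque is full
    }

    void sendControllerRequest(String request) throws InterruptedException {
        deque.putFirst(request);  // also blocks while full (Jun's caveat)
    }

    String nextRequest() throws InterruptedException {
        return deque.takeFirst(); // handler threads always take from the head
    }
}
```

Note that putFirst also blocks when the deque is full, which is the caveat Jun raises about controller requests not being guaranteed a quick enqueue.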
> > > > > >> > > > > > My only concern is that this design is tied to the coincidence
> > > > > >> > > > > > that we have two request priorities and there are two ends to a
> > > > > >> > > > > > deque. Hence, by using the proposed design, it seems the network
> > > > > >> > > > > > layer is more tightly coupled with upper-layer logic; e.g. if we
> > > > > >> > > > > > were to add an extra priority level in the future for some
> > > > > >> > > > > > reason, we would probably need to go back to the design of
> > > > > >> > > > > > separate queues, one for each priority level.
> > > > > >> > > > > >
> > > > > >> > > > > > In summary, I'm ok with both designs and lean toward your
> > > > > >> > > > > > suggested approach. Let's hear what others think.
> > > > > >> > > > > >
> > > > > >> > > > > > @Becket,
> > > > > >> > > > > > In light of Mayuresh's suggested new design, I'm answering your
> > > > > >> > > > > > question only in the context of the current KIP design: I think
> > > > > >> > > > > > your suggestion makes sense, and I'm ok with removing the
> > > > > >> > > > > > capacity config and just relying on the default value of 20
> > > > > >> > > > > > being sufficient.
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks,
> > > > > >> > > > > > Lucas
> > > > > >> > > > > >
> > > > > >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > >> > > > > > gharatmayuresh15@gmail.com
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hi Lucas,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Seems like the main intent here is to prioritize the
> > > > > >> > > > > > > controller requests over any other requests.
> > > > > >> > > > > > > In that case, we can change the request queue to a deque,
> > > > > >> > > > > > > where you always insert the normal requests (produce,
> > > > > >> > > > > > > consume, etc.) at the tail of the deque, but if it's a
> > > > > >> > > > > > > controller request, you insert it at the head of the deque.
> > > > > >> > > > > > > This ensures that the controller request will be given
> > > > > >> > > > > > > higher priority over other requests.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Also, since we only read one request from the socket and mute
> > > > > >> > > > > > > it, and only unmute it after handling the request, this would
> > > > > >> > > > > > > ensure that we don't handle controller requests out of order.
> > > > > >> > > > > > >
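The mute/unmute discipline described above can be modeled with a toy sketch; this is illustrative only, as Kafka's actual SocketServer and channel machinery are more involved:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model (not Kafka source) of the mute/unmute discipline: a channel is
// muted once a request is read and unmuted only after that request has been
// handled, so requests from one connection are processed strictly in order,
// one at a time.
class MutableChannel {
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean muted = false;

    void receive(String request) { pending.add(request); }

    /** Reads at most one request and mutes the channel; null if not readable. */
    String poll() {
        if (muted || pending.isEmpty()) return null;
        muted = true;
        return pending.poll();
    }

    /** Called once the response for the last request has been sent. */
    void unmute() { muted = false; }
}
```

Because at most one request per connection is in flight at a time, per-connection ordering is preserved; the reordering scenarios in this thread all involve a reconnection, i.e. a second connection.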
> > > > > >> > > > > > > With this approach we can avoid the second queue and
> > the
> > > > > >> > additional
> > > > > >> > > > > > config
> > > > > >> > > > > > > for the size of the queue.
> > > > > >> > > > > > >
> > > > > >> > > > > > > What do you think ?
> > > > > >> > > > > > >
> > > > > >> > > > > > > Thanks,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Mayuresh
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > > > >> becket.qin@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Hey Joel,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks for the detailed explanation. I agree the current
> > > > > >> > > > > > > > design makes sense. My confusion is about whether the new
> > > > > >> > > > > > > > config for the controller queue capacity is necessary. I
> > > > > >> > > > > > > > cannot think of a case in which users would change it.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > > >> > > becket.qin@gmail.com>
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Hi Lucas,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > I guess my question can be rephrased to "do we expect users
> > > > > >> > > > > > > > > to ever change the controller request queue capacity"? If we
> > > > > >> > > > > > > > > agree that 20 is already a very generous default number and
> > > > > >> > > > > > > > > we do not expect users to change it, is it still necessary
> > > > > >> > > > > > > > > to expose this as a config?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Jiangjie (Becket) Qin
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > > >> > > > lucasatucla@gmail.com
> > > > > >> > > > > >
> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >> @Becket
> > > > > >> > > > > > > > >> 1. Thanks for the comment. You are right that normally
> > > > > >> > > > > > > > >> there should be just one controller request because of
> > > > > >> > > > > > > > >> muting, and I had NOT intended to say there would be many
> > > > > >> > > > > > > > >> enqueued controller requests. I went through the KIP again,
> > > > > >> > > > > > > > >> and I'm not sure which part conveys that info. I'd be happy
> > > > > >> > > > > > > > >> to revise if you point out the section.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> 2. Though it should not happen in normal conditions, the
> > > > > >> > > > > > > > >> current design does not preclude multiple controllers
> > > > > >> > > > > > > > >> running at the same time. Hence, if we don't have the
> > > > > >> > > > > > > > >> controller queue capacity config and simply make its
> > > > > >> > > > > > > > >> capacity 1, network threads handling requests from
> > > > > >> > > > > > > > >> different controllers will be blocked during those
> > > > > >> > > > > > > > >> troublesome times, which is probably not what we want. On
> > > > > >> > > > > > > > >> the other hand, adding the extra config with a default
> > > > > >> > > > > > > > >> value, say 20, guards us from issues in those troublesome
> > > > > >> > > > > > > > >> times, and IMO there isn't much downside to adding the
> > > > > >> > > > > > > > >> extra config.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> @Mayuresh
> > > > > >> > > > > > > > >> Good catch, this sentence is an obsolete statement based
> > > > > >> > > > > > > > >> on a previous design. I've revised the wording in the KIP.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> Thanks,
> > > > > >> > > > > > > > >> Lucas
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> > Gharat <
> > > > > >> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> > Hi Lucas,
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Thanks for the KIP.
> > > > > >> > > > > > > > >> > I am trying to understand why you think "The memory
> > > > > >> > > > > > > > >> > consumption can rise given the total number of queued
> > > > > >> > > > > > > > >> > requests can go up to 2x" in the impact section. Normally
> > > > > >> > > > > > > > >> > the requests from the controller to a broker are not high
> > > > > >> > > > > > > > >> > volume, right?
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Thanks,
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Mayuresh
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > > >> > > > > becket.qin@gmail.com>
> > > > > >> > > > > > > > >> wrote:
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane
> > > > > >> > > > > > > > >> > > from the data plane makes a lot of sense.
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > In the KIP you mentioned that the controller request
> > > > > >> > > > > > > > >> > > queue may have many requests in it. Will this be a
> > > > > >> > > > > > > > >> > > common case? The controller requests still go through
> > > > > >> > > > > > > > >> > > the SocketServer. The SocketServer will mute the channel
> > > > > >> > > > > > > > >> > > once a request is read and put into the request channel.
> > > > > >> > > > > > > > >> > > So assuming there is only one connection between the
> > > > > >> > > > > > > > >> > > controller and each broker, on the broker side there
> > > > > >> > > > > > > > >> > > should be only one controller request in the controller
> > > > > >> > > > > > > > >> > > request queue at any given time. If that is the case, do
> > > > > >> > > > > > > > >> > > we need a separate controller request queue capacity
> > > > > >> > > > > > > > >> > > config? The default value of 20 means that we expect 20
> > > > > >> > > > > > > > >> > > controller switches to happen in a short period of time.
> > > > > >> > > > > > > > >> > > I am not sure whether someone should increase the
> > > > > >> > > > > > > > >> > > controller request queue capacity to handle such a case,
> > > > > >> > > > > > > > >> > > as it seems to indicate something very wrong has happened.
> > > > > >> > > > > > > > >> > > Thanks,
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > > >> > > > > lindong28@gmail.com>
> > > > > >> > > > > > > > >> wrote:
> > > > > >> > > > > > > > >> > >
> > > > > >> > > > > > > > >> > > > Thanks for the update Lucas.
> > > > > >> > > > > > > > >> > > >
> > > > > >> > > > > > > > >> > > > I think the motivation section is
> > intuitive.
> > > It
> > > > > >> will
> > > > > >> > be
> > > > > >> > > > good
> > > > > >> > > > > > to
> > > > > >> > > > > > > > >> learn
> > > > > >> > > > > > > > >> > > more
> > > > > >> > > > > > > > >> > > > about the comments from other reviewers.
> > > > > >> > > > > > > > >> > > >
> > > > > >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> > Wang <
> > > > > >> > > > > > > > lucasatucla@gmail.com>
> > > > > >> > > > > > > > >> > > wrote:
> > > > > >> > > > > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > Hi Dong,
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > I've updated the motivation section of
> > the
> > > > KIP
> > > > > by
> > > > > >> > > > > explaining
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > >> > cases
> > > > > >> > > > > > > > >> > > > that
> > > > > >> > > > > > > > >> > > > > would have user impacts.
> > > > > >> > > > > > > > >> > > > > Please take a look at let me know your
> > > > > comments.
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > Thanks,
> > > > > >> > > > > > > > >> > > > > Lucas
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> > Wang
> > > <
> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > > > wrote:
> > > > > >> > > > > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > > Hi Dong,
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > The simulation of the disk being slow is merely for me
> > > > > >> > > > > > > > >> > > > > > to easily construct a testing scenario with a backlog
> > > > > >> > > > > > > > >> > > > > > of produce requests. In production, other than the
> > > > > >> > > > > > > > >> > > > > > disk being slow, a backlog of produce requests may
> > > > > >> > > > > > > > >> > > > > > also be caused by high produce QPS. In that case, we
> > > > > >> > > > > > > > >> > > > > > may not want to kill the broker, and that's when this
> > > > > >> > > > > > > > >> > > > > > KIP can be useful, both for JBOD and non-JBOD setups.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > Going back to your previous question
> > > about
> > > > > each
> > > > > >> > > > > > > ProduceRequest
> > > > > >> > > > > > > > >> > > covering
> > > > > >> > > > > > > > >> > > > > 20
> > > > > >> > > > > > > > >> > > > > > partitions that are randomly
> > > > > >> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr
> > > > request
> > > > > >> is
> > > > > >> > > > > enqueued
> > > > > >> > > > > > > that
> > > > > >> > > > > > > > >> > tries
> > > > > >> > > > > > > > >> > > to
> > > > > >> > > > > > > > >> > > > > > switch the current broker, say
> broker0,
> > > > from
> > > > > >> > leader
> > > > > >> > > to
> > > > > >> > > > > > > > follower
> > > > > >> > > > > > > > >> > > > > > *for one of the partitions*, say
> > > *test-0*.
> > > > > For
> > > > > >> the
> > > > > >> > > > sake
> > > > > >> > > > > of
> > > > > >> > > > > > > > >> > argument,
> > > > > >> > > > > > > > >> > > > > > let's also assume the other brokers,
> > say
> > > > > >> broker1,
> > > > > >> > > have
> > > > > >> > > > > > > > *stopped*
> > > > > >> > > > > > > > >> > > > fetching
> > > > > >> > > > > > > > >> > > > > > from
> > > > > >> > > > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > >> > > > > > > > >> > > > > > 1. If the enqueued produce requests
> > have
> > > > > acks =
> > > > > >> > -1
> > > > > >> > > > > (ALL)
> > > > > >> > > > > > > > >> > > > > >   1.1 without this KIP, the
> > > ProduceRequests
> > > > > >> ahead
> > > > > >> > of
> > > > > >> > > > > > > > >> LeaderAndISR
> > > > > >> > > > > > > > >> > > will
> > > > > >> > > > > > > > >> > > > be
> > > > > >> > > > > > > > >> > > > > > put into the purgatory,
> > > > > >> > > > > > > > >> > > > > >         and since they'll never be
> > > > replicated
> > > > > >> to
> > > > > >> > > other
> > > > > >> > > > > > > brokers
> > > > > >> > > > > > > > >> > > (because
> > > > > >> > > > > > > > >> > > > > of
> > > > > >> > > > > > > > >> > > > > > the assumption made above), they will
> > > > > >> > > > > > > > >> > > > > >         be completed either when the
> > > > > >> LeaderAndISR
> > > > > >> > > > > request
> > > > > >> > > > > > is
> > > > > >> > > > > > > > >> > > processed
> > > > > >> > > > > > > > >> > > > or
> > > > > >> > > > > > > > >> > > > > > when the timeout happens.
> > > > > >> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will
> > > > immediately
> > > > > >> > > > transition
> > > > > >> > > > > > the
> > > > > >> > > > > > > > >> > > partition
> > > > > >> > > > > > > > >> > > > > > test-0 to become a follower,
> > > > > >> > > > > > > > >> > > > > >         after the current broker sees
> > the
> > > > > >> > > replication
> > > > > >> > > > of
> > > > > >> > > > > > the
> > > > > >> > > > > > > > >> > > remaining
> > > > > >> > > > > > > > >> > > > 19
> > > > > >> > > > > > > > >> > > > > > partitions, it can send a response
> > > > indicating
> > > > > >> that
> > > > > >> > > > > > > > >> > > > > >         it's no longer the leader for
> > the
> > > > > >> > "test-0".
> > > > > >> > > > > > > > >> > > > > >   To see the latency difference
> between
> > > 1.1
> > > > > and
> > > > > >> > 1.2,
> > > > > >> > > > > let's
> > > > > >> > > > > > > say
> > > > > >> > > > > > > > >> > there
> > > > > >> > > > > > > > >> > > > are
> > > > > >> > > > > > > > >> > > > > > 24K produce requests ahead of the
> > > > > LeaderAndISR,
> > > > > >> > and
> > > > > >> > > > > there
> > > > > >> > > > > > > are
> > > > > >> > > > > > > > 8
> > > > > >> > > > > > > > >> io
> > > > > >> > > > > > > > >> > > > > threads,
> > > > > >> > > > > > > > >> > > > > >   so each io thread will process
> > > > > approximately
> > > > > >> > 3000
> > > > > >> > > > > > produce
> > > > > >> > > > > > > > >> > requests.
> > > > > >> > > > > > > > >> > > > Now
> > > > > >> > > > > > > > >> > > > > > let's investigate the io thread that
> > > > finally
> > > > > >> > > processed
> > > > > >> > > > > the
> > > > > >> > > > > > > > >> > > > LeaderAndISR.
> > > > > >> > > > > > > > >> > > > > >   For the 3000 produce requests, if
> we
> > > > model
> > > > > >> the
> > > > > >> > > time
> > > > > >> > > > > when
> > > > > >> > > > > > > > their
> > > > > >> > > > > > > > >> > > > > remaining
> > > > > >> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1,
> > > ...t2999,
> > > > > and
> > > > > >> > the
> > > > > >> > > > > > > > LeaderAndISR
> > > > > >> > > > > > > > >> > > > request
> > > > > >> > > > > > > > >> > > > > is
> > > > > >> > > > > > > > >> > > > > > processed at time t3000.
> > > > > >> > > > > > > > >> > > > > >   Without this KIP, the 1st produce
> > > request
> > > > > >> would
> > > > > >> > > have
> > > > > >> > > > > > > waited
> > > > > >> > > > > > > > an
> > > > > >> > > > > > > > >> > > extra
> > > > > >> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the
> > 2nd
> > > > an
> > > > > >> extra
> > > > > >> > > > time
> > > > > >> > > > > of
> > > > > >> > > > > > > > >> t3000 -
> > > > > >> > > > > > > > >> > > t1,
> > > > > >> > > > > > > > >> > > > > etc.
> > > > > >> > > > > > > > >> > > > > >   Roughly speaking, the latency
> > > difference
> > > > is
> > > > > >> > bigger
> > > > > >> > > > for
> > > > > >> > > > > > the
> > > > > >> > > > > > > > >> > earlier
> > > > > >> > > > > > > > >> > > > > > produce requests than for the later
> > ones.
> > > > For
> > > > > >> the
> > > > > >> > > same
> > > > > >> > > > > > > reason,
> > > > > >> > > > > > > > >> the
> > > > > >> > > > > > > > >> > > more
> > > > > >> > > > > > > > >> > > > > > ProduceRequests queued
> > > > > >> > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger
> > > > benefit
> > > > > >> we
> > > > > >> > get
> > > > > >> > > > > > (capped
> > > > > >> > > > > > > > by
> > > > > >> > > > > > > > >> the
> > > > > >> > > > > > > > >> > > > > > produce timeout).
> > > > > >> > > > > > > > >> > > > > > 2. If the enqueued produce requests
> > have
> > > > > >> acks=0 or
> > > > > >> > > > > acks=1
> > > > > >> > > > > > > > >> > > > > >   There will be no latency
> differences
> > in
> > > > > this
> > > > > >> > case,
> > > > > >> > > > but
> > > > > >> > > > > > > > >> > > > > >   2.1 without this KIP, the records
> of
> > > > > >> partition
> > > > > >> > > > test-0
> > > > > >> > > > > in
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > >> > > > > > ProduceRequests ahead of the
> > LeaderAndISR
> > > > > will
> > > > > >> be
> > > > > >> > > > > appended
> > > > > >> > > > > > > to
> > > > > >> > > > > > > > >> the
> > > > > >> > > > > > > > >> > > local
> > > > > >> > > > > > > > >> > > > > log,
> > > > > >> > > > > > > > >> > > > > >         and eventually be truncated
> > after
> > > > > >> > processing
> > > > > >> > > > the
> > > > > >> > > > > > > > >> > > LeaderAndISR.
> > > > > >> > > > > > > > >> > > > > > This is what's referred to as
> > > > > >> > > > > > > > >> > > > > >         "some unofficial definition of data loss in terms of
> > > > > >> > > > > > > > >> > > > > > messages beyond the high watermark".
> > > > > >> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect since if the
> > > > > >> > > > > > > > >> > > > > > LeaderAndISR is immediately processed, the response to producers
> > > > > >> > > > > > > > >> > > > > > will have the NotLeaderForPartition error, causing producers to
> > > > > >> > > > > > > > >> > > > > > retry.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > The explanation above is the benefit of reducing the latency of a
> > > > > >> > > > > > > > >> > > > > > broker becoming the follower; closely related is reducing the
> > > > > >> > > > > > > > >> > > > > > latency of a broker becoming the leader. In this case, the benefit
> > > > > >> > > > > > > > >> > > > > > is even more obvious: if other brokers have resigned leadership
> > > > > >> > > > > > > > >> > > > > > and the current broker should take leadership, any delay in
> > > > > >> > > > > > > > >> > > > > > processing the LeaderAndISR will be perceived by clients as
> > > > > >> > > > > > > > >> > > > > > unavailability. In extreme cases, this can cause failed produce
> > > > > >> > > > > > > > >> > > > > > requests if the retries are exhausted.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > The other two types of controller requests are UpdateMetadata and
> > > > > >> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing means clients
> > > > > >> > > > > > > > >> > > > > > receiving stale metadata, e.g. with the wrong leadership info for
> > > > > >> > > > > > > > >> > > > > > certain partitions, and the effect is more retries or even fatal
> > > > > >> > > > > > > > >> > > > > > failure if the retries are exhausted.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > For StopReplica requests, a long queuing time may degrade the
> > > > > >> > > > > > > > >> > > > > > performance of topic deletion.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > Regarding your last question about the delay for
> > > > > >> > > > > > > > >> > > > > > DescribeLogDirsRequest, you are right that this KIP cannot help
> > > > > >> > > > > > > > >> > > > > > with the latency in getting the log dirs info; it is only relevant
> > > > > >> > > > > > > > >> > > > > > when controller requests are involved.
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > Regards,
> > > > > >> > > > > > > > >> > > > > > Lucas
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > >> Hey Jun,
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> Thanks much for the comments. It is a good point. So the feature
> > > > > >> > > > > > > > >> > > > > >> may be useful for the JBOD use-case. I have one question below.
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> Hey Lucas,
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> Do you think this feature is also useful for a non-JBOD setup, or
> > > > > >> > > > > > > > >> > > > > >> is it only useful for the JBOD setup? It may be useful to
> > > > > >> > > > > > > > >> > > > > >> understand this.
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> When the broker is set up using JBOD, in order to move leaders on
> > > > > >> > > > > > > > >> > > > > >> the failed disk to other disks, the system operator first needs to
> > > > > >> > > > > > > > >> > > > > >> get the list of partitions on the failed disk. This is currently
> > > > > >> > > > > > > > >> > > > > >> achieved using AdminClient.describeLogDirs(), which sends
> > > > > >> > > > > > > > >> > > > > >> DescribeLogDirsRequest to the broker. If we only prioritize the
> > > > > >> > > > > > > > >> > > > > >> controller requests, then the DescribeLogDirsRequest may still
> > > > > >> > > > > > > > >> > > > > >> take a long time to be processed by the broker. So the overall
> > > > > >> > > > > > > > >> > > > > >> time to move leaders away from the failed disk may still be long
> > > > > >> > > > > > > > >> > > > > >> even with this KIP. What do you think?
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> Thanks,
> > > > > >> > > > > > > > >> > > > > >> Dong
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >>
> > > > > >> > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > >> > > > > > > > >> > > > > >> >
> > > > > >> > > > > > > > >> > > > > >> > @Dong,
> > > > > >> > > > > > > > >> > > > > >> > Since both of the two comments in your previous email are about
> > > > > >> > > > > > > > >> > > > > >> > the benefits of this KIP and whether it's useful, in light of
> > > > > >> > > > > > > > >> > > > > >> > Jun's last comment, do you agree that this KIP can be beneficial
> > > > > >> > > > > > > > >> > > > > >> > in the case mentioned by Jun?
> > > > > >> > > > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > >> > > > > > > > >> > > > > >> >
> > > > > >> > > > > > > > >> > > > > >> > Regards,
> > > > > >> > > > > > > > >> > > > > >> > Lucas
> > > > > >> > > > > > > > >> > > > > >> >
> > > > > >> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:
> > > > > >> > > > > > > > >> > > > > >> >
> > > > > >> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one probably should just
> > > > > >> > > > > > > > >> > > > > >> > > kill the broker. In that case, this KIP may not help. If only
> > > > > >> > > > > > > > >> > > > > >> > > one of the disks on a broker is slow, one may want to fail that
> > > > > >> > > > > > > > >> > > > > >> > > disk and move the leaders on that disk to other brokers. In
> > > > > >> > > > > > > > >> > > > > >> > > that case, being able to process the LeaderAndIsr requests
> > > > > >> > > > > > > > >> > > > > >> > > faster will potentially help the producers recover quicker.
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > Thanks,
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > Jun
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >> > >
> > > > > >> > > > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow-up questions below.
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that
> > > > > >> > > > > > > > >> > > > > >> > > > are randomly distributed across all partitions, then each
> > > > > >> > > > > > > > >> > > > > >> > > > ProduceRequest will likely cover some partitions for which
> > > > > >> > > > > > > > >> > > > > >> > > > the broker is still leader after it quickly processes the
> > > > > >> > > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then the broker will still be slow in
> > > > > >> > > > > > > > >> > > > > >> > > > processing these ProduceRequests, and request latency will
> > > > > >> > > > > > > > >> > > > > >> > > > still be very high with this KIP. It seems that most
> > > > > >> > > > > > > > >> > > > > >> > > > ProduceRequests will still time out after 30 seconds. Is this
> > > > > >> > > > > > > > >> > > > > >> > > > understanding correct?
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out
> > > > > >> > > > > > > > >> > > > > >> > > > after 30 seconds, then it is less clear how this KIP reduces
> > > > > >> > > > > > > > >> > > > > >> > > > average produce latency. Can you clarify what metrics can be
> > > > > >> > > > > > > > >> > > > > >> > > > improved by this KIP?
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Not sure why the system operator directly cares about the
> > > > > >> > > > > > > > >> > > > > >> > > > number of truncated messages. Do you mean this KIP can
> > > > > >> > > > > > > > >> > > > > >> > > > improve average throughput or reduce message duplication? It
> > > > > >> > > > > > > > >> > > > > >> > > > will be good to understand this.
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > Thanks,
> > > > > >> > > > > > > > >> > > > > >> > > > Dong
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >> > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see my reply
> > > > > >> > > > > > > > >> > > > > >> > > > > below.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
> > > > > >> > > > > > > > >> > > > > >> > > > > consider a more common scenario where broker0 is the leader
> > > > > >> > > > > > > > >> > > > > >> > > > > of many partitions, and let's say for some reason its IO
> > > > > >> > > > > > > > >> > > > > >> > > > > becomes slow. The number of leader partitions on broker0 is
> > > > > >> > > > > > > > >> > > > > >> > > > > so large, say 10K, that the cluster is skewed, and the
> > > > > >> > > > > > > > >> > > > > >> > > > > operator would like to shift the leadership for a lot of
> > > > > >> > > > > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers, either manually or
> > > > > >> > > > > > > > >> > > > > >> > > > > through some service like cruise control. With this KIP,
> > > > > >> > > > > > > > >> > > > > >> > > > > not only will the leadership transitions finish more
> > > > > >> > > > > > > > >> > > > > >> > > > > quickly, helping the cluster itself become more balanced,
> > > > > >> > > > > > > > >> > > > > >> > > > > but all existing producers corresponding to the 9K
> > > > > >> > > > > > > > >> > > > > >> > > > > partitions will get the errors relatively quickly rather
> > > > > >> > > > > > > > >> > > > > >> > > > > than relying on their timeout, thanks to the batched async
> > > > > >> > > > > > > > >> > > > > >> > > > > ZK operations. To me it's a useful feature to have during
> > > > > >> > > > > > > > >> > > > > >> > > > > such troublesome times.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown that with
> > > > > >> > > > > > > > >> > > > > >> > > > > this KIP many producers receive an explicit
> > > > > >> > > > > > > > >> > > > > >> > > > > NotLeaderForPartition error, based on which they retry
> > > > > >> > > > > > > > >> > > > > >> > > > > immediately. Therefore the latency (~14 seconds + quick
> > > > > >> > > > > > > > >> > > > > >> > > > > retry) for their single message is much smaller compared
> > > > > >> > > > > > > > >> > > > > >> > > > > with the case of timing out without the KIP (30 seconds for
> > > > > >> > > > > > > > >> > > > > >> > > > > timing out + quick retry). One might argue that reducing
> > > > > >> > > > > > > > >> > > > > >> > > > > the timeout on the producer side can achieve the same
> > > > > >> > > > > > > > >> > > > > >> > > > > result, yet reducing the timeout has its own drawbacks[1].
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the number of
> > > > > >> > > > > > > > >> > > > > >> > > > > truncated messages on brokers, with the experiments done in
> > > > > >> > > > > > > > >> > > > > >> > > > > the Google Doc, it should be easy to see that a lot fewer
> > > > > >> > > > > > > > >> > > > > >> > > > > messages need to be truncated on broker0, since the
> > > > > >> > > > > > > > >> > > > > >> > > > > up-to-date metadata avoids appending of messages in
> > > > > >> > > > > > > > >> > > > > >> > > > > subsequent PRODUCE requests. If we talk to a system
> > > > > >> > > > > > > > >> > > > > >> > > > > operator and ask whether they prefer fewer wasteful IOs, I
> > > > > >> > > > > > > > >> > > > > >> > > > > bet most likely the answer is yes.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > 3. To answer your question, I think it might be helpful to
> > > > > >> > > > > > > > >> > > > > >> > > > > construct some formulas. To simplify the modeling, I'm
> > > > > >> > > > > > > > >> > > > > >> > > > > going back to the case where there is only ONE partition
> > > > > >> > > > > > > > >> > > > > >> > > > > involved. Following the experiments in the Google Doc,
> > > > > >> > > > > > > > >> > > > > >> > > > > let's say broker0 becomes the follower at time t0, and
> > > > > >> > > > > > > > >> > > > > >> > > > > after t0 there were still N produce requests in its request
> > > > > >> > > > > > > > >> > > > > >> > > > > queue. With the up-to-date metadata brought by this KIP,
> > > > > >> > > > > > > > >> > > > > >> > > > > broker0 can reply with a NotLeaderForPartition exception;
> > > > > >> > > > > > > > >> > > > > >> > > > > let's use M1 to denote the average processing time of
> > > > > >> > > > > > > > >> > > > > >> > > > > replying with such an error message. Without this KIP, the
> > > > > >> > > > > > > > >> > > > > >> > > > > broker will need to append messages to segments, which may
> > > > > >> > > > > > > > >> > > > > >> > > > > trigger a flush to disk; let's use M2 to denote the average
> > > > > >> > > > > > > > >> > > > > >> > > > > processing time for such logic. Then the average extra
> > > > > >> > > > > > > > >> > > > > >> > > > > latency incurred without this KIP is N * (M2 - M1) / 2.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > In practice, M2 should always be larger than M1, which
> > > > > >> > > > > > > > >> > > > > >> > > > > means as long as N is positive, we would see improvements
> > > > > >> > > > > > > > >> > > > > >> > > > > on the average latency. There does not need to be a
> > > > > >> > > > > > > > >> > > > > >> > > > > significant backlog of requests in the request queue, or
> > > > > >> > > > > > > > >> > > > > >> > > > > severe degradation of disk performance, to have the
> > > > > >> > > > > > > > >> > > > > >> > > > > improvement.
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > Regards,
> > > > > >> > > > > > > > >> > > > > >> > > > > Lucas
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the producer side
> > > > > >> > > > > > > > >> > > > > >> > > > > can trigger unnecessary duplicate requests when the
> > > > > >> > > > > > > > >> > > > > >> > > > > corresponding leader broker is overloaded, exacerbating the
> > > > > >> > > > > > > > >> > > > > >> > > > > situation.
> > > > > >> > > > > > > > >> > > > > >> > > > >
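Lucas's latency model in point 3 above can be sanity-checked numerically; the numbers below are made up and only illustrate the N * (M2 - M1) / 2 approximation, they are not measurements from the Google Doc experiments:

```python
# Numeric check of the model above: N produce requests are queued when broker0
# becomes a follower. The i-th request completes after i requests are handled,
# at per-request cost M1 (fast NotLeaderForPartition reply, with the KIP) or
# M2 (append/flush path, without it). The average extra latency is close to
# N * (M2 - M1) / 2.

def avg_latency(n, per_request_cost):
    # average completion time over the N queued requests
    return sum(i * per_request_cost for i in range(1, n + 1)) / n

n, m1, m2 = 1000, 0.1, 2.0  # made-up costs in milliseconds
extra = avg_latency(n, m2) - avg_latency(n, m1)
print(extra)              # exact value: (M2 - M1) * (N + 1) / 2
print(n * (m2 - m1) / 2)  # the approximation quoted in the thread
```

The exact average is (M2 - M1) * (N + 1) / 2, which matches the quoted N * (M2 - M1) / 2 for large N.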
> > > > > >> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > >> > > > > > > > >> > > > > >> > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > >> > > > > > > > >> > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > > Thanks much for the detailed documentation of the
> > > > > >> > > > > > > > >> > > > > >> > > > > > experiment.
> > > > > >> > > > > > > > >> > > > > >> > > > > >
> > > > > >> > > > > > > > >> > > > > >> > > > > > Initially I also think having a separate queue for
> > > > > >> > > > > > > > >> > > > > >> > > > > > controller requests is useful because, as you mentioned
> > > > > >> > > > > > > > >> > > > > >> > > > > > in the summary section of the Google doc, controller
> > > > > >> > > > > > > > >> > > > > >> > > > > > requests are generally more
> > >
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Lucas and Mayuresh,

Good idea. The correlation id should work.

In the ControllerChannelManager, a request will be resent until a response
is received. So if the controller-to-broker connection disconnects after the
controller sends R1_a, but before the response to R1_a is received, the
controller may resend the request as R1_b; i.e. until R1 is acked, R2 won't
be sent by the controller.
This gives two guarantees:
1. Correlation id wise: R1_a < R1_b < R2.
2. On the broker side, when R2 is seen, R1 must have been processed at
least once.

So on the broker side, with a single-threaded controller request handler, the
logic should be:
1. Process whatever request is seen in the controller request queue.
2. For the given epoch, drop a request if its correlation id is smaller than
that of the last processed request.
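That two-step check can be sketched roughly as follows. This is illustrative Python only, not actual Kafka code; the class and method names are made up, and it assumes (per the discussion above) that the correlation id increases monotonically per controller connection:

```python
# Sketch of the broker-side staleness check described above: a
# single-threaded controller request handler drops any request from an older
# controller epoch, and, within the current epoch, drops a request whose
# correlation id is smaller than that of the last processed request.

class ControllerRequestHandler:
    def __init__(self):
        self.last_epoch = -1
        self.last_correlation_id = -1

    def should_process(self, epoch, correlation_id):
        if epoch < self.last_epoch:
            return False  # from an older controller generation
        if epoch == self.last_epoch and correlation_id < self.last_correlation_id:
            return False  # obsolete retransmit, e.g. R1_b arriving after R2
        self.last_epoch = epoch
        self.last_correlation_id = correlation_id
        return True

handler = ControllerRequestHandler()
print(handler.should_process(5, 10))  # R1_a -> True
print(handler.should_process(5, 12))  # R2 arrives first -> True
print(handler.should_process(5, 11))  # R1_b, resend of R1, now obsolete -> False
```

With correlation ids R1_a < R1_b < R2, this drops R1_b whenever R2 has already been processed, which is exactly the reordering case discussed below.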

Thanks,

Jiangjie (Becket) Qin

On Fri, Jul 20, 2018 at 8:07 AM, Jun Rao <ju...@confluent.io> wrote:

> I agree that there is no strong ordering when there is more than one
> socket connection. Currently, we rely on controllerEpoch and leaderEpoch
> to ensure that the receiving broker picks up the latest state for each
> partition.
>
> One potential issue with the dequeue approach is that if the queue is full,
> there is no guarantee that the controller requests will be enqueued
> quickly.
>
> Thanks,
>
> Jun
>
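The per-partition epoch fencing Jun refers to can be illustrated with a small sketch (hypothetical names, not the actual broker implementation):

```python
# Illustrative sketch of leader-epoch fencing: a broker applies a partition
# state change only if it carries a newer leader epoch than what it already
# holds, so stale updates are ignored regardless of arrival order.

class PartitionState:
    def __init__(self):
        self.leader_epoch = -1
        self.leader = None

    def maybe_apply(self, leader_epoch, leader):
        if leader_epoch <= self.leader_epoch:
            return False  # stale LeaderAndIsr update; keep current state
        self.leader_epoch = leader_epoch
        self.leader = leader
        return True

state = PartitionState()
print(state.maybe_apply(2, "broker1"))  # True: newer epoch, applied
print(state.maybe_apply(1, "broker0"))  # False: stale update, ignored
print(state.leader)                     # broker1
```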
> On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com>
> wrote:
>
> > Yea, the correlationId is only set to 0 in the NetworkClient constructor.
> > Since we reuse the same NetworkClient between Controller and the broker, a
> > disconnection should not cause it to reset to 0, in which case it can be
> > used to reject obsolete requests.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com> wrote:
> >
> > > @Dong,
> > > Great example and explanation, thanks!
> > >
> > > @All
> > > Regarding the example given by Dong, it seems even if we use a queue,
> > > and a dedicated controller request handling thread, the same result can
> > > still happen because R1_a will be sent on one connection, and R1_b & R2
> > > will be sent on a different connection, and there is no ordering between
> > > different connections on the broker side.
> > > I was discussing with Mayuresh offline, and it seems correlation id
> > > within the same NetworkClient object is monotonically increasing and
> > > never reset, hence a broker can leverage that to properly reject obsolete
> > > requests.
> > > Thoughts?
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > > gharatmayuresh15@gmail.com> wrote:
> > >
> > > > Actually nvm, correlationId is reset in case of connection loss, I think.
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
> > > > On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > > gharatmayuresh15@gmail.com>
> > > > wrote:
> > > >
> > > > > I agree with Dong that out-of-order processing can happen with 2
> > > > > separate queues as well, and it can even happen today.
> > > > > Can we use the correlationId in the request from the controller to
> > the
> > > > > broker to handle ordering ?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > >
> > > > > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Good point, Joel. I agree that a dedicated controller request
> > handling
> > > > >> thread would be a better isolation. It also solves the reordering
> > > issue.
> > > > >>
> > > > >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jj...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >> > Good example. I think this scenario can occur in the current
> code
> > as
> > > > >> well
> > > > >> > but with even lower probability given that there are other
> > > > >> non-controller
> > > > >> > requests interleaved. It is still sketchy though and I think a
> > safer
> > > > >> > approach would be separate queues and pinning controller request
> > > > >> handling
> > > > >> > to one handler thread.
> > > > >> >
> > > > >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <lindong28@gmail.com
> >
> > > > wrote:
> > > > >> >
> > > > >> > > Hey Becket,
> > > > >> > >
> > > > >> > > I think you are right that there may be out-of-order
> processing.
> > > > >> However,
> > > > >> > > it seems that out-of-order processing may also happen even if
> we
> > > > use a
> > > > >> > > separate queue.
> > > > >> > >
> > > > >> > > Here is the example:
> > > > >> > >
> > > > >> > > - Controller sends R1 and got disconnected before receiving
> > > > response.
> > > > >> > Then
> > > > >> > > it reconnects and sends R2. Both requests now stay in the
> > > controller
> > > > >> > > request queue in the order they are sent.
> > > > >> > > - thread1 takes R1_a from the request queue and then thread2
> > takes
> > > > R2
> > > > >> > from
> > > > >> > > the request queue almost at the same time.
> > > > >> > > - So R1_a and R2 are processed in parallel. There is chance
> that
> > > > R2's
> > > > >> > > processing is completed before R1.
> > > > >> > >
> > > > >> > > If out-of-order processing can happen for both approaches with
> > > very
> > > > >> low
> > > > >> > > probability, it may not be worthwhile to add the extra queue.
> > What
> > > > do
> > > > >> you
> > > > >> > > think?
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Dong
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> > becket.qin@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > > Hi Mayuresh/Joel,
> > > > >> > > >
> > > > >> > > > Using the request channel as a deque was brought up some
> > > > >> > > > time ago when we were initially thinking of prioritizing
> > > > >> > > > requests. The concern was that controller requests are
> > > > >> > > > supposed to be processed in order. If we can ensure that
> > > > >> > > > there is at most one controller request in the request
> > > > >> > > > channel, the order is not a concern. But if more than one
> > > > >> > > > controller request is inserted into the queue, the
> > > > >> > > > controller request order may change and cause problems.
> > > > >> > > > For example, think about the following sequence:
> > > > >> > > > 1. Controller successfully sent a request R1 to broker
> > > > >> > > > 2. Broker receives R1 and puts the request at the head of
> > > > >> > > > the request queue.
> > > > >> > > > 3. Controller to broker connection failed and the controller
> > > > >> > reconnected
> > > > >> > > to
> > > > >> > > > the broker.
> > > > >> > > > 4. Controller sends a request R2 to the broker
> > > > >> > > > 5. Broker receives R2 and adds it to the head of the
> > > > >> > > > request queue.
> > > > >> > > > Now on the broker side, R2 will be processed before R1,
> > > > >> > > > which may cause problems.
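The hazard in the sequence above can be seen with a plain deque, assuming both controller requests are inserted at the head:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HeadInsertReordering {
    public static void main(String[] args) {
        Deque<String> requestQueue = new ArrayDeque<>();
        // Step 2: broker puts R1 at the head of the queue.
        requestQueue.addFirst("R1");
        // Step 5: after the reconnect, R2 is also added at the head.
        requestQueue.addFirst("R2");
        // The broker now dequeues R2 before R1, i.e. out of order.
        System.out.println(requestQueue.pollFirst()); // R2
        System.out.println(requestQueue.pollFirst()); // R1
    }
}
```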
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > >
> > > > >> > > > Jiangjie (Becket) Qin
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > > jjkoshy.w@gmail.com>
> > > > >> > wrote:
> > > > >> > > >
> > > > >> > > > > @Mayuresh - I like your idea. It appears to be a simpler
> > less
> > > > >> > invasive
> > > > >> > > > > alternative and it should work. Jun/Becket/others, do you
> > see
> > > > any
> > > > >> > > > pitfalls
> > > > >> > > > > with this approach?
> > > > >> > > > >
> > > > >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > > > >> lucasatucla@gmail.com>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > @Mayuresh,
> > > > >> > > > > > That's a very interesting idea that I haven't thought
> > > before.
> > > > >> > > > > > It seems to solve our problem at hand pretty well, and
> > also
> > > > >> > > > > > avoids the need to have a new size metric and capacity
> > > config
> > > > >> > > > > > for the controller request queue. In fact, if we were to
> > > adopt
> > > > >> > > > > > this design, there is no public interface change, and we
> > > > >> > > > > > probably don't need a KIP.
> > > > >> > > > > > Also, implementation wise, it seems the java class
> > > > >> > > > > > LinkedBlockingDeque can readily satisfy the
> > > > >> > > > > > requirement by supporting a capacity, and also
> > > > >> > > > > > allowing inserting at both ends.
> > > > >> > > > > >
> > > > >> > > > > > My only concern is that this design is tied to the
> > > coincidence
> > > > >> that
> > > > >> > > > > > we have two request priorities and there are two ends
> to a
> > > > >> deque.
> > > > >> > > > > > Hence by using the proposed design, it seems the network
> > > layer
> > > > >> is
> > > > >> > > > > > more tightly coupled with upper layer logic, e.g. if we
> > were
> > > > to
> > > > >> add
> > > > >> > > > > > an extra priority level in the future for some reason,
> we
> > > > would
> > > > >> > > > probably
> > > > >> > > > > > need to go back to the design of separate queues, one
> for
> > > each
> > > > >> > > priority
> > > > >> > > > > > level.
> > > > >> > > > > >
> > > > >> > > > > > In summary, I'm ok with both designs and lean toward
> your
> > > > >> suggested
> > > > >> > > > > > approach.
> > > > >> > > > > > Let's hear what others think.
> > > > >> > > > > >
> > > > >> > > > > > @Becket,
> > > > >> > > > > > In light of Mayuresh's suggested new design, I'm
> answering
> > > > your
> > > > >> > > > question
> > > > >> > > > > > only in the context
> > > > >> > > > > > of the current KIP design: I think your suggestion makes
> > > > sense,
> > > > >> and
> > > > >> > > I'm
> > > > >> > > > > ok
> > > > >> > > > > > with removing the capacity config and
> > > > >> > > > > > just relying on the default value of 20 being sufficient
> > > > enough.
> > > > >> > > > > >
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Lucas
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > >> > > > > > gharatmayuresh15@gmail.com
> > > > >> > > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hi Lucas,
> > > > >> > > > > > >
> > > > >> > > > > > > Seems like the main intent here is to prioritize the
> > > > >> controller
> > > > >> > > > request
> > > > >> > > > > > > over any other requests.
> > > > >> > > > > > > In that case, we can change the request queue to a
> > > > >> > > > > > > deque, where you always insert the normal requests
> > > > >> > > > > > > (produce, consume, etc.) at the tail of the deque,
> > > > >> > > > > > > but if it's a controller request, you insert it at
> > > > >> > > > > > > the head of the queue. This ensures that controller
> > > > >> > > > > > > requests are given higher priority than other
> > > > >> > > > > > > requests.
> > > > >> > > > > > >
> > > > >> > > > > > > Also since we only read one request from the socket
> and
> > > mute
> > > > >> it
> > > > >> > and
> > > > >> > > > > only
> > > > >> > > > > > > unmute it after handling the request, this would
> ensure
> > > that
> > > > >> we
> > > > >> > > don't
> > > > >> > > > > > > handle controller requests out of order.
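The mute/unmute discipline described above can be sketched like this; it is a simplified model of the behavior, not the SocketServer implementation:

```java
public class MutedChannel {
    private boolean muted = false;

    /** Read one request if the channel is unmuted, then mute it. */
    public boolean tryReadRequest() {
        if (muted) {
            return false; // no further reads until the current request completes
        }
        muted = true; // stop reading more requests from this socket
        return true;
    }

    /** Called after the response is sent; resumes reading. */
    public void completeRequest() {
        muted = false;
    }

    public static void main(String[] args) {
        MutedChannel channel = new MutedChannel();
        System.out.println(channel.tryReadRequest()); // true: request read, channel muted
        System.out.println(channel.tryReadRequest()); // false: muted until completion
        channel.completeRequest();
        System.out.println(channel.tryReadRequest()); // true again
    }
}
```

Because at most one request per connection is in flight, two controller requests on the same connection cannot be handled concurrently, and hence not out of order.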
> > > > >> > > > > > >
> > > > >> > > > > > > With this approach we can avoid the second queue and
> the
> > > > >> > additional
> > > > >> > > > > > config
> > > > >> > > > > > > for the size of the queue.
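A rough sketch of the proposed deque-backed request channel, using java.util.concurrent.LinkedBlockingDeque; the surrounding class and method names are made up for illustration:

```java
import java.util.concurrent.LinkedBlockingDeque;

public class PrioritizedRequestChannel {
    private final LinkedBlockingDeque<String> queue;

    public PrioritizedRequestChannel(int capacity) {
        this.queue = new LinkedBlockingDeque<>(capacity);
    }

    /** Controller requests jump to the head; data requests go to the tail. */
    public boolean send(String request, boolean isControllerRequest) {
        return isControllerRequest ? queue.offerFirst(request) : queue.offerLast(request);
    }

    /** Take the next request, preferring the head; returns null if empty. */
    public String receive() {
        return queue.pollFirst();
    }

    public static void main(String[] args) {
        PrioritizedRequestChannel channel = new PrioritizedRequestChannel(10);
        channel.send("produce-1", false);
        channel.send("fetch-1", false);
        channel.send("LeaderAndIsr", true);
        System.out.println(channel.receive()); // LeaderAndIsr
        System.out.println(channel.receive()); // produce-1
    }
}
```

Note that offerFirst returns false when the deque is already at capacity, so a bounded deque by itself still does not guarantee that a controller request can be enqueued promptly under a heavy backlog.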
> > > > >> > > > > > >
> > > > >> > > > > > > What do you think ?
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > >
> > > > >> > > > > > > Mayuresh
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > > >> becket.qin@gmail.com
> > > > >> > >
> > > > >> > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hey Joel,
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thanks for the detailed explanation. I agree the
> > > > >> > > > > > > > current design makes sense.
> > > > >> > > > > > > > My confusion is about whether the new config for the
> > > > >> controller
> > > > >> > > > queue
> > > > >> > > > > > > > capacity is necessary. I cannot think of a case in
> > which
> > > > >> users
> > > > >> > > > would
> > > > >> > > > > > > change
> > > > >> > > > > > > > it.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thanks,
> > > > >> > > > > > > >
> > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > >> > > becket.qin@gmail.com>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Hi Lucas,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I guess my question can be rephrased to "do we
> > expect
> > > > >> users to
> > > > >> > > > ever
> > > > >> > > > > > > change
> > > > >> > > > > > > > > the controller request queue capacity"? If we
> agree
> > > that
> > > > >> 20
> > > > >> > is
> > > > >> > > > > > already
> > > > >> > > > > > > a
> > > > >> > > > > > > > > very generous default number and we do not expect
> > user
> > > > to
> > > > >> > > change
> > > > >> > > > > it,
> > > > >> > > > > > is
> > > > >> > > > > > > > it
> > > > >> > > > > > > > > still necessary to expose this as a config?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thanks,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Jiangjie (Becket) Qin
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > >> > > > lucasatucla@gmail.com
> > > > >> > > > > >
> > > > >> > > > > > > > wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >> @Becket
> > > > >> > > > > > > > >> 1. Thanks for the comment. You are right that
> > > normally
> > > > >> there
> > > > >> > > > > should
> > > > >> > > > > > be
> > > > >> > > > > > > > >> just
> > > > >> > > > > > > > >> one controller request because of muting,
> > > > >> > > > > > > > >> and I had NOT intended to say there would be many
> > > > >> enqueued
> > > > >> > > > > > controller
> > > > >> > > > > > > > >> requests.
> > > > >> > > > > > > > >> I went through the KIP again, and I'm not sure
> > which
> > > > part
> > > > >> > > > conveys
> > > > >> > > > > > that
> > > > >> > > > > > > > >> info.
> > > > >> > > > > > > > >> I'd be happy to revise if you point it out the
> > > section.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> 2. Though it should not happen in normal
> > conditions,
> > > > the
> > > > >> > > current
> > > > >> > > > > > > design
> > > > >> > > > > > > > >> does not preclude multiple controllers running
> > > > >> > > > > > > > >> at the same time, hence if we don't have the
> > > controller
> > > > >> > queue
> > > > >> > > > > > capacity
> > > > >> > > > > > > > >> config and simply set its capacity to 1,
> > > > >> > > > > > > > >> network threads handling requests from different
> > > > >> controllers
> > > > >> > > > will
> > > > >> > > > > be
> > > > >> > > > > > > > >> blocked during those troublesome times,
> > > > >> > > > > > > > >> which is probably not what we want. On the other
> > > hand,
> > > > >> > adding
> > > > >> > > > the
> > > > >> > > > > > > extra
> > > > >> > > > > > > > >> config with a default value, say 20, guards us
> from
> > > > >> issues
> > > > >> > in
> > > > >> > > > > those
> > > > >> > > > > > > > >> troublesome times, and IMO there isn't much
> > downside
> > > of
> > > > >> > adding
> > > > >> > > > the
> > > > >> > > > > > > extra
> > > > >> > > > > > > > >> config.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> @Mayuresh
> > > > >> > > > > > > > >> Good catch, this sentence is an obsolete
> statement
> > > > based
> > > > >> on
> > > > >> > a
> > > > >> > > > > > previous
> > > > >> > > > > > > > >> design. I've revised the wording in the KIP.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> Thanks,
> > > > >> > > > > > > > >> Lucas
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh
> Gharat <
> > > > >> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> > Hi Lucas,
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > Thanks for the KIP.
> > > > >> > > > > > > > >> > I am trying to understand why you think "The
> > memory
> > > > >> > > > consumption
> > > > >> > > > > > can
> > > > >> > > > > > > > rise
> > > > >> > > > > > > > >> > given the total number of queued requests can
> go
> > up
> > > > to
> > > > >> 2x"
> > > > >> > > in
> > > > >> > > > > the
> > > > >> > > > > > > > impact
> > > > >> > > > > > > > >> > section. Normally the requests from controller
> > to a
> > > > >> Broker
> > > > >> > > are
> > > > >> > > > > not
> > > > >> > > > > > > > high
> > > > >> > > > > > > > >> > volume, right ?
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > Thanks,
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > Mayuresh
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > >> > > > > becket.qin@gmail.com>
> > > > >> > > > > > > > >> wrote:
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the
> > control
> > > > >> plane
> > > > >> > > from
> > > > >> > > > > the
> > > > >> > > > > > > > data
> > > > >> > > > > > > > >> > plane
> > > > >> > > > > > > > >> > > makes a lot of sense.
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > > In the KIP you mentioned that the controller
> > > > request
> > > > >> > queue
> > > > >> > > > may
> > > > >> > > > > > > have
> > > > >> > > > > > > > >> many
> > > > >> > > > > > > > >> > > requests in it. Will this be a common case?
> The
> > > > >> > controller
> > > > >> > > > > > > requests
> > > > >> > > > > > > > >> still
> > > > >> > > > > > > > >> > > go through the SocketServer. The
> SocketServer
> > > > will
> > > > >> > mute
> > > > >> > > > the
> > > > >> > > > > > > > channel
> > > > >> > > > > > > > >> > once
> > > > >> > > > > > > > >> > > a request is read and put into the request
> > > channel.
> > > > >> So
> > > > >> > > > > assuming
> > > > >> > > > > > > > there
> > > > >> > > > > > > > >> is
> > > > >> > > > > > > > >> > > only one connection between controller and
> each
> > > > >> broker,
> > > > >> > on
> > > > >> > > > the
> > > > >> > > > > > > > broker
> > > > >> > > > > > > > >> > side,
> > > > >> > > > > > > > >> > > there should be only one controller request
> in
> > > the
> > > > >> > > > controller
> > > > >> > > > > > > > request
> > > > >> > > > > > > > >> > queue
> > > > >> > > > > > > > >> > > at any given time. If that is the case, do we
> > > need
> > > > a
> > > > >> > > > separate
> > > > >> > > > > > > > >> controller
> > > > >> > > > > > > > >> > > request queue capacity config? The default
> > value
> > > 20
> > > > >> > means
> > > > >> > > > that
> > > > >> > > > > > we
> > > > >> > > > > > > > >> expect
> > > > >> > > > > > > > >> > > there are 20 controller switches to happen
> in a
> > > > short
> > > > >> > > period
> > > > >> > > > > of
> > > > >> > > > > > > > time.
> > > > >> > > > > > > > >> I
> > > > >> > > > > > > > >> > am
> > > > >> > > > > > > > >> > > not sure whether someone should increase the
> > > > >> controller
> > > > >> > > > > request
> > > > >> > > > > > > > queue
> > > > >> > > > > > > > >> > > capacity to handle such case, as it seems
> > > > indicating
> > > > >> > > > something
> > > > >> > > > > > > very
> > > > >> > > > > > > > >> wrong
> > > > >> > > > > > > > >> > > has happened.
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > > Thanks,
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > > Jiangjie (Becket) Qin
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > >> > > > > lindong28@gmail.com>
> > > > >> > > > > > > > >> wrote:
> > > > >> > > > > > > > >> > >
> > > > >> > > > > > > > >> > > > Thanks for the update Lucas.
> > > > >> > > > > > > > >> > > >
> > > > >> > > > > > > > >> > > > I think the motivation section is
> intuitive.
> > It
> > > > >> will
> > > > >> > be
> > > > >> > > > good
> > > > >> > > > > > to
> > > > >> > > > > > > > >> learn
> > > > >> > > > > > > > >> > > more
> > > > >> > > > > > > > >> > > > about the comments from other reviewers.
> > > > >> > > > > > > > >> > > >
> > > > >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas
> Wang <
> > > > >> > > > > > > > lucasatucla@gmail.com>
> > > > >> > > > > > > > >> > > wrote:
> > > > >> > > > > > > > >> > > >
> > > > >> > > > > > > > >> > > > > Hi Dong,
> > > > >> > > > > > > > >> > > > >
> > > > >> > > > > > > > >> > > > > I've updated the motivation section of
> the
> > > KIP
> > > > by
> > > > >> > > > > explaining
> > > > >> > > > > > > the
> > > > >> > > > > > > > >> > cases
> > > > >> > > > > > > > >> > > > that
> > > > >> > > > > > > > >> > > > > would have user impacts.
> > > > >> > > > > > > > >> > > > > Please take a look at let me know your
> > > > comments.
> > > > >> > > > > > > > >> > > > >
> > > > >> > > > > > > > >> > > > > Thanks,
> > > > >> > > > > > > > >> > > > > Lucas
> > > > >> > > > > > > > >> > > > >
> > > > >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas
> Wang
> > <
> > > > >> > > > > > > > lucasatucla@gmail.com
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > > > wrote:
> > > > >> > > > > > > > >> > > > >
> > > > >> > > > > > > > >> > > > > > Hi Dong,
> > > > >> > > > > > > > >> > > > > >
> > > > >> > > > > > > > >> > > > > > The simulation of disk being slow is
> > merely
> > > > >> for me
> > > > >> > > to
> > > > >> > > > > > easily
> > > > >> > > > > > > > >> > > construct
> > > > >> > > > > > > > >> > > > a
> > > > >> > > > > > > > >> > > > > > testing scenario
> > > > >> > > > > > > > >> > > > > > with a backlog of produce requests. In
> > > > >> production,
> > > > >> > > > other
> > > > >> > > > > > > than
> > > > >> > > > > > > > >> the
> > > > >> > > > > > > > >> > > disk
> > > > >> > > > > > > > >> > > > > > being slow, a backlog of
> > > > >> > > > > > > > >> > > > > > produce requests may also be caused by
> > high
> > > > >> > produce
> > > > >> > > > QPS.
> > > > >> > > > > > > > >> > > > > > In that case, we may not want to kill
> the
> > > > >> broker
> > > > >> > and
> > > > >> > > > > > that's
> > > > >> > > > > > > > when
> > > > >> > > > > > > > >> > this
> > > > >> > > > > > > > >> > > > KIP
> > > > >> > > > > > > > >> > > > > > can be useful, both for JBOD
> > > > >> > > > > > > > >> > > > > > and non-JBOD setup.
> > > > >> > > > > > > > >> > > > > >
> > > > >> > > > > > > > >> > > > > > Going back to your previous question
> > about
> > > > each
> > > > >> > > > > > > ProduceRequest
> > > > >> > > > > > > > >> > > covering
> > > > >> > > > > > > > >> > > > > 20
> > > > >> > > > > > > > >> > > > > > partitions that are randomly
> > > > >> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr
> > > request
> > > > >> is
> > > > >> > > > > enqueued
> > > > >> > > > > > > that
> > > > >> > > > > > > > >> > tries
> > > > >> > > > > > > > >> > > to
> > > > >> > > > > > > > >> > > > > > switch the current broker, say broker0,
> > > from
> > > > >> > leader
> > > > >> > > to
> > > > >> > > > > > > > follower
> > > > >> > > > > > > > >> > > > > > *for one of the partitions*, say
> > *test-0*.
> > > > For
> > > > >> the
> > > > >> > > > sake
> > > > >> > > > > of
> > > > >> > > > > > > > >> > argument,
> > > > >> > > > > > > > >> > > > > > let's also assume the other brokers,
> say
> > > > >> broker1,
> > > > >> > > have
> > > > >> > > > > > > > *stopped*
> > > > >> > > > > > > > >> > > > fetching
> > > > >> > > > > > > > >> > > > > > from
> > > > >> > > > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > >> > > > > > > > >> > > > > > 1. If the enqueued produce requests
> have
> > > > acks =
> > > > >> > -1
> > > > >> > > > > (ALL)
> > > > >> > > > > > > > >> > > > > >   1.1 without this KIP, the
> > ProduceRequests
> > > > >> ahead
> > > > >> > of
> > > > >> > > > > > > > >> LeaderAndISR
> > > > >> > > > > > > > >> > > will
> > > > >> > > > > > > > >> > > > be
> > > > >> > > > > > > > >> > > > > > put into the purgatory,
> > > > >> > > > > > > > >> > > > > >         and since they'll never be
> > > replicated
> > > > >> to
> > > > >> > > other
> > > > >> > > > > > > brokers
> > > > >> > > > > > > > >> > > (because
> > > > >> > > > > > > > >> > > > > of
> > > > >> > > > > > > > >> > > > > > the assumption made above), they will
> > > > >> > > > > > > > >> > > > > >         be completed either when the
> > > > >> LeaderAndISR
> > > > >> > > > > request
> > > > >> > > > > > is
> > > > >> > > > > > > > >> > > processed
> > > > >> > > > > > > > >> > > > or
> > > > >> > > > > > > > >> > > > > > when the timeout happens.
> > > > >> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will
> > > immediately
> > > > >> > > > transition
> > > > >> > > > > > the
> > > > >> > > > > > > > >> > > partition
> > > > >> > > > > > > > >> > > > > > test-0 to become a follower,
> > > > >> > > > > > > > >> > > > > >         after the current broker sees
> the
> > > > >> > > replication
> > > > >> > > > of
> > > > >> > > > > > the
> > > > >> > > > > > > > >> > > remaining
> > > > >> > > > > > > > >> > > > 19
> > > > >> > > > > > > > >> > > > > > partitions, it can send a response
> > > indicating
> > > > >> that
> > > > >> > > > > > > > >> > > > > >         it's no longer the leader for
> the
> > > > >> > "test-0".
> > > > >> > > > > > > > >> > > > > >   To see the latency difference between
> > 1.1
> > > > and
> > > > >> > 1.2,
> > > > >> > > > > let's
> > > > >> > > > > > > say
> > > > >> > > > > > > > >> > there
> > > > >> > > > > > > > >> > > > are
> > > > >> > > > > > > > >> > > > > > 24K produce requests ahead of the
> > > > LeaderAndISR,
> > > > >> > and
> > > > >> > > > > there
> > > > >> > > > > > > are
> > > > >> > > > > > > > 8
> > > > >> > > > > > > > >> io
> > > > >> > > > > > > > >> > > > > threads,
> > > > >> > > > > > > > >> > > > > >   so each io thread will process
> > > > approximately
> > > > >> > 3000
> > > > >> > > > > > produce
> > > > >> > > > > > > > >> > requests.
> > > > >> > > > > > > > >> > > > Now
> > > > >> > > > > > > > >> > > > > > let's investigate the io thread that
> > > finally
> > > > >> > > processed
> > > > >> > > > > the
> > > > >> > > > > > > > >> > > > LeaderAndISR.
> > > > >> > > > > > > > >> > > > > >   For the 3000 produce requests, if we
> > > model
> > > > >> the
> > > > >> > > time
> > > > >> > > > > when
> > > > >> > > > > > > > their
> > > > >> > > > > > > > >> > > > > remaining
> > > > >> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1,
> > ...t2999,
> > > > and
> > > > >> > the
> > > > >> > > > > > > > LeaderAndISR
> > > > >> > > > > > > > >> > > > request
> > > > >> > > > > > > > >> > > > > is
> > > > >> > > > > > > > >> > > > > > processed at time t3000.
> > > > >> > > > > > > > >> > > > > >   Without this KIP, the 1st produce
> > request
> > > > >> would
> > > > >> > > have
> > > > >> > > > > > > waited
> > > > >> > > > > > > > an
> > > > >> > > > > > > > >> > > extra
> > > > >> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the
> 2nd
> > > an
> > > > >> extra
> > > > >> > > > time
> > > > >> > > > > of
> > > > >> > > > > > > > >> t3000 -
> > > > >> > > > > > > > >> > > t1,
> > > > >> > > > > > > > >> > > > > etc.
> > > > >> > > > > > > > >> > > > > >   Roughly speaking, the latency
> > difference
> > > is
> > > > >> > bigger
> > > > >> > > > for
> > > > >> > > > > > the
> > > > >> > > > > > > > >> > earlier
> > > > >> > > > > > > > >> > > > > > produce requests than for the later
> ones.
> > > For
> > > > >> the
> > > > >> > > same
> > > > >> > > > > > > reason,
> > > > >> > > > > > > > >> the
> > > > >> > > > > > > > >> > > more
> > > > >> > > > > > > > >> > > > > > ProduceRequests queued
> > > > >> > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger
> > > benefit
> > > > >> we
> > > > >> > get
> > > > >> > > > > > (capped
> > > > >> > > > > > > > by
> > > > >> > > > > > > > >> the
> > > > >> > > > > > > > >> > > > > > produce timeout).
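The arithmetic in this example, spelled out (the numbers come straight from the scenario above):

```java
public class PurgatoryWaitExample {
    public static void main(String[] args) {
        int queuedProduceRequests = 24_000; // requests ahead of the LeaderAndIsr
        int ioThreads = 8;
        // Each io thread processes roughly this many produce requests:
        int perThread = queuedProduceRequests / ioThreads;
        System.out.println(perThread); // 3000
        // Without the KIP, the i-th request on that thread waits an extra
        // (t3000 - t_i) in the purgatory before it can be completed.
    }
}
```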
> > > > >> > > > > > > > >> > > > > > 2. If the enqueued produce requests
> have
> > > > >> acks=0 or
> > > > >> > > > > acks=1
> > > > >> > > > > > > > >> > > > > >   There will be no latency differences
> in
> > > > this
> > > > >> > case,
> > > > >> > > > but
> > > > >> > > > > > > > >> > > > > >   2.1 without this KIP, the records of
> > > > >> partition
> > > > >> > > > test-0
> > > > >> > > > > in
> > > > >> > > > > > > the
> > > > >> > > > > > > > >> > > > > > ProduceRequests ahead of the
> LeaderAndISR
> > > > will
> > > > >> be
> > > > >> > > > > appended
> > > > >> > > > > > > to
> > > > >> > > > > > > > >> the
> > > > >> > > > > > > > >> > > local
> > > > >> > > > > > > > >> > > > > log,
> > > > >> > > > > > > > >> > > > > >         and eventually be truncated
> after
> > > > >> > processing
> > > > >> > > > the
> > > > >> > > > > > > > >> > > LeaderAndISR.
> > > > >> > > > > > > > >> > > > > > This is what's referred to as
> > > > >> > > > > > > > >> > > > > >         "some unofficial definition of
> > data
> > > > >> loss
> > > > >> > in
> > > > >> > > > > terms
> > > > >> > > > > > of
> > > > >> > > > > > > > >> > messages
> > > > >> > > > > > > > >> > > > > > beyond the high watermark".
> > > > >> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate
> the
> > > > effect
> > > > >> > > since
> > > > >> > > > if
> > > > >> > > > > > the
> > > > >> > > > > > > > >> > > > LeaderAndISR
> > > > >> > > > > > > > >> > > > > > is immediately processed, the response
> to
> > > > >> > producers
> > > > >> > > > will
> > > > >> > > > > > > have
> > > > >> > > > > > > > >> > > > > >         the NotLeaderForPartition
> error,
> > > > >> causing
> > > > >> > > > > producers
> > > > >> > > > > > > to
> > > > >> > > > > > > > >> retry
> > > > >> > > > > > > > >> > > > > >
> > > > >> > > > > > > > >> > > > > > This explanation above is the benefit
> for
> > > > >> reducing
> > > > >> > > the
> > > > >> > > > > > > latency
> > > > >> > > > > > > > >> of a
> > > > >> > > > > > > > >> > > > > broker
> > > > >> > > > > > > > >> > > > > > becoming the follower,
> > > > >> > > > > > > > >> > > > > > closely related is reducing the latency
> > of
> > > a
> > > > >> > broker
> > > > >> > > > > > becoming
> > > > >> > > > > > > > the
> > > > >> > > > > > > > >> > > > leader.
> > > > >> > > > > > > > >> > > > > > In this case, the benefit is even more
> > > > >> obvious, if
> > > > >> > > > other
> > > > >> > > > > > > > brokers
> > > > >> > > > > > > > >> > have
> > > > >> > > > > > > > >> > > > > > resigned leadership, and the
> > > > >> > > > > > > > >> > > > > > current broker should take leadership.
> > Any
> > > > >> delay
> > > > >> > in
> > > > >> > > > > > > processing
> > > > >> > > > > > > > >> the
> > > > >> > > > > > > > >> > > > > > LeaderAndISR will be perceived
> > > > >> > > > > > > > >> > > > > > by clients as unavailability. In
> extreme
> > > > cases,
> > > > >> > this
> > > > >> > > > can
> > > > >> > > > > > > cause
> > > > >> > > > > > > > >> > failed
> > > > >> > > > > > > > >> > > > > > produce requests if the retries are
> > > > >> > > > > > > > >> > > > > > exhausted.
> > > > >> > > > > > > > >> > > > > >
> > > > >> > > > > > > > >> > > > > > Another two types of controller
> requests
> > > are
> > > > >> > > > > > UpdateMetadata
> > > > >> > > > > > > > and
> > > > >> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss
> > as
> > > > >> > follows:
> > > > >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed
> > > > processing
> > > > >> > > means
> > > > >> > > > > > > clients
> > > > >> > > > > > > > >> > > receiving
> > > > >> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong
> > > > leadership
> > > > >> > info
> > > > >> > > > > > > > >> > > > > > for certain partitions, and the effect
> is
> > > > more
> > > > >> > > retries
> > > > >> > > > > or
> > > > >> > > > > > > even
> > > > >> > > > > > > > >> > fatal
> > > > >> > > > > > > > >> > > > > > failure if the retries are exhausted.
> > > > >> > > > > > > > >> > > > > >
> For StopReplica requests, a long queuing time may degrade the performance
> of topic deletion.
>
> Regarding your last question about the delay for DescribeLogDirsRequest,
> you are right that this KIP cannot help with the latency in getting the
> log dirs info, and it's only relevant when controller requests are
> involved.
>
> Regards,
> Lucas
>
> On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:
>
> > Hey Jun,
> >
> > Thanks much for the comments. It is a good point. So the feature may be
> > useful for the JBOD use-case. I have one question below.
> >
> > Hey Lucas,
> >
> > Do you think this feature is also useful for a non-JBOD setup, or is it
> > only useful for the JBOD setup? It may be useful to understand this.
> >
> > When the broker is set up using JBOD, in order to move leaders on the
> > failed disk to other disks, the system operator first needs to get the
> > list of partitions on the failed disk. This is currently achieved using
> > AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the
> > broker. If we only prioritize the controller requests, then the
> > DescribeLogDirsRequest may still take a long time to be processed by the
> > broker. So the overall time to move leaders away from the failed disk
> > may still be long even with this KIP. What do you think?
> >
> > Thanks,
> > Dong
> >
> > On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >
> > > Thanks for the insightful comment, Jun.
> > >
> > > @Dong,
> > > Since both of the two comments in your previous email are about the
> > > benefits of this KIP and whether it's useful, in light of Jun's last
> > > comment, do you agree that this KIP can be beneficial in the case
> > > mentioned by Jun? Please let me know, thanks!
> > >
> > > Regards,
> > > Lucas
> > >
> > > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:
> > >
> > > > Hi, Lucas, Dong,
> > > >
> > > > If all disks on a broker are slow, one probably should just kill the
> > > > broker. In that case, this KIP may not help. If only one of the
> > > > disks on a broker is slow, one may want to fail that disk and move
> > > > the leaders on that disk to other brokers. In that case, being able
> > > > to process the LeaderAndIsr requests faster will potentially help
> > > > the producers recover quicker.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > >
> > > > > Hey Lucas,
> > > > >
> > > > > Thanks for the reply. Some follow-up questions below.
> > > > >
> > > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > > > > randomly distributed across all partitions, then each
> > > > > ProduceRequest will likely cover some partitions for which the
> > > > > broker is still leader after it quickly processes the
> > > > > LeaderAndIsrRequest. Then the broker will still be slow in
> > > > > processing these ProduceRequests, and the request latency will
> > > > > still be very high with this KIP. It seems that most
> > > > > ProduceRequests will still time out after 30 seconds. Is this
> > > > > understanding correct?
> > > > >
> > > > > Regarding 2, if most ProduceRequests will still time out after 30
> > > > > seconds, then it is less clear how this KIP reduces average
> > > > > produce latency. Can you clarify what metrics can be improved by
> > > > > this KIP?
> > > > >
> > > > > Not sure why a system operator directly cares about the number of
> > > > > truncated messages. Do you mean this KIP can improve average
> > > > > throughput or reduce message duplication? It will be good to
> > > > > understand this.
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Dong,
> > > > > >
> > > > > > Thanks for your valuable comments. Please see my reply below.
> > > > > >
> > > > > > 1. The Google doc showed only 1 partition. Now let's consider a
> > > > > > more common scenario where broker0 is the leader of many
> > > > > > partitions, and let's say for some reason its IO becomes slow.
> > > > > > The number of leader partitions on broker0 is so large, say
> > > > > > 10K, that the cluster is skewed, and the operator would like to
> > > > > > shift the leadership for a lot of partitions, say 9K, to other
> > > > > > brokers, either manually or through some service like cruise
> > > > > > control. With this KIP, not only will the leadership
> > > > > > transitions finish more quickly, helping the cluster itself
> > > > > > become more balanced, but all existing producers corresponding
> > > > > > to the 9K partitions will get the errors relatively quickly,
> > > > > > rather than relying on their timeout, thanks to the batched
> > > > > > async ZK operations. To me it's a useful feature to have during
> > > > > > such troublesome times.
> > > > > >
> > > > > > 2. The experiments in the Google doc have shown that with this
> > > > > > KIP many producers receive an explicit NotLeaderForPartition
> > > > > > error, based on which they retry immediately. Therefore the
> > > > > > latency (~14 seconds + quick retry) for their single message is
> > > > > > much smaller compared with the case of timing out without the
> > > > > > KIP (30 seconds for timing out + quick retry). One might argue
> > > > > > that reducing the timeout on the producer side can achieve the
> > > > > > same result, yet reducing the timeout has its own drawbacks[1].
> > > > > >
> > > > > > Also *IF* there were a metric to show the number of truncated
> > > > > > messages on brokers, with the experiments done in the Google
> > > > > > doc, it should be easy to see that a lot fewer messages need to
> > > > > > be truncated on broker0, since the up-to-date metadata avoids
> > > > > > appending of messages in subsequent PRODUCE requests. If we
> > > > > > talk to a system operator and ask whether they prefer fewer
> > > > > > wasteful IOs, I bet most likely the answer is yes.
> > > > > >
> > > > > > 3. To answer your question, I think it might be helpful to
> > > > > > construct some formulas. To simplify the modeling, I'm going
> > > > > > back to the case where there is only ONE partition involved.
> > > > > > Following the experiments in the Google doc, let's say broker0
> > > > > > becomes the follower at time t0, and after t0 there were still
> > > > > > N produce requests in its request queue. With the up-to-date
> > > > > > metadata brought by this KIP, broker0 can reply with a
> > > > > > NotLeaderForPartition exception; let's use M1 to denote the
> > > > > > average processing time of replying with such an error message.
> > > > > > Without this KIP, the broker will need to append messages to
> > > > > > segments, which may trigger a flush to disk; let's use M2 to
> > > > > > denote the average processing time for such logic. Then the
> > > > > > average extra latency incurred without this KIP is
> > > > > > N * (M2 - M1) / 2.
> > > > > >
> > > > > > In practice, M2 should always be larger than M1, which means as
> > > > > > long as N is positive, we would see improvements on the average
> > > > > > latency. There does not need to be a significant backlog of
> > > > > > requests in the request queue, or severe degradation of disk
> > > > > > performance, to have the improvement.
> > > > > >
> > > > > > Regards,
> > > > > > Lucas
> > > > > >
> > > > > > [1] For instance, reducing the timeout on the producer side can
> > > > > > trigger unnecessary duplicate requests when the corresponding
> > > > > > leader broker is overloaded, exacerbating the situation.
> > > > > >
> > > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Lucas,
> > > > > > >
> > > > > > > Thanks much for the detailed documentation of the experiment.
> > > > > > >
> > > > > > > Initially I also think having a separate queue for controller
> > > > > > > requests is useful because, as you mentioned in the summary
> > > > > > > section of the Google doc, controller requests are generally
> > > > > > > more
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>
>
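The back-of-envelope model in Lucas's reply quoted above, average extra latency of roughly N * (M2 - M1) / 2, can be checked numerically. This is an illustrative sketch, not Kafka code, and the values used for M1 and M2 are made up for illustration.

```java
public class LatencyModel {
    // N produce requests are queued when broker0 becomes a follower. With the
    // KIP each is answered with a NotLeaderForPartition error in M1 ms; without
    // it each append (and possible flush) takes M2 ms. Request i waits behind i
    // earlier requests, so averaging i * (M2 - M1) over i = 0..N-1 gives
    // (M2 - M1) * (N - 1) / 2, i.e. roughly N * (M2 - M1) / 2 for large N.
    static double avgExtraLatencyMs(int n, double m1, double m2) {
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += i * (m2 - m1); // extra queueing delay ahead of request i
        }
        return total / n;
    }

    public static void main(String[] args) {
        // e.g. N = 1000 queued requests, M1 = 1 ms, M2 = 5 ms
        System.out.println(avgExtraLatencyMs(1000, 1.0, 5.0)); // prints 1998.0
    }
}
```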

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
I agree that there is no strong ordering when there are more than one
socket connections. Currently, we rely on controllerEpoch and leaderEpoch
to ensure that the receiving broker picks up the latest state for each
partition.

One potential issue with the deque approach is that if the queue is full,
there is no guarantee that the controller requests will be enqueued quickly.

Thanks,

Jun
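Jun's concern can be seen in a few lines: on a bounded deque, a non-blocking head insert simply fails once data requests have filled the queue. The class and method names below are illustrative, not Kafka code.

```java
import java.util.concurrent.LinkedBlockingDeque;

public class DequeFullDemo {
    // Illustrative sketch: a bounded deque used as the request channel, with
    // data requests appended at the tail and controller requests inserted at
    // the head, as discussed in this thread.
    static boolean tryEnqueueControllerRequest(LinkedBlockingDeque<String> queue,
                                               String request) {
        // offerFirst is non-blocking: on a full deque it returns false, so a
        // full queue gives no guarantee the controller request gets in quickly.
        return queue.offerFirst(request);
    }

    public static void main(String[] args) {
        LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>(2);
        queue.offerLast("produce-1");
        queue.offerLast("produce-2"); // the deque is now at capacity
        System.out.println(tryEnqueueControllerRequest(queue, "LeaderAndIsr")); // false
    }
}
```

A real implementation would likely use the blocking putFirst instead, but that only trades the failed insert for an unbounded wait.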

On Fri, Jul 20, 2018 at 5:25 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com
> wrote:

> Yea, the correlationId is only set to 0 in the NetworkClient constructor.
> Since we reuse the same NetworkClient between Controller and the broker, a
> disconnection should not cause it to reset to 0, in which case it can be
> used to reject obsolete requests.
>
> Thanks,
>
> Mayuresh
>
> On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com> wrote:
>
> > @Dong,
> > Great example and explanation, thanks!
> >
> > @All
> > Regarding the example given by Dong, it seems even if we use a queue and
> > a dedicated controller request handling thread, the same result can
> > still happen, because R1_a will be sent on one connection, and R1_b & R2
> > will be sent on a different connection, and there is no ordering between
> > different connections on the broker side.
> > I was discussing with Mayuresh offline, and it seems the correlation id
> > within the same NetworkClient object is monotonically increasing and
> > never reset, hence a broker can leverage that to properly reject
> > obsolete requests.
> > Thoughts?
> >
> > Thanks,
> > Lucas
> >
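A broker-side check along the lines Lucas and Mayuresh discuss could look like the sketch below. It is illustrative, not actual Kafka code, and it assumes (as noted above) that the correlationId within one NetworkClient is monotonically increasing; since Mayuresh later notes the id resets on a new NetworkClient, a real check would also need to be paired with the controller epoch.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CorrelationIdGate {
    // Illustrative sketch: accept a controller request only if its
    // correlationId is higher than the last one seen from the same controller
    // instance.
    private final AtomicInteger lastSeen = new AtomicInteger(-1);

    public boolean accept(int correlationId) {
        while (true) {
            int prev = lastSeen.get();
            if (correlationId <= prev) {
                return false; // obsolete: an equal or newer id was already seen
            }
            if (lastSeen.compareAndSet(prev, correlationId)) {
                return true; // recorded as the newest id
            }
        }
    }
}
```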
> > On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> > gharatmayuresh15@gmail.com> wrote:
> >
> > > Actually nvm, correlationId is reset in case of connection loss, I
> think.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > > gharatmayuresh15@gmail.com>
> > > wrote:
> > >
> > > > I agree with Dong that out-of-order processing can happen even with 2
> > > > separate queues, and it can even happen today.
> > > > Can we use the correlationId in the request from the controller to
> > > > the broker to handle ordering?
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
> > > >
> > > > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > >> Good point, Joel. I agree that a dedicated controller request
> > > >> handling thread would be a better isolation. It also solves the
> > > >> reordering issue.
> > > >>
> > > >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jj...@gmail.com>
> > > wrote:
> > > >>
> > > >> > Good example. I think this scenario can occur in the current code
> > > >> > as well, but with even lower probability, given that there are
> > > >> > other non-controller requests interleaved. It is still sketchy
> > > >> > though, and I think a safer approach would be separate queues and
> > > >> > pinning controller request handling to one handler thread.
> > > >> >
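The isolation Joel suggests, a separate bounded queue drained by a single handler thread, can be sketched as follows; with exactly one consumer, controller requests are necessarily processed in arrival order. Class names are illustrative, not Kafka code, and the capacity of 20 is an assumption (it echoes the default value discussed elsewhere in this thread).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SeparateControllerQueue {
    // Illustrative sketch: controller requests go to their own bounded queue
    // drained by exactly one thread, so they are handled strictly in FIFO order.
    private final BlockingQueue<Runnable> controllerQueue = new LinkedBlockingQueue<>(20);

    public SeparateControllerQueue() {
        Thread handler = new Thread(() -> {
            try {
                while (true) {
                    controllerQueue.take().run(); // single consumer => FIFO
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "controller-request-handler");
        handler.setDaemon(true);
        handler.start();
    }

    public void submit(Runnable request) {
        try {
            controllerQueue.put(request); // blocks instead of dropping when full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Small helper so callers can wait for submitted work without checked
    // exceptions.
    public static boolean await(CountDownLatch latch) {
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        }
    }
}
```

Note that the blocking put in submit illustrates Jun's earlier point: when the queue is full, even the dedicated queue gives no hard bound on how quickly a controller request gets enqueued.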
> > > >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <li...@gmail.com>
> > > wrote:
> > > >> >
> > > >> > > Hey Becket,
> > > >> > >
> > > >> > > I think you are right that there may be out-of-order processing.
> > > >> > > However, it seems that out-of-order processing may also happen
> > > >> > > even if we use a separate queue.
> > > >> > >
> > > >> > > Here is the example:
> > > >> > >
> > > >> > > - Controller sends R1 and got disconnected before receiving the
> > > >> > > response. Then it reconnects and sends R2. Both requests now stay
> > > >> > > in the controller request queue in the order they are sent.
> > > >> > > - thread1 takes R1_a from the request queue, and then thread2
> > > >> > > takes R2 from the request queue almost at the same time.
> > > >> > > - So R1_a and R2 are processed in parallel. There is a chance
> > > >> > > that R2's processing is completed before R1's.
> > > >> > >
> > > >> > > If out-of-order processing can happen for both approaches with
> > > >> > > very low probability, it may not be worthwhile to add the extra
> > > >> > > queue. What do you think?
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Dong
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <
> becket.qin@gmail.com
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > > Hi Mayuresh/Joel,
> > > >> > > >
> > > >> > > > Using the request channel as a deque was brought up some time
> > > >> > > > ago when we were initially thinking of prioritizing the
> > > >> > > > requests. The concern was that the controller requests are
> > > >> > > > supposed to be processed in order. If we can ensure that there
> > > >> > > > is only one controller request in the request channel, the
> > > >> > > > order is not a concern. But in cases where there are more than
> > > >> > > > one controller request inserted into the queue, the controller
> > > >> > > > request order may change and cause problems. For example,
> > > >> > > > think about the following sequence:
> > > >> > > > 1. Controller successfully sent a request R1 to the broker.
> > > >> > > > 2. Broker receives R1 and puts the request at the head of the
> > > >> > > > request queue.
> > > >> > > > 3. Controller-to-broker connection failed and the controller
> > > >> > > > reconnected to the broker.
> > > >> > > > 4. Controller sends a request R2 to the broker.
> > > >> > > > 5. Broker receives R2 and adds it to the head of the request
> > > >> > > > queue.
> > > >> > > > Now on the broker side, R2 will be processed before R1 is
> > > >> > > > processed, which may cause problems.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > >
> > > >> > > > Jiangjie (Becket) Qin
> > > >> > > >
> > > >> > > >
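The reordering in Becket's five-step sequence can be reproduced with a toy deque: two controller requests both inserted at the head come out in reverse send order. This is an illustrative sketch, not Kafka code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class HeadInsertReordering {
    // Each controller request jumps to the head of the deque; a single handler
    // then drains the deque from the head, simulating the broker's processing.
    static List<String> processAll(String... controllerRequests) {
        Deque<String> queue = new ArrayDeque<>();
        for (String r : controllerRequests) {
            queue.addFirst(r); // head insert, as in steps 2 and 5 above
        }
        List<String> processed = new ArrayList<>();
        while (!queue.isEmpty()) {
            processed.add(queue.pollFirst());
        }
        return processed;
    }

    public static void main(String[] args) {
        // R1 sent first, R2 after the reconnect; R2 is processed before R1.
        System.out.println(processAll("R1", "R2")); // prints [R2, R1]
    }
}
```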
> > > >> > > >
> > > >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> > jjkoshy.w@gmail.com>
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > @Mayuresh - I like your idea. It appears to be a simpler,
> > > >> > > > > less invasive alternative, and it should work.
> > > >> > > > > Jun/Becket/others, do you see any pitfalls with this
> > > >> > > > > approach?
> > > >> > > > >
> > > >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > > >> lucasatucla@gmail.com>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > @Mayuresh,
> > > >> > > > > > That's a very interesting idea that I haven't thought
> > before.
> > > >> > > > > > It seems to solve our problem at hand pretty well, and
> also
> > > >> > > > > > avoids the need to have a new size metric and capacity
> > config
> > > >> > > > > > for the controller request queue. In fact, if we were to
> > adopt
> > > >> > > > > > this design, there is no public interface change, and we
> > > >> > > > > > probably don't need a KIP.
> > > >> > > > > > Also implementation wise, it seems
> > > >> > > > > > the java class LinkedBlockingQueue can readily satisfy the
> > > >> > > requirement
> > > >> > > > > > by supporting a capacity, and also allowing inserting at
> > both
> > > >> ends.
> > > >> > > > > >
> > > >> > > > > > My only concern is that this design is tied to the
> > coincidence
> > > >> that
> > > >> > > > > > we have two request priorities and there are two ends to a
> > > >> deque.
> > > >> > > > > > Hence by using the proposed design, it seems the network
> > layer
> > > >> is
> > > >> > > > > > more tightly coupled with upper layer logic, e.g. if we
> were
> > > to
> > > >> add
> > > >> > > > > > an extra priority level in the future for some reason, we
> > > would
> > > >> > > > probably
> > > >> > > > > > need to go back to the design of separate queues, one for
> > each
> > > >> > > priority
> > > >> > > > > > level.
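[Editor's sketch: the separate-queues design mentioned above generalizes to N priority levels. All class and method names below are hypothetical illustrations, not Kafka's actual RequestChannel; a production implementation would block on a shared condition rather than busy-poll.]

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class MultiPriorityChannel<T> {
    // One bounded queue per priority level; index 0 is the highest priority.
    private final BlockingQueue<T>[] queues;

    @SuppressWarnings("unchecked")
    MultiPriorityChannel(int levels, int capacityPerLevel) {
        queues = new BlockingQueue[levels];
        for (int i = 0; i < levels; i++) {
            queues[i] = new ArrayBlockingQueue<>(capacityPerLevel);
        }
    }

    void send(T request, int priority) throws InterruptedException {
        queues[priority].put(request); // blocks when that level is at capacity
    }

    // Drains the highest-priority non-empty queue first. Busy-polling is
    // only for brevity here; real handler threads would wait on a signal.
    T receive() throws InterruptedException {
        while (true) {
            for (BlockingQueue<T> q : queues) {
                T r = q.poll();
                if (r != null) return r;
            }
            TimeUnit.MILLISECONDS.sleep(1);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MultiPriorityChannel<String> ch = new MultiPriorityChannel<>(2, 20);
        ch.send("produce-1", 1);      // data request, lower priority
        ch.send("LeaderAndIsr", 0);   // controller request, higher priority
        System.out.println(ch.receive()); // LeaderAndIsr
        System.out.println(ch.receive()); // produce-1
    }
}
```

Unlike the deque, this shape extends naturally to more than two priority levels, at the cost of a per-level capacity bound instead of one shared bound.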
> > > >> > > > > >
> > > >> > > > > > In summary, I'm ok with both designs and lean toward your
> > > >> suggested
> > > >> > > > > > approach.
> > > >> > > > > > Let's hear what others think.
> > > >> > > > > >
> > > >> > > > > > @Becket,
> > > >> > > > > > In light of Mayuresh's suggested new design, I'm answering
> > > your
> > > >> > > > question
> > > >> > > > > > only in the context
> > > >> > > > > > of the current KIP design: I think your suggestion makes
> > > sense,
> > > >> and
> > > >> > > I'm
> > > >> > > > > ok
> > > >> > > > > > with removing the capacity config and
> > > >> > > > > > just relying on the default value of 20 being sufficient.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Lucas
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > >> > > > > > gharatmayuresh15@gmail.com
> > > >> > > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Lucas,
> > > >> > > > > > >
> > > >> > > > > > > Seems like the main intent here is to prioritize the
> > > >> controller
> > > >> > > > request
> > > >> > > > > > > over any other requests.
> > > >> > > > > > > In that case, we can change the request queue to a
> > deque,
> > > >> where
> > > >> > > you
> > > >> > > > > > > always insert the normal requests (produce,
> consume, etc.)
> > > to
> > > >> the
> > > >> > > end
> > > >> > > > > of
> > > >> > > > > > > the deque, but if it's a controller request, you insert
> > it
> > > to
> > > >> > the
> > > >> > > > head
> > > >> > > > > > of
> > > >> > > > > > > the queue. This ensures that the controller request will
> > be
> > > >> given
> > > >> > > > > higher
> > > >> > > > > > > priority over other requests.
> > > >> > > > > > >
> > > >> > > > > > > Also since we only read one request from the socket and
> > mute
> > > >> it
> > > >> > and
> > > >> > > > > only
> > > >> > > > > > > unmute it after handling the request, this would ensure
> > that
> > > >> we
> > > >> > > don't
> > > >> > > > > > > handle controller requests out of order.
> > > >> > > > > > >
> > > >> > > > > > > With this approach we can avoid the second queue and the
> > > >> > additional
> > > >> > > > > > config
> > > >> > > > > > > for the size of the queue.
> > > >> > > > > > >
> > > >> > > > > > > What do you think ?
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > >
> > > >> > > > > > > Mayuresh
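[Editor's sketch: the deque approach described above, roughly. The names are hypothetical stand-ins, not Kafka's real request channel types; `java.util.concurrent.LinkedBlockingDeque` supplies both the single capacity bound and head insertion.]

```java
import java.util.concurrent.LinkedBlockingDeque;

public class PrioritizedRequestChannel {
    // Hypothetical stand-in for a broker request; not Kafka's real type.
    static final class Request {
        final String name;
        final boolean fromController;
        Request(String name, boolean fromController) {
            this.name = name;
            this.fromController = fromController;
        }
    }

    private final LinkedBlockingDeque<Request> deque;

    PrioritizedRequestChannel(int capacity) {
        // One shared bound, as with the existing queued.max.requests config.
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    void send(Request r) throws InterruptedException {
        if (r.fromController) {
            deque.putFirst(r);  // controller requests jump to the head
        } else {
            deque.putLast(r);   // data requests keep FIFO order at the tail
        }
    }

    Request receive() throws InterruptedException {
        return deque.takeFirst(); // handler threads always drain the head
    }

    public static void main(String[] args) throws InterruptedException {
        PrioritizedRequestChannel channel = new PrioritizedRequestChannel(20);
        channel.send(new Request("produce-1", false));
        channel.send(new Request("produce-2", false));
        channel.send(new Request("LeaderAndIsr", true));
        // The controller request is handled before the earlier data requests.
        System.out.println(channel.receive().name); // LeaderAndIsr
        System.out.println(channel.receive().name); // produce-1
        System.out.println(channel.receive().name); // produce-2
    }
}
```

Because the socket is muted until the in-flight request is handled, at most one request per connection is queued at a time, so head insertion cannot reorder two requests from the same controller connection.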
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > > >> becket.qin@gmail.com
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hey Joel,
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks for the detailed explanation. I agree the current
> > > design
> > > >> > > makes
> > > >> > > > > > sense.
> > > >> > > > > > > > My confusion is about whether the new config for the
> > > >> controller
> > > >> > > > queue
> > > >> > > > > > > > capacity is necessary. I cannot think of a case in
> which
> > > >> users
> > > >> > > > would
> > > >> > > > > > > change
> > > >> > > > > > > > it.
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > >
> > > >> > > > > > > > Jiangjie (Becket) Qin
> > > >> > > > > > > >
> > > >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > >> > > becket.qin@gmail.com>
> > > >> > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Hi Lucas,
> > > >> > > > > > > > >
> > > >> > > > > > > > > I guess my question can be rephrased to "do we
> expect
> > > >> users to
> > > >> > > > ever
> > > >> > > > > > > change
> > > >> > > > > > > > > the controller request queue capacity"? If we agree
> > that
> > > >> 20
> > > >> > is
> > > >> > > > > > already
> > > >> > > > > > > a
> > > >> > > > > > > > > very generous default number and we do not expect
> user
> > > to
> > > >> > > change
> > > >> > > > > it,
> > > >> > > > > > is
> > > >> > > > > > > > it
> > > >> > > > > > > > > still necessary to expose this as a config?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks,
> > > >> > > > > > > > >
> > > >> > > > > > > > > Jiangjie (Becket) Qin
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > >> > > > lucasatucla@gmail.com
> > > >> > > > > >
> > > >> > > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > >> @Becket
> > > >> > > > > > > > >> 1. Thanks for the comment. You are right that
> > normally
> > > >> there
> > > >> > > > > should
> > > >> > > > > > be
> > > >> > > > > > > > >> just
> > > >> > > > > > > > >> one controller request because of muting,
> > > >> > > > > > > > >> and I had NOT intended to say there would be many
> > > >> enqueued
> > > >> > > > > > controller
> > > >> > > > > > > > >> requests.
> > > >> > > > > > > > >> I went through the KIP again, and I'm not sure
> which
> > > part
> > > >> > > > conveys
> > > >> > > > > > that
> > > >> > > > > > > > >> info.
> > > >> > > > > > > > >> I'd be happy to revise if you point out the
> > section.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> 2. Though it should not happen in normal
> conditions,
> > > the
> > > >> > > current
> > > >> > > > > > > design
> > > >> > > > > > > > >> does not preclude multiple controllers running
> > > >> > > > > > > > >> at the same time, hence if we don't have the
> > controller
> > > >> > queue
> > > >> > > > > > capacity
> > > >> > > > > > > > >> config and simply make its capacity 1,
> > > >> > > > > > > > >> network threads handling requests from different
> > > >> controllers
> > > >> > > > will
> > > >> > > > > be
> > > >> > > > > > > > >> blocked during those troublesome times,
> > > >> > > > > > > > >> which is probably not what we want. On the other
> > hand,
> > > >> > adding
> > > >> > > > the
> > > >> > > > > > > extra
> > > >> > > > > > > > >> config with a default value, say 20, guards us from
> > > >> issues
> > > >> > in
> > > >> > > > > those
> > > >> > > > > > > > >> troublesome times, and IMO there isn't much
> downside
> > of
> > > >> > adding
> > > >> > > > the
> > > >> > > > > > > extra
> > > >> > > > > > > > >> config.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> @Mayuresh
> > > >> > > > > > > > >> Good catch, this sentence is an obsolete statement
> > > based
> > > >> on
> > > >> > a
> > > >> > > > > > previous
> > > >> > > > > > > > >> design. I've revised the wording in the KIP.
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> Thanks,
> > > >> > > > > > > > >> Lucas
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > >> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > >> > > > > > > > >>
> > > >> > > > > > > > >> > Hi Lucas,
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > Thanks for the KIP.
> > > >> > > > > > > > >> > I am trying to understand why you think "The
> memory
> > > >> > > > consumption
> > > >> > > > > > can
> > > >> > > > > > > > rise
> > > >> > > > > > > > >> > given the total number of queued requests can go
> up
> > > to
> > > >> 2x"
> > > >> > > in
> > > >> > > > > the
> > > >> > > > > > > > impact
> > > >> > > > > > > > >> > section. Normally the requests from controller
> to a
> > > >> Broker
> > > >> > > are
> > > >> > > > > not
> > > >> > > > > > > > high
> > > >> > > > > > > > >> > volume, right ?
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > Thanks,
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > Mayuresh
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > >> > > > > becket.qin@gmail.com>
> > > >> > > > > > > > >> wrote:
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the
> control
> > > >> plane
> > > >> > > from
> > > >> > > > > the
> > > >> > > > > > > > data
> > > >> > > > > > > > >> > plane
> > > >> > > > > > > > >> > > makes a lot of sense.
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > In the KIP you mentioned that the controller
> > > request
> > > >> > queue
> > > >> > > > may
> > > >> > > > > > > have
> > > >> > > > > > > > >> many
> > > >> > > > > > > > >> > > requests in it. Will this be a common case? The
> > > >> > controller
> > > >> > > > > > > requests
> > > >> > > > > > > > >> still
> > > >> > > > > > > > >> > > goes through the SocketServer. The SocketServer
> > > will
> > > >> > mute
> > > >> > > > the
> > > >> > > > > > > > channel
> > > >> > > > > > > > >> > once
> > > >> > > > > > > > >> > > a request is read and put into the request
> > channel.
> > > >> So
> > > >> > > > > assuming
> > > >> > > > > > > > there
> > > >> > > > > > > > >> is
> > > >> > > > > > > > >> > > only one connection between controller and each
> > > >> broker,
> > > >> > on
> > > >> > > > the
> > > >> > > > > > > > broker
> > > >> > > > > > > > >> > side,
> > > >> > > > > > > > >> > > there should be only one controller request in
> > the
> > > >> > > > controller
> > > >> > > > > > > > request
> > > >> > > > > > > > >> > queue
> > > >> > > > > > > > >> > > at any given time. If that is the case, do we
> > need
> > > a
> > > >> > > > separate
> > > >> > > > > > > > >> controller
> > > >> > > > > > > > >> > > request queue capacity config? The default
> value
> > 20
> > > >> > means
> > > >> > > > that
> > > >> > > > > > we
> > > >> > > > > > > > >> expect
> > > >> > > > > > > > >> > > there are 20 controller switches to happen in a
> > > short
> > > >> > > period
> > > >> > > > > of
> > > >> > > > > > > > time.
> > > >> > > > > > > > >> I
> > > >> > > > > > > > >> > am
> > > >> > > > > > > > >> > > not sure whether someone should increase the
> > > >> controller
> > > >> > > > > request
> > > >> > > > > > > > queue
> > > >> > > > > > > > >> > > capacity to handle such case, as it seems
> > > indicating
> > > >> > > > something
> > > >> > > > > > > very
> > > >> > > > > > > > >> wrong
> > > >> > > > > > > > >> > > has happened.
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > Thanks,
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > Jiangjie (Becket) Qin
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > >> > > > > lindong28@gmail.com>
> > > >> > > > > > > > >> wrote:
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > > Thanks for the update Lucas.
> > > >> > > > > > > > >> > > >
> > > >> > > > > > > > >> > > > I think the motivation section is intuitive.
> It
> > > >> will
> > > >> > be
> > > >> > > > good
> > > >> > > > > > to
> > > >> > > > > > > > >> learn
> > > >> > > > > > > > >> > > more
> > > >> > > > > > > > >> > > > about the comments from other reviewers.
> > > >> > > > > > > > >> > > >
> > > >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > >> > > > > > > > lucasatucla@gmail.com>
> > > >> > > > > > > > >> > > wrote:
> > > >> > > > > > > > >> > > >
> > > >> > > > > > > > >> > > > > Hi Dong,
> > > >> > > > > > > > >> > > > >
> > > >> > > > > > > > >> > > > > I've updated the motivation section of the
> > KIP
> > > by
> > > >> > > > > explaining
> > > >> > > > > > > the
> > > >> > > > > > > > >> > cases
> > > >> > > > > > > > >> > > > that
> > > >> > > > > > > > >> > > > > would have user impacts.
> > > >> > > > > > > > >> > > > > Please take a look at let me know your
> > > comments.
> > > >> > > > > > > > >> > > > >
> > > >> > > > > > > > >> > > > > Thanks,
> > > >> > > > > > > > >> > > > > Lucas
> > > >> > > > > > > > >> > > > >
> > > >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang
> <
> > > >> > > > > > > > lucasatucla@gmail.com
> > > >> > > > > > > > >> >
> > > >> > > > > > > > >> > > > wrote:
> > > >> > > > > > > > >> > > > >
> > > >> > > > > > > > >> > > > > > Hi Dong,
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > > The simulation of disk being slow is
> merely
> > > >> for me
> > > >> > > to
> > > >> > > > > > easily
> > > >> > > > > > > > >> > > construct
> > > >> > > > > > > > >> > > > a
> > > >> > > > > > > > >> > > > > > testing scenario
> > > >> > > > > > > > >> > > > > > with a backlog of produce requests. In
> > > >> production,
> > > >> > > > other
> > > >> > > > > > > than
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >> > > disk
> > > >> > > > > > > > >> > > > > > being slow, a backlog of
> > > >> > > > > > > > >> > > > > > produce requests may also be caused by
> high
> > > >> > produce
> > > >> > > > QPS.
> > > >> > > > > > > > >> > > > > > In that case, we may not want to kill the
> > > >> broker
> > > >> > and
> > > >> > > > > > that's
> > > >> > > > > > > > when
> > > >> > > > > > > > >> > this
> > > >> > > > > > > > >> > > > KIP
> > > >> > > > > > > > >> > > > > > can be useful, both for JBOD
> > > >> > > > > > > > >> > > > > > and non-JBOD setup.
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > > Going back to your previous question
> about
> > > each
> > > >> > > > > > > ProduceRequest
> > > >> > > > > > > > >> > > covering
> > > >> > > > > > > > >> > > > > 20
> > > >> > > > > > > > >> > > > > > partitions that are randomly
> > > >> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr
> > request
> > > >> is
> > > >> > > > > enqueued
> > > >> > > > > > > that
> > > >> > > > > > > > >> > tries
> > > >> > > > > > > > >> > > to
> > > >> > > > > > > > >> > > > > > switch the current broker, say broker0,
> > from
> > > >> > leader
> > > >> > > to
> > > >> > > > > > > > follower
> > > >> > > > > > > > >> > > > > > *for one of the partitions*, say
> *test-0*.
> > > For
> > > >> the
> > > >> > > > sake
> > > >> > > > > of
> > > >> > > > > > > > >> > argument,
> > > >> > > > > > > > >> > > > > > let's also assume the other brokers, say
> > > >> broker1,
> > > >> > > have
> > > >> > > > > > > > *stopped*
> > > >> > > > > > > > >> > > > fetching
> > > >> > > > > > > > >> > > > > > from
> > > >> > > > > > > > >> > > > > > the current broker, i.e. broker0.
> > > >> > > > > > > > >> > > > > > 1. If the enqueued produce requests have
> > > acks =
> > > >> > -1
> > > >> > > > > (ALL)
> > > >> > > > > > > > >> > > > > >   1.1 without this KIP, the
> ProduceRequests
> > > >> ahead
> > > >> > of
> > > >> > > > > > > > >> LeaderAndISR
> > > >> > > > > > > > >> > > will
> > > >> > > > > > > > >> > > > be
> > > >> > > > > > > > >> > > > > > put into the purgatory,
> > > >> > > > > > > > >> > > > > >         and since they'll never be
> > replicated
> > > >> to
> > > >> > > other
> > > >> > > > > > > brokers
> > > >> > > > > > > > >> > > (because
> > > >> > > > > > > > >> > > > > of
> > > >> > > > > > > > >> > > > > > the assumption made above), they will
> > > >> > > > > > > > >> > > > > >         be completed either when the
> > > >> LeaderAndISR
> > > >> > > > > request
> > > >> > > > > > is
> > > >> > > > > > > > >> > > processed
> > > >> > > > > > > > >> > > > or
> > > >> > > > > > > > >> > > > > > when the timeout happens.
> > > >> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will
> > immediately
> > > >> > > > transition
> > > >> > > > > > the
> > > >> > > > > > > > >> > > partition
> > > >> > > > > > > > >> > > > > > test-0 to become a follower,
> > > >> > > > > > > > >> > > > > >         after the current broker sees the
> > > >> > > replication
> > > >> > > > of
> > > >> > > > > > the
> > > >> > > > > > > > >> > > remaining
> > > >> > > > > > > > >> > > > 19
> > > >> > > > > > > > >> > > > > > partitions, it can send a response
> > indicating
> > > >> that
> > > >> > > > > > > > >> > > > > >         it's no longer the leader for the
> > > >> > "test-0".
> > > >> > > > > > > > >> > > > > >   To see the latency difference between
> 1.1
> > > and
> > > >> > 1.2,
> > > >> > > > > let's
> > > >> > > > > > > say
> > > >> > > > > > > > >> > there
> > > >> > > > > > > > >> > > > are
> > > >> > > > > > > > >> > > > > > 24K produce requests ahead of the
> > > LeaderAndISR,
> > > >> > and
> > > >> > > > > there
> > > >> > > > > > > are
> > > >> > > > > > > > 8
> > > >> > > > > > > > >> io
> > > >> > > > > > > > >> > > > > threads,
> > > >> > > > > > > > >> > > > > >   so each io thread will process
> > > approximately
> > > >> > 3000
> > > >> > > > > > produce
> > > >> > > > > > > > >> > requests.
> > > >> > > > > > > > >> > > > Now
> > > >> > > > > > > > >> > > > > > let's investigate the io thread that
> > finally
> > > >> > > processed
> > > >> > > > > the
> > > >> > > > > > > > >> > > > LeaderAndISR.
> > > >> > > > > > > > >> > > > > >   For the 3000 produce requests, if we
> > model
> > > >> the
> > > >> > > time
> > > >> > > > > when
> > > >> > > > > > > > their
> > > >> > > > > > > > >> > > > > remaining
> > > >> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1,
> ...t2999,
> > > and
> > > >> > the
> > > >> > > > > > > > LeaderAndISR
> > > >> > > > > > > > >> > > > request
> > > >> > > > > > > > >> > > > > is
> > > >> > > > > > > > >> > > > > > processed at time t3000.
> > > >> > > > > > > > >> > > > > >   Without this KIP, the 1st produce
> request
> > > >> would
> > > >> > > have
> > > >> > > > > > > waited
> > > >> > > > > > > > an
> > > >> > > > > > > > >> > > extra
> > > >> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd
> > an
> > > >> extra
> > > >> > > > time
> > > >> > > > > of
> > > >> > > > > > > > >> t3000 -
> > > >> > > > > > > > >> > > t1,
> > > >> > > > > > > > >> > > > > etc.
> > > >> > > > > > > > >> > > > > >   Roughly speaking, the latency
> difference
> > is
> > > >> > bigger
> > > >> > > > for
> > > >> > > > > > the
> > > >> > > > > > > > >> > earlier
> > > >> > > > > > > > >> > > > > > produce requests than for the later ones.
> > For
> > > >> the
> > > >> > > same
> > > >> > > > > > > reason,
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >> > > more
> > > >> > > > > > > > >> > > > > > ProduceRequests queued
> > > >> > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger
> > benefit
> > > >> we
> > > >> > get
> > > >> > > > > > (capped
> > > >> > > > > > > > by
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >> > > > > > produce timeout).
> > > >> > > > > > > > >> > > > > > 2. If the enqueued produce requests have
> > > >> acks=0 or
> > > >> > > > > acks=1
> > > >> > > > > > > > >> > > > > >   There will be no latency differences in
> > > this
> > > >> > case,
> > > >> > > > but
> > > >> > > > > > > > >> > > > > >   2.1 without this KIP, the records of
> > > >> partition
> > > >> > > > test-0
> > > >> > > > > in
> > > >> > > > > > > the
> > > >> > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR
> > > will
> > > >> be
> > > >> > > > > appended
> > > >> > > > > > > to
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >> > > local
> > > >> > > > > > > > >> > > > > log,
> > > >> > > > > > > > >> > > > > >         and eventually be truncated after
> > > >> > processing
> > > >> > > > the
> > > >> > > > > > > > >> > > LeaderAndISR.
> > > >> > > > > > > > >> > > > > > This is what's referred to as
> > > >> > > > > > > > >> > > > > >         "some unofficial definition of
> data
> > > >> loss
> > > >> > in
> > > >> > > > > terms
> > > >> > > > > > of
> > > >> > > > > > > > >> > messages
> > > >> > > > > > > > >> > > > > > beyond the high watermark".
> > > >> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the
> > > effect
> > > >> > > since
> > > >> > > > if
> > > >> > > > > > the
> > > >> > > > > > > > >> > > > LeaderAndISR
> > > >> > > > > > > > >> > > > > > is immediately processed, the response to
> > > >> > producers
> > > >> > > > will
> > > >> > > > > > > have
> > > >> > > > > > > > >> > > > > >         the NotLeaderForPartition error,
> > > >> causing
> > > >> > > > > producers
> > > >> > > > > > > to
> > > >> > > > > > > > >> retry
> > > >> > > > > > > > >> > > > > >
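[Editor's sketch: the waiting-time arithmetic in case 1 above (t0 … t2999 vs. t3000) made concrete with a toy model. The numbers are purely illustrative: 3000 produce requests whose remaining partitions catch up at evenly spaced times, with the LeaderAndIsr processed only after all of them.]

```java
public class ExtraWaitModel {
    public static void main(String[] args) {
        int requests = 3000;        // produce requests ahead of the LeaderAndIsr
        double spacingMs = 1.0;     // assumed even spacing of t0, t1, ..., t2999
        double tLeaderAndIsr = requests * spacingMs; // t3000

        // Without the KIP, request i sits in purgatory an extra (t3000 - t_i);
        // the earlier the request, the larger its extra wait.
        double totalExtra = 0;
        for (int i = 0; i < requests; i++) {
            totalExtra += tLeaderAndIsr - i * spacingMs;
        }
        System.out.println("average extra wait: " + (totalExtra / requests) + " ms");
    }
}
```

With the KIP, the LeaderAndIsr is processed immediately, so this entire extra wait disappears (in practice it is capped by the produce timeout).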
> > > >> > > > > > > > >> > > > > > This explanation above is the benefit for
> > > >> reducing
> > > >> > > the
> > > >> > > > > > > latency
> > > >> > > > > > > > >> of a
> > > >> > > > > > > > >> > > > > broker
> > > >> > > > > > > > >> > > > > > becoming the follower,
> > > >> > > > > > > > >> > > > > > closely related is reducing the latency
> of
> > a
> > > >> > broker
> > > >> > > > > > becoming
> > > >> > > > > > > > the
> > > >> > > > > > > > >> > > > leader.
> > > >> > > > > > > > >> > > > > > In this case, the benefit is even more
> > > >> obvious, if
> > > >> > > > other
> > > >> > > > > > > > brokers
> > > >> > > > > > > > >> > have
> > > >> > > > > > > > >> > > > > > resigned leadership, and the
> > > >> > > > > > > > >> > > > > > current broker should take leadership.
> Any
> > > >> delay
> > > >> > in
> > > >> > > > > > > processing
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >> > > > > > LeaderAndISR will be perceived
> > > >> > > > > > > > >> > > > > > by clients as unavailability. In extreme
> > > cases,
> > > >> > this
> > > >> > > > can
> > > >> > > > > > > cause
> > > >> > > > > > > > >> > failed
> > > >> > > > > > > > >> > > > > > produce requests if the retries are
> > > >> > > > > > > > >> > > > > > exhausted.
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > > Another two types of controller requests
> > are
> > > >> > > > > > UpdateMetadata
> > > >> > > > > > > > and
> > > >> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss
> as
> > > >> > follows:
> > > >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed
> > > processing
> > > >> > > means
> > > >> > > > > > > clients
> > > >> > > > > > > > >> > > receiving
> > > >> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong
> > > leadership
> > > >> > info
> > > >> > > > > > > > >> > > > > > for certain partitions, and the effect is
> > > more
> > > >> > > retries
> > > >> > > > > or
> > > >> > > > > > > even
> > > >> > > > > > > > >> > fatal
> > > >> > > > > > > > >> > > > > > failure if the retries are exhausted.
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > > For StopReplica requests, a long queuing
> > time
> > > >> may
> > > >> > > > > degrade
> > > >> > > > > > > the
> > > >> > > > > > > > >> > > > performance
> > > >> > > > > > > > >> > > > > > of topic deletion.
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > > Regarding your last question of the delay
> > for
> > > >> > > > > > > > >> > DescribeLogDirsRequest,
> > > >> > > > > > > > >> > > > you
> > > >> > > > > > > > >> > > > > > are right
> > > >> > > > > > > > >> > > > > > that this KIP cannot help with the
> latency
> > in
> > > >> > > getting
> > > >> > > > > the
> > > >> > > > > > > log
> > > >> > > > > > > > >> dirs
> > > >> > > > > > > > >> > > > info,
> > > >> > > > > > > > >> > > > > > and it's only relevant
> > > >> > > > > > > > >> > > > > > when controller requests are involved.
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > > Regards,
> > > >> > > > > > > > >> > > > > > Lucas
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin
> <
> > > >> > > > > > > lindong28@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > >> > > wrote:
> > > >> > > > > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > >> Hey Jun,
> > > >> > > > > > > > >> > > > > >>
> > > >> > > > > > > > >> > > > > >> Thanks much for the comments. It is good
> > > >> point.
> > > >> > So
> > > >> > > > the
> > > >> > > > > > > > feature
> > > >> > > > > > > > >> may
> > > >> > > > > > > > >> > > be
> > > >> > > > > > > > >> > > > > >> useful for JBOD use-case. I have one
> > > question
> > > >> > > below.
> > > >> > > > > > > > >> > > > > >>
> > > >> > > > > > > > >> > > > > >> Hey Lucas,
> > > >> > > > > > > > >> > > > > >>
> > > >> > > > > > > > >> > > > > >> Do you think this feature is also useful
> > for
> > > >> > > non-JBOD
> > > >> > > > > > setup
> > > >> > > > > > > > or
> > > >> > > > > > > > >> it
> > > >> > > > > > > > >> > is
> > > >> > > > > > > > >> > > > > only
> > > >> > > > > > > > >> > > > > >> useful for the JBOD setup? It may be
> > useful
> > > to
> > > >> > > > > understand
> > > >> > > > > > > > this.
> > > >> > > > > > > > >> > > > > >>
> > > >> > > > > > > > >> > > > > >> When the broker is setup using JBOD, in
> > > order
> > > >> to
> > > >> > > move
> > > >> > > > > > > leaders
> > > >> > > > > > > > >> on
> > > >> > > > > > > > >> > the
> > > >> > > > > > > > >> > > > > >> failed
> > > >> > > > > > > > >> > > > > >> disk to other disks, the system operator
> > > first
> > > >> > > needs
> > > >> > > > to
> > > >> > > > > > get
> > > >> > > > > > > > the
> > > >> > > > > > > > >> > list
> > > >> > > > > > > > >> > > > of
> > > >> > > > > > > > >> > > > > >> partitions on the failed disk. This is
> > > >> currently
> > > >> > > > > achieved
> > > >> > > > > > > > using
> > > >> > > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which
> sends
> > > >> > > > > > > > >> DescribeLogDirsRequest
> > > >> > > > > > > > >> > to
> > > >> > > > > > > > >> > > > the
> > > >> > > > > > > > >> > > > > >> broker. If we only prioritize the
> > controller
> > > >> > > > requests,
> > > >> > > > > > then
> > > >> > > > > > > > the
> > > >> > > > > > > > >> > > > > >> DescribeLogDirsRequest
> > > >> > > > > > > > >> > > > > >> may still take a long time to be
> processed
> > > by
> > > >> the
> > > >> > > > > broker.
> > > >> > > > > > > So
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >> > > > overall
> > > >> > > > > > > > >> > > > > >> time to move leaders away from the
> failed
> > > disk
> > > >> > may
> > > >> > > > > still
> > > >> > > > > > be
> > > >> > > > > > > > >> long
> > > >> > > > > > > > >> > > even
> > > >> > > > > > > > >> > > > > with
> > > >> > > > > > > > >> > > > > >> this KIP. What do you think?
> > > >> > > > > > > > >> > > > > >>
> > > >> > > > > > > > >> > > > > >> Thanks,
> > > >> > > > > > > > >> > > > > >> Dong
> > > >> > > > > > > > >> > > > > >>
> > > >> > > > > > > > >> > > > > >>
> > > >> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas
> > Wang <
> > > >> > > > > > > > >> lucasatucla@gmail.com
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > > > wrote:
> > > >> > > > > > > > >> > > > > >>
> > > >> > > > > > > > >> > > > > >> > Thanks for the insightful comment,
> Jun.
> > > >> > > > > > > > >> > > > > >> >
> > > >> > > > > > > > >> > > > > >> > @Dong,
> > > >> > > > > > > > >> > > > > >> > Since both of the two comments in your
> > > >> previous
> > > >> > > > email
> > > >> > > > > > are
> > > >> > > > > > > > >> about
> > > >> > > > > > > > >> > > the
> > > >> > > > > > > > >> > > > > >> > benefits of this KIP and whether it's
> > > >> useful,
> > > >> > > > > > > > >> > > > > >> > in light of Jun's last comment, do you
> > > agree
> > > >> > that
> > > >> > > > > this
> > > >> > > > > > > KIP
> > > >> > > > > > > > >> can
> > > >> > > > > > > > >> > be
> > > >> > > > > > > > >> > > > > >> > beneficial in the case mentioned by
> Jun?
> > > >> > > > > > > > >> > > > > >> > Please let me know, thanks!
> > > >> > > > > > > > >> > > > > >> >
> > > >> > > > > > > > >> > > > > >> > Regards,
> > > >> > > > > > > > >> > > > > >> > Lucas
> > > >> > > > > > > > >> > > > > >> >
> > > >> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun
> Rao
> > <
> > > >> > > > > > > jun@confluent.io>
> > > >> > > > > > > > >> > wrote:
> > > >> > > > > > > > >> > > > > >> >
> > > >> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > >> > > > > > > > >> > > > > >> > >
> > > >> > > > > > > > >> > > > > >> > > If all disks on a broker are slow,
> one
> > > >> > probably
> > > >> > > > > > should
> > > >> > > > > > > > just
> > > >> > > > > > > > >> > kill
> > > >> > > > > > > > >> > > > the
> > > >> > > > > > > > >> > > > > >> > > broker. In that case, this KIP may
> not
> > > >> help.
> > > >> > If
> > > >> > > > > only
> > > >> > > > > > > one
> > > >> > > > > > > > of
> > > >> > > > > > > > >> > the
> > > >> > > > > > > > >> > > > > disks
> > > >> > > > > > > > >> > > > > >> on
> > > >> > > > > > > > >> > > > > >> > a
> > > >> > > > > > > > >> > > > > >> > > broker is slow, one may want to fail
> > > that
> > > >> > disk
> > > >> > > > and
> > > >> > > > > > move
> > > >> > > > > > > > the
> > > >> > > > > > > > >> > > > leaders
> > > >> > > > > > > > >> > > > > on
> > > >> > > > > > > > >> > > > > >> > that
> > > >> > > > > > > > >> > > > > >> > > disk to other brokers. In that case,
> > > being
> > > >> > able
> > > >> > > > to
> > > >> > > > > > > > process
> > > >> > > > > > > > >> the
> > > >> > > > > > > > >> > > > > >> > LeaderAndIsr
> > > >> > > > > > > > >> > > > > >> > > requests faster will potentially
> help
> > > the
> > > >> > > > producers
> > > >> > > > > > > > recover
> > > >> > > > > > > > >> > > > quicker.
> > > >> > > > > > > > >> > > > > >> > >
> > > >> > > > > > > > >> > > > > >> > > Thanks,
> > > >> > > > > > > > >> > > > > >> > >
> > > >> > > > > > > > >> > > > > >> > > Jun
> > > >> > > > > > > > >> > > > > >> > >
> > > >> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong
> > > Lin <
> > > >> > > > > > > > >> lindong28@gmail.com
> > > >> > > > > > > > >> > >
> > > >> > > > > > > > >> > > > > wrote:
> > > >> > > > > > > > >> > > > > >> > >
> > > >> > > > > > > > >> > > > > >> > > > Hey Lucas,
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow
> up
> > > >> > > questions
> > > >> > > > > > below.
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > > Regarding 1, if each
> ProduceRequest
> > > >> covers
> > > >> > 20
> > > >> > > > > > > > partitions
> > > >> > > > > > > > >> > that
> > > >> > > > > > > > >> > > > are
> > > >> > > > > > > > >> > > > > >> > > randomly
> > > >> > > > > > > > >> > > > > >> > > > distributed across all partitions,
> > > then
> > > >> > each
> > > >> > > > > > > > >> ProduceRequest
> > > >> > > > > > > > >> > > will
> > > >> > > > > > > > >> > > > > >> likely
> > > >> > > > > > > > >> > > > > >> > > > cover some partitions for which
> the
> > > >> broker
> > > >> > is
> > > >> > > > > still
> > > >> > > > > > > > >> leader
> > > >> > > > > > > > >> > > after
> > > >> > > > > > > > >> > > > > it
> > > >> > > > > > > > >> > > > > >> > > quickly
> > > >> > > > > > > > >> > > > > >> > > > processes the
> > > >> > > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker
> > will
> > > >> still
> > > >> > > be
> > > >> > > > > slow
> > > >> > > > > > > in
> > > >> > > > > > > > >> > > > processing
> > > >> > > > > > > > >> > > > > >> these
> > > >> > > > > > > > >> > > > > >> > > > ProduceRequests, and request latency will
> > still
> > > be
> > > >> > very
> > > >> > > > > high
> > > >> > > > > > > with
> > > >> > > > > > > > >> this
> > > >> > > > > > > > >> > > > KIP.
> > > >> > > > > > > > >> > > > > It
> > > >> > > > > > > > >> > > > > >> > > seems
> > > >> > > > > > > > >> > > > > >> > > > that most ProduceRequest will
> still
> > > >> timeout
> > > >> > > > after
> > > >> > > > > > 30
> > > >> > > > > > > > >> > seconds.
> > > >> > > > > > > > >> > > Is
> > > >> > > > > > > > >> > > > > >> this
> > > >> > > > > > > > >> > > > > >> > > > understanding correct?
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > > Regarding 2, if most
> ProduceRequest
> > > will
> > > >> > > still
> > > >> > > > > > > timeout
> > > >> > > > > > > > >> after
> > > >> > > > > > > > >> > > 30
> > > >> > > > > > > > >> > > > > >> > seconds,
> > > >> > > > > > > > >> > > > > >> > > > then it is less clear how this KIP
> > > >> reduces
> > > >> > > > > average
> > > >> > > > > > > > >> produce
> > > >> > > > > > > > >> > > > > latency.
> > > >> > > > > > > > >> > > > > >> Can
> > > >> > > > > > > > >> > > > > >> > > you
> > > >> > > > > > > > >> > > > > >> > > > clarify what metrics can be
> improved
> > > by
> > > >> > this
> > > >> > > > KIP?
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > > Not sure why system operator
> > directly
> > > >> cares
> > > >> > > > > number
> > > >> > > > > > of
> > > >> > > > > > > > >> > > truncated
> > > >> > > > > > > > >> > > > > >> > messages.
> > > >> > > > > > > > >> > > > > >> > > > Do you mean this KIP can improve
> > > average
> > > >> > > > > throughput
> > > >> > > > > > > or
> > > >> > > > > > > > >> > reduce
> > > >> > > > > > > > >> > > > > >> message
> > > >> > > > > > > > >> > > > > >> > > > duplication? It will be good to
> > > >> understand
> > > >> > > > this.
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > > Thanks,
> > > >> > > > > > > > >> > > > > >> > > > Dong
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM
> Lucas
> > > >> Wang <
> > > >> > > > > > > > >> > > lucasatucla@gmail.com
> > > >> > > > > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > wrote:
> > > >> > > > > > > > >> > > > > >> > > >
> > > >> > > > > > > > >> > > > > >> > > > > Hi Dong,
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > Thanks for your valuable
> comments.
> > > >> Please
> > > >> > > see
> > > >> > > > > my
> > > >> > > > > > > > reply
> > > >> > > > > > > > >> > > below.
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1
> > > >> > partition.
> > > >> > > > Now
> > > >> > > > > > > let's
> > > >> > > > > > > > >> > > consider
> > > >> > > > > > > > >> > > > a
> > > >> > > > > > > > >> > > > > >> more
> > > >> > > > > > > > >> > > > > >> > > > common
> > > >> > > > > > > > >> > > > > >> > > > > scenario
> > > >> > > > > > > > >> > > > > >> > > > > where broker0 is the leader of
> > many
> > > >> > > > partitions.
> > > >> > > > > > And
> > > >> > > > > > > > >> let's
> > > >> > > > > > > > >> > > say
> > > >> > > > > > > > >> > > > > for
> > > >> > > > > > > > >> > > > > >> > some
> > > >> > > > > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > >> > > > > > > > >> > > > > >> > > > > The number of leader partitions
> on
> > > >> > broker0
> > > >> > > is
> > > >> > > > > so
> > > >> > > > > > > > large,
> > > >> > > > > > > > >> > say
> > > >> > > > > > > > >> > > > 10K,
> > > >> > > > > > > > >> > > > > >> that
> > > >> > > > > > > > >> > > > > >> > > the
> > > >> > > > > > > > >> > > > > >> > > > > cluster is skewed,
> > > >> > > > > > > > >> > > > > >> > > > > and the operator would like to
> > shift
> > > >> the
> > > >> > > > > > leadership
> > > >> > > > > > > > >> for a
> > > >> > > > > > > > >> > > lot
> > > >> > > > > > > > >> > > > of
> > > >> > > > > > > > >> > > > > >> > > > > partitions, say 9K, to other
> > > brokers,
> > > >> > > > > > > > >> > > > > >> > > > > either manually or through some
> > > >> service
> > > >> > > like
> > > >> > > > > > cruise
> > > >> > > > > > > > >> > control.
> > > >> > > > > > > > >> > > > > >> > > > > With this KIP, not only will the
> > > >> > leadership
> > > >> > > > > > > > transitions
> > > >> > > > > > > > >> > > finish
> > > >> > > > > > > > >> > > > > >> more
> > > >> > > > > > > > >> > > > > >> > > > > quickly, helping the cluster
> > itself
> > > >> > > becoming
> > > >> > > > > more
> > > >> > > > > > > > >> > balanced,
> > > >> > > > > > > > >> > > > > >> > > > > but all existing producers
> > > >> corresponding
> > > >> > to
> > > >> > > > the
> > > >> > > > > > 9K
> > > >> > > > > > > > >> > > partitions
> > > >> > > > > > > > >> > > > > will
> > > >> > > > > > > > >> > > > > >> > get
> > > >> > > > > > > > >> > > > > >> > > > the
> > > >> > > > > > > > >> > > > > >> > > > > errors relatively quickly
> > > >> > > > > > > > >> > > > > >> > > > > rather than relying on their
> > > timeout,
> > > >> > > thanks
> > > >> > > > to
> > > >> > > > > > the
> > > >> > > > > > > > >> > batched
> > > >> > > > > > > > >> > > > > async
> > > >> > > > > > > > >> > > > > >> ZK
> > > >> > > > > > > > >> > > > > >> > > > > operations.
> > > >> > > > > > > > >> > > > > >> > > > > To me it's a useful feature to
> > have
> > > >> > during
> > > >> > > > such
> > > >> > > > > > > > >> > troublesome
> > > >> > > > > > > > >> > > > > times.
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google
> > Doc
> > > >> have
> > > >> > > > shown
> > > >> > > > > > > that
> > > >> > > > > > > > >> with
> > > >> > > > > > > > >> > > this
> > > >> > > > > > > > >> > > > > KIP
> > > >> > > > > > > > >> > > > > >> > many
> > > >> > > > > > > > >> > > > > >> > > > > producers
> > > >> > > > > > > > >> > > > > >> > > > > receive an explicit error
> > > >> > > > > NotLeaderForPartition,
> > > >> > > > > > > > based
> > > >> > > > > > > > >> on
> > > >> > > > > > > > >> > > > which
> > > >> > > > > > > > >> > > > > >> they
> > > >> > > > > > > > >> > > > > >> > > > retry
> > > >> > > > > > > > >> > > > > >> > > > > immediately.
> > > >> > > > > > > > >> > > > > >> > > > > Therefore the latency (~14
> > > >> seconds+quick
> > > >> > > > retry)
> > > >> > > > > > for
> > > >> > > > > > > > >> their
> > > >> > > > > > > > >> > > > single
> > > >> > > > > > > > >> > > > > >> > > message
> > > >> > > > > > > > >> > > > > >> > > > is
> > > >> > > > > > > > >> > > > > >> > > > > much smaller
> > > >> > > > > > > > >> > > > > >> > > > > compared with the case of timing
> > out
> > > >> > > without
> > > >> > > > > the
> > > >> > > > > > > KIP
> > > >> > > > > > > > >> (30
> > > >> > > > > > > > >> > > > seconds
> > > >> > > > > > > > >> > > > > >> for
> > > >> > > > > > > > >> > > > > >> > > > timing
> > > >> > > > > > > > >> > > > > >> > > > > out + quick retry).
> > >> > > > > > > > >> > > > > >> > > > > One might argue that reducing the
> > >> > > > > > > > >> > > > > >> > > > > timeout on the producer side can
> > >> > > > > > > > >> > > > > >> > > > > achieve the same result,
> > > >> > > > > > > > >> > > > > >> > > > > yet reducing the timeout has its
> > own
> > > >> > > > > > drawbacks[1].
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to
> > > show
> > > >> the
> > > >> > > > > number
> > > >> > > > > > of
> > > >> > > > > > > > >> > > truncated
> > > >> > > > > > > > >> > > > > >> > messages
> > > >> > > > > > > > >> > > > > >> > > on
> > > >> > > > > > > > >> > > > > >> > > > > brokers,
> > > >> > > > > > > > >> > > > > >> > > > > with the experiments done in the
> > > >> Google
> > > >> > > Doc,
> > > >> > > > it
> > > >> > > > > > > > should
> > > >> > > > > > > > >> be
> > > >> > > > > > > > >> > > easy
> > > >> > > > > > > > >> > > > > to
> > > >> > > > > > > > >> > > > > >> see
> > > >> > > > > > > > >> > > > > >> > > > that
> > > >> > > > > > > > >> > > > > >> > > > > a lot fewer messages need
> > > >> > > > > > > > >> > > > > >> > > > > to be truncated on broker0 since
> > the
> > > >> > > > up-to-date
> > > >> > > > > > > > >> metadata
> > > >> > > > > > > > >> > > > avoids
> > > >> > > > > > > > >> > > > > >> > > appending
> > > >> > > > > > > > >> > > > > >> > > > > of messages
> > > >> > > > > > > > >> > > > > >> > > > > in subsequent PRODUCE requests.
> If
> > > we
> > > >> > talk
> > > >> > > > to a
> > > >> > > > > > > > system
> > > >> > > > > > > > >> > > > operator
> > > >> > > > > > > > >> > > > > >> and
> > > >> > > > > > > > >> > > > > >> > ask
> > > >> > > > > > > > >> > > > > >> > > > > whether
> > > >> > > > > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs,
> I
> > > bet
> > > >> > most
> > > >> > > > > likely
> > > >> > > > > > > the
> > > >> > > > > > > > >> > answer
> > > >> > > > > > > > >> > > > is
> > > >> > > > > > > > >> > > > > >> yes.
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > 3. To answer your question, I
> > think
> > > it
> > > >> > > might
> > > >> > > > be
> > > >> > > > > > > > >> helpful to
> > > >> > > > > > > > >> > > > > >> construct
> > > >> > > > > > > > >> > > > > >> > > some
> > > >> > > > > > > > >> > > > > >> > > > > formulas.
> > > >> > > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm
> > going
> > > >> back
> > > >> > to
> > > >> > > > the
> > > >> > > > > > > case
> > > >> > > > > > > > >> where
> > > >> > > > > > > > >> > > > there
> > > >> > > > > > > > >> > > > > >> is
> > > >> > > > > > > > >> > > > > >> > > only
> > > >> > > > > > > > >> > > > > >> > > > > ONE partition involved.
> > > >> > > > > > > > >> > > > > >> > > > > Following the experiments in the
> > > >> Google
> > > >> > > Doc,
> > > >> > > > > > let's
> > > >> > > > > > > > say
> > > >> > > > > > > > >> > > broker0
> > > >> > > > > > > > >> > > > > >> > becomes
> > > >> > > > > > > > >> > > > > >> > > > the
> > > >> > > > > > > > >> > > > > >> > > > > follower at time t0,
> > > >> > > > > > > > >> > > > > >> > > > > and after t0 there were still N
> > > >> produce
> > > >> > > > > requests
> > > >> > > > > > in
> > > >> > > > > > > > its
> > > >> > > > > > > > >> > > > request
> > > >> > > > > > > > >> > > > > >> > queue.
> > > >> > > > > > > > >> > > > > >> > > > > With the up-to-date metadata
> > brought
> > > >> by
> > > >> > > this
> > > >> > > > > KIP,
> > > >> > > > > > > > >> broker0
> > > >> > > > > > > > >> > > can
> > > >> > > > > > > > >> > > > > >> reply
> > > >> > > > > > > > >> > > > > >> > > with
> > > >> > > > > > > > >> > > > > >> > > > an
> > > >> > > > > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > >> > > > > > > > >> > > > > >> > > > > let's use M1 to denote the
> average
> > > >> > > processing
> > > >> > > > > > time
> > > >> > > > > > > of
> > > >> > > > > > > > >> > > replying
> > > >> > > > > > > > >> > > > > >> with
> > > >> > > > > > > > >> > > > > >> > > such
> > > >> > > > > > > > >> > > > > >> > > > an
> > > >> > > > > > > > >> > > > > >> > > > > error message.
> > > >> > > > > > > > >> > > > > >> > > > > Without this KIP, the broker
> will
> > > >> need to
> > > >> > > > > append
> > > >> > > > > > > > >> messages
> > > >> > > > > > > > >> > to
> > > >> > > > > > > > >> > > > > >> > segments,
> > > >> > > > > > > > >> > > > > >> > > > > which may trigger a flush to
> disk,
> > > >> > > > > > > > >> > > > > >> > > > > let's use M2 to denote the
> average
> > > >> > > processing
> > > >> > > > > > time
> > > >> > > > > > > > for
> > > >> > > > > > > > >> > such
> > > >> > > > > > > > >> > > > > logic.
> > > >> > > > > > > > >> > > > > >> > > > > Then the average extra latency
> > > >> incurred
> > > >> > > > without
> > > >> > > > > > > this
> > > >> > > > > > > > >> KIP
> > > >> > > > > > > > >> > is
> > > >> > > > > > > > >> > > N
> > > >> > > > > > > > >> > > > *
> > > >> > > > > > > > >> > > > > >> (M2 -
> > > >> > > > > > > > >> > > > > >> > > > M1) /
> > > >> > > > > > > > >> > > > > >> > > > > 2.
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > In practice, M2 should always be
> > > >> larger
> > > >> > > than
> > > >> > > > > M1,
> > > >> > > > > > > > which
> > > >> > > > > > > > >> > means
> > > >> > > > > > > > >> > > > as
> > > >> > > > > > > > >> > > > > >> long
> > > >> > > > > > > > >> > > > > >> > > as N
> > > >> > > > > > > > >> > > > > >> > > > > is positive,
> > > >> > > > > > > > >> > > > > >> > > > > we would see improvements on the
> > > >> average
> > > >> > > > > latency.
> > > >> > > > > > > > >> > > > > >> > > > > There does not need to be a
> > > >> > > > > > > > >> > > > > >> > > > > significant backlog of requests in
> > > >> > > > > > > > >> > > > > >> > > > > the request queue, or a severe
> > > >> > > > > > > > >> > > > > >> > > > > degradation of disk performance,
> > > >> > > > > > > > >> > > > > >> > > > > to see the improvement.
> > > >> > > > > > > > >> > > > > >> > > > >
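The average-latency argument above can be sanity-checked numerically. This is an illustrative sketch only: the function name and all numbers (M1, M2, N) are made-up assumptions for illustration, not measurements from the experiments in the Google Doc.

```python
# Illustrative sanity check of the average extra latency formula discussed
# above: without the KIP, each of the N queued ProduceRequests pays the full
# processing cost M2 (append + possible flush); with the KIP, an immediate
# NotLeaderForPartition reply costs only M1. Averaged over queue positions,
# the extra wait works out to N * (M2 - M1) / 2.

def avg_extra_latency_ms(n_queued, m2_ms, m1_ms):
    """Average extra latency across the backlog, per the formula above."""
    return n_queued * (m2_ms - m1_ms) / 2

# e.g. 100 queued requests, 50 ms append+flush vs. 1 ms error reply
print(avg_extra_latency_ms(100, 50, 1))  # → 2450.0
```

As long as M2 > M1 and N > 0, the expression is positive, matching the claim that any backlog at all yields an improvement.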
> > > >> > > > > > > > >> > > > > >> > > > > Regards,
> > > >> > > > > > > > >> > > > > >> > > > > Lucas
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing the
> > > >> timeout on
> > > >> > > the
> > > >> > > > > > > > producer
> > > >> > > > > > > > >> > side
> > > >> > > > > > > > >> > > > can
> > > >> > > > > > > > >> > > > > >> > trigger
> > > >> > > > > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > >> > > > > > > > >> > > > > >> > > > > when the corresponding leader
> > broker
> > > >> is
> > > >> > > > > > overloaded,
> > > >> > > > > > > > >> > > > exacerbating
> > > >> > > > > > > > >> > > > > >> the
> > > >> > > > > > > > >> > > > > >> > > > > situation.
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM,
> > Dong
> > > >> Lin
> > > >> > <
> > > >> > > > > > > > >> > > lindong28@gmail.com
> > > >> > > > > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > wrote:
> > > >> > > > > > > > >> > > > > >> > > > >
> > > >> > > > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > >> > > > > > > > >> > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > >> > > > > > Thanks much for the detailed
> > > >> > > documentation
> > > >> > > > of
> > > >> > > > > > the
> > > >> > > > > > > > >> > > > experiment.
> > > >> > > > > > > > >> > > > > >> > > > > >
> > > >> > > > > > > > >> > > > > >> > > > > > Initially I also think having
> a
> > > >> > separate
> > > >> > > > > queue
> > > >> > > > > > > for
> > > >> > > > > > > > >> > > > controller
> > > >> > > > > > > > >> > > > > >> > > requests
> > > >> > > > > > > > >> > > > > >> > > > is
> > > >> > > > > > > > >> > > > > >> > > > > > useful because, as you
> mentioned
> > > in
> > > >> the
> > > >> > > > > summary
> > > >> > > > > > > > >> section
> > > >> > > > > > > > >> > of
> > > >> > > > > > > > >> > > > the
> > > >> > > > > > > > >> > > > > >> > Google
> > > >> > > > > > > > >> > > > > >> > > > > doc,
> > > >> > > > > > > > >> > > > > >> > > > > > controller requests are
> > generally
> > > >> more
> > > >> > >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Mayuresh Gharat <gh...@gmail.com>.
Yea, the correlationId is only set to 0 in the NetworkClient constructor.
Since we reuse the same NetworkClient between Controller and the broker, a
disconnection should not cause it to reset to 0, in which case it can be
used to reject obsolete requests.

Thanks,

Mayuresh
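The monotonic-correlation-id check discussed here could be sketched as follows. This is a hedged illustration only, not actual broker code: the class and method names are hypothetical, and it ignores the separate question (raised earlier in the thread) of whether the correlation id survives reconnects.

```python
# Sketch of rejecting obsolete controller requests via a monotonically
# increasing correlation id, as discussed above. Hypothetical illustration;
# not actual Kafka broker code.
class ControllerRequestGate:
    def __init__(self):
        self.last_seen = -1  # highest correlation id processed so far

    def should_process(self, correlation_id):
        """Accept only requests newer than anything already seen."""
        if correlation_id <= self.last_seen:
            return False  # obsolete: an earlier-sent request arriving late
        self.last_seen = correlation_id
        return True

gate = ControllerRequestGate()
assert gate.should_process(5)      # R1 arrives
assert gate.should_process(7)      # R2 (sent later) arrives
assert not gate.should_process(6)  # a stale in-flight request is rejected
```

The approach only works if the controller's NetworkClient never resets the counter within a controller generation, which is exactly the property being discussed above.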

On Thu, Jul 19, 2018 at 1:52 PM Lucas Wang <lu...@gmail.com> wrote:

> @Dong,
> Great example and explanation, thanks!
>
> @All
> Regarding the example given by Dong, it seems even if we use a queue, and a
> dedicated controller request handling thread,
> the same result can still happen because R1_a will be sent on one
> connection, and R1_b & R2 will be sent on a different connection,
> and there is no ordering between different connections on the broker side.
> I was discussing with Mayuresh offline, and it seems correlation id within
> the same NetworkClient object is monotonically increasing and never reset,
> hence a broker can leverage that to properly reject obsolete requests.
> Thoughts?
>
> Thanks,
> Lucas
>
> On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com> wrote:
>
> > Actually nvm, correlationId is reset in case of connection loss, I think.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> > gharatmayuresh15@gmail.com>
> > wrote:
> >
> > > I agree with Dong that out-of-order processing can happen with having 2
> > > separate queues as well and it can even happen today.
> > > Can we use the correlationId in the request from the controller to the
> > > broker to handle ordering ?
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > >
> > > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <be...@gmail.com>
> wrote:
> > >
> > >> Good point, Joel. I agree that a dedicated controller request handling
> > >> thread would provide better isolation. It also solves the reordering
> > >> issue.
> > >>
> > >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jj...@gmail.com>
> > wrote:
> > >>
> > >> > Good example. I think this scenario can occur in the current code as
> > >> well
> > >> > but with even lower probability given that there are other
> > >> non-controller
> > >> > requests interleaved. It is still sketchy though and I think a safer
> > >> > approach would be separate queues and pinning controller request
> > >> handling
> > >> > to one handler thread.
> > >> >
> > >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <li...@gmail.com>
> > wrote:
> > >> >
> > >> > > Hey Becket,
> > >> > >
> > >> > > I think you are right that there may be out-of-order processing.
> > >> However,
> > >> > > it seems that out-of-order processing may also happen even if we
> > use a
> > >> > > separate queue.
> > >> > >
> > >> > > Here is the example:
> > >> > >
> > >> > > - Controller sends R1 and got disconnected before receiving
> > response.
> > >> > Then
> > >> > > it reconnects and sends R2. Both requests now stay in the
> controller
> > >> > > request queue in the order they are sent.
> > >> > > - thread1 takes R1_a from the request queue and then thread2 takes
> > R2
> > >> > from
> > >> > > the request queue almost at the same time.
> > >> > > - So R1_a and R2 are processed in parallel. There is chance that
> > R2's
> > >> > > processing is completed before R1.
> > >> > >
> > >> > > If out-of-order processing can happen for both approaches with
> very
> > >> low
> > >> > > probability, it may not be worthwhile to add the extra queue. What
> > do
> > >> you
> > >> > > think?
> > >> > >
> > >> > > Thanks,
> > >> > > Dong
> > >> > >
> > >> > >
> > >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <becket.qin@gmail.com
> >
> > >> > wrote:
> > >> > >
> > >> > > > Hi Mayuresh/Joel,
> > >> > > >
> > >> > > > Using the request channel as a deque was brought up some time ago
> > >> > > > when we were initially thinking of prioritizing requests. The
> > >> > > > concern was that the controller requests are supposed to be
> > >> > > > processed in order. If we
> > can
> > >> > > ensure
> > >> > > > that there is one controller request in the request channel, the
> > >> order
> > >> > is
> > >> > > > not a concern. But in cases that there are more than one
> > controller
> > >> > > request
> > >> > > > inserted into the queue, the controller request order may change
> > and
> > >> > > cause
> > >> > > > problem. For example, think about the following sequence:
> > >> > > > 1. Controller successfully sent a request R1 to the broker.
> > >> > > > 2. Broker receives R1 and puts the request at the head of the
> > >> > > > request queue.
> > >> > > > 3. The controller-to-broker connection failed and the controller
> > >> > > > reconnected to the broker.
> > >> > > > 4. Controller sends a request R2 to the broker.
> > >> > > > 5. Broker receives R2 and adds it to the head of the request queue.
> > >> > > > Now on the broker side, R2 will be processed before R1 is
> > >> > > > processed, which may cause problems.
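The reordering hazard in the sequence above can be demonstrated with a head-inserting deque. This is an illustrative sketch only; the request names R1/R2 are placeholders from the example, not real request objects.

```python
from collections import deque

# Demonstrates the hazard described above: if every controller request is
# inserted at the head of the queue, a request R2 sent after a reconnect
# ends up ahead of the earlier request R1. Illustrative sketch only.
q = deque()
q.appendleft("R1")  # step 2: R1 put at the head
q.appendleft("R2")  # step 5: R2 also put at the head

print(list(q))  # → ['R2', 'R1'] — R2 would be processed before R1
```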
> > >> > > >
> > >> > > > Thanks,
> > >> > > >
> > >> > > > Jiangjie (Becket) Qin
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <
> jjkoshy.w@gmail.com>
> > >> > wrote:
> > >> > > >
> > >> > > > > @Mayuresh - I like your idea. It appears to be a simpler, less
> > >> > > > > invasive alternative and it should work. Jun/Becket/others, do you see
> > any
> > >> > > > pitfalls
> > >> > > > > with this approach?
> > >> > > > >
> > >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > >> lucasatucla@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > @Mayuresh,
> > >> > > > > > That's a very interesting idea that I haven't thought
> before.
> > >> > > > > > It seems to solve our problem at hand pretty well, and also
> > >> > > > > > avoids the need to have a new size metric and capacity
> config
> > >> > > > > > for the controller request queue. In fact, if we were to
> adopt
> > >> > > > > > this design, there is no public interface change, and we
> > >> > > > > > probably don't need a KIP.
> > >> > > > > > Also implementation wise, it seems
> > >> > > > > > the java class LinkedBlockingQueue can readily satisfy the
> > >> > > requirement
> > >> > > > > > by supporting a capacity, and also allowing inserting at
> both
> > >> ends.
> > >> > > > > >
> > >> > > > > > My only concern is that this design is tied to the
> coincidence
> > >> that
> > >> > > > > > we have two request priorities and there are two ends to a
> > >> deque.
> > >> > > > > > Hence by using the proposed design, it seems the network
> layer
> > >> is
> > >> > > > > > more tightly coupled with upper layer logic, e.g. if we were
> > to
> > >> add
> > >> > > > > > an extra priority level in the future for some reason, we
> > would
> > >> > > > probably
> > >> > > > > > need to go back to the design of separate queues, one for
> each
> > >> > > priority
> > >> > > > > > level.
> > >> > > > > >
> > >> > > > > > In summary, I'm ok with both designs and lean toward your
> > >> suggested
> > >> > > > > > approach.
> > >> > > > > > Let's hear what others think.
> > >> > > > > >
> > >> > > > > > @Becket,
> > >> > > > > > In light of Mayuresh's suggested new design, I'm answering
> > your
> > >> > > > question
> > >> > > > > > only in the context
> > >> > > > > > of the current KIP design: I think your suggestion makes
> > sense,
> > >> and
> > >> > > I'm
> > >> > > > > ok
> > >> > > > > > with removing the capacity config and
> > >> > > > > > just relying on the default value of 20 being sufficient.
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > > Lucas
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > >> > > > > > gharatmayuresh15@gmail.com
> > >> > > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hi Lucas,
> > >> > > > > > >
> > >> > > > > > > Seems like the main intent here is to prioritize the
> > >> controller
> > >> > > > request
> > >> > > > > > > over any other requests.
> > >> > > > > > > In that case, we can change the request queue to a deque,
> > >> > > > > > > where you always insert the normal requests (produce,
> > >> > > > > > > consume, etc.) at the end of the deque, but if it is a
> > >> > > > > > > controller request, you insert it at the head of the queue.
> > >> > > > > > > This ensures that the controller request will be given
> > >> > > > > > > higher priority over other requests.
> > >> > > > > > >
> > >> > > > > > > Also since we only read one request from the socket and
> mute
> > >> it
> > >> > and
> > >> > > > > only
> > >> > > > > > > unmute it after handling the request, this would ensure
> that
> > >> we
> > >> > > don't
> > >> > > > > > > handle controller requests out of order.
> > >> > > > > > >
> > >> > > > > > > With this approach we can avoid the second queue and the
> > >> > additional
> > >> > > > > > config
> > >> > > > > > > for the size of the queue.
> > >> > > > > > >
> > >> > > > > > > What do you think ?
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > >
> > >> > > > > > > Mayuresh
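The deque idea above can be sketched with Python's collections.deque. This is a hedged illustration, not Kafka's actual RequestChannel code (which is Java); the request names and the is_controller flag are hypothetical.

```python
from collections import deque

# Sketch of the two-ended request queue idea above: normal requests go to
# the tail, controller requests jump to the head so they are handled first.
request_queue = deque()

def enqueue(request, is_controller):
    if is_controller:
        request_queue.appendleft(request)  # head: highest priority
    else:
        request_queue.append(request)      # tail: normal priority

enqueue("ProduceRequest-1", False)
enqueue("FetchRequest-1", False)
enqueue("LeaderAndIsrRequest", True)

# The controller request is dequeued before the data requests.
print(request_queue.popleft())  # → LeaderAndIsrRequest
```

Note that this relies on there being at most one in-flight controller request (because the channel is muted); otherwise two head insertions can reorder controller requests, as discussed elsewhere in this thread.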
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > >> becket.qin@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hey Joel,
> > >> > > > > > > >
> > >> > > > > > > > Thank for the detail explanation. I agree the current
> > design
> > >> > > makes
> > >> > > > > > sense.
> > >> > > > > > > > My confusion is about whether the new config for the
> > >> controller
> > >> > > > queue
> > >> > > > > > > > capacity is necessary. I cannot think of a case in which
> > >> users
> > >> > > > would
> > >> > > > > > > change
> > >> > > > > > > > it.
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > >
> > >> > > > > > > > Jiangjie (Becket) Qin
> > >> > > > > > > >
> > >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > >> > > becket.qin@gmail.com>
> > >> > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hi Lucas,
> > >> > > > > > > > >
> > >> > > > > > > > > I guess my question can be rephrased to "do we expect
> > >> > > > > > > > > users to ever change the controller request queue
> > >> > > > > > > > > capacity"? If we agree
> that
> > >> 20
> > >> > is
> > >> > > > > > already
> > >> > > > > > > a
> > >> > > > > > > > > very generous default number and we do not expect user
> > to
> > >> > > change
> > >> > > > > it,
> > >> > > > > > is
> > >> > > > > > > > it
> > >> > > > > > > > > still necessary to expose this as a config?
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks,
> > >> > > > > > > > >
> > >> > > > > > > > > Jiangjie (Becket) Qin
> > >> > > > > > > > >
> > >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > >> > > > lucasatucla@gmail.com
> > >> > > > > >
> > >> > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > >> @Becket
> > >> > > > > > > > >> 1. Thanks for the comment. You are right that
> normally
> > >> there
> > >> > > > > should
> > >> > > > > > be
> > >> > > > > > > > >> just
> > >> > > > > > > > >> one controller request because of muting,
> > >> > > > > > > > >> and I had NOT intended to say there would be many
> > >> enqueued
> > >> > > > > > controller
> > >> > > > > > > > >> requests.
> > >> > > > > > > > >> I went through the KIP again, and I'm not sure which
> > part
> > >> > > > conveys
> > >> > > > > > that
> > >> > > > > > > > >> info.
> > >> > > > > > > > >> I'd be happy to revise if you point it out the
> section.
> > >> > > > > > > > >>
> > >> > > > > > > > >> 2. Though it should not happen in normal conditions,
> > the
> > >> > > current
> > >> > > > > > > design
> > >> > > > > > > > >> does not preclude multiple controllers running
> > >> > > > > > > > >> at the same time, hence if we don't have the
> controller
> > >> > queue
> > >> > > > > > capacity
> > >> > > > > > > > >> config and simply make its capacity to be 1,
> > >> > > > > > > > >> network threads handling requests from different
> > >> controllers
> > >> > > > will
> > >> > > > > be
> > >> > > > > > > > >> blocked during those troublesome times,
> > >> > > > > > > > >> which is probably not what we want. On the other
> hand,
> > >> > adding
> > >> > > > the
> > >> > > > > > > extra
> > >> > > > > > > > >> config with a default value, say 20, guards us from
> > >> issues
> > >> > in
> > >> > > > > those
> > >> > > > > > > > >> troublesome times, and IMO there isn't much downside
> of
> > >> > adding
> > >> > > > the
> > >> > > > > > > extra
> > >> > > > > > > > >> config.
> > >> > > > > > > > >>
> > >> > > > > > > > >> @Mayuresh
> > >> > > > > > > > >> Good catch; this sentence is an obsolete statement
> > >> > > > > > > > >> based on a previous design. I've revised the wording
> > >> > > > > > > > >> in the KIP.
> > >> > > > > > > > >>
> > >> > > > > > > > >> Thanks,
> > >> > > > > > > > >> Lucas
> > >> > > > > > > > >>
> > >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > >> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > >> > > > > > > > >>
> > >> > > > > > > > >> > Hi Lucas,
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > Thanks for the KIP.
> > >> > > > > > > > >> > I am trying to understand why you think "The memory
> > >> > > > consumption
> > >> > > > > > can
> > >> > > > > > > > rise
> > >> > > > > > > > >> > given the total number of queued requests can go up
> > to
> > >> 2x"
> > >> > > in
> > >> > > > > the
> > >> > > > > > > > impact
> > >> > > > > > > > >> > section. Normally the requests from controller to a
> > >> Broker
> > >> > > are
> > >> > > > > not
> > >> > > > > > > > high
> > >> > > > > > > > >> > volume, right ?
> > >> > > > > > > > >> >
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > Thanks,
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > Mayuresh
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > >> > > > > becket.qin@gmail.com>
> > >> > > > > > > > >> wrote:
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control
> > >> plane
> > >> > > from
> > >> > > > > the
> > >> > > > > > > > data
> > >> > > > > > > > >> > plane
> > >> > > > > > > > >> > > makes a lot of sense.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > In the KIP you mentioned that the controller
> > request
> > >> > queue
> > >> > > > may
> > >> > > > > > > have
> > >> > > > > > > > >> many
> > >> > > > > > > > >> > > requests in it. Will this be a common case? The
> > >> > controller
> > >> > > > > > > requests
> > >> > > > > > > > >> still
> > >> > > > > > > > >> > > goes through the SocketServer. The SocketServer
> > will
> > >> > mute
> > >> > > > the
> > >> > > > > > > > channel
> > >> > > > > > > > >> > once
> > >> > > > > > > > >> > > a request is read and put into the request
> channel.
> > >> So
> > >> > > > > assuming
> > >> > > > > > > > there
> > >> > > > > > > > >> is
> > >> > > > > > > > >> > > only one connection between controller and each
> > >> broker,
> > >> > on
> > >> > > > the
> > >> > > > > > > > broker
> > >> > > > > > > > >> > side,
> > >> > > > > > > > >> > > there should be only one controller request in
> the
> > >> > > > controller
> > >> > > > > > > > request
> > >> > > > > > > > >> > queue
> > >> > > > > > > > >> > > at any given time. If that is the case, do we
> need
> > a
> > >> > > > separate
> > >> > > > > > > > >> controller
> > >> > > > > > > > >> > > request queue capacity config? The default value
> 20
> > >> > means
> > >> > > > that
> > >> > > > > > we
> > >> > > > > > > > >> expect
> > >> > > > > > > > >> > > there are 20 controller switches to happen in a
> > short
> > >> > > period
> > >> > > > > of
> > >> > > > > > > > time.
> > >> > > > > > > > >> I
> > >> > > > > > > > >> > am
> > >> > > > > > > > >> > > not sure whether someone should increase the
> > >> controller
> > >> > > > > request
> > >> > > > > > > > queue
> > >> > > > > > > > >> > > capacity to handle such case, as it seems
> > indicating
> > >> > > > something
> > >> > > > > > > very
> > >> > > > > > > > >> wrong
> > >> > > > > > > > >> > > has happened.
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Thanks,
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > Jiangjie (Becket) Qin
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > >> > > > > lindong28@gmail.com>
> > >> > > > > > > > >> wrote:
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > > Thanks for the update Lucas.
> > >> > > > > > > > >> > > >
> > >> > > > > > > > >> > > > I think the motivation section is intuitive. It will
> > >> > > > > > > > >> > > > be good to learn more about the comments from other
> > >> > > > > > > > >> > > > reviewers.
> > >> > > > > > > > >> > > >
> > >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > >> > > > > > > > lucasatucla@gmail.com>
> > >> > > > > > > > >> > > wrote:
> > >> > > > > > > > >> > > >
> > >> > > > > > > > >> > > > > Hi Dong,
> > >> > > > > > > > >> > > > >
> > >> > > > > > > > >> > > > > I've updated the motivation section of the KIP by
> > >> > > > > > > > >> > > > > explaining the cases that would have user impacts.
> > >> > > > > > > > >> > > > > Please take a look and let me know your comments.
> > >> > > > > > > > >> > > > >
> > >> > > > > > > > >> > > > > Thanks,
> > >> > > > > > > > >> > > > > Lucas
> > >> > > > > > > > >> > > > >
> > >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > >> > > > > > > > lucasatucla@gmail.com
> > >> > > > > > > > >> >
> > >> > > > > > > > >> > > > wrote:
> > >> > > > > > > > >> > > > >
> > >> > > > > > > > >> > > > > > Hi Dong,
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > > The simulation of disk being slow is merely for me
> > >> > > > > > > > >> > > > > > to easily construct a testing scenario with a
> > >> > > > > > > > >> > > > > > backlog of produce requests. In production, other
> > >> > > > > > > > >> > > > > > than the disk being slow, a backlog of produce
> > >> > > > > > > > >> > > > > > requests may also be caused by high produce QPS.
> > >> > > > > > > > >> > > > > > In that case, we may not want to kill the broker
> > >> > > > > > > > >> > > > > > and that's when this KIP can be useful, both for
> > >> > > > > > > > >> > > > > > JBOD and non-JBOD setup.
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > > Going back to your previous question about each
> > >> > > > > > > > >> > > > > > ProduceRequest covering 20 partitions that are
> > >> > > > > > > > >> > > > > > randomly distributed, let's say a LeaderAndIsr
> > >> > > > > > > > >> > > > > > request is enqueued that tries to switch the
> > >> > > > > > > > >> > > > > > current broker, say broker0, from leader to
> > >> > > > > > > > >> > > > > > follower *for one of the partitions*, say *test-0*.
> > >> > > > > > > > >> > > > > > For the sake of argument, let's also assume the
> > >> > > > > > > > >> > > > > > other brokers, say broker1, have *stopped* fetching
> > >> > > > > > > > >> > > > > > from the current broker, i.e. broker0.
> > >> > > > > > > > >> > > > > > 1. If the enqueued produce requests have
> > acks =
> > >> > -1
> > >> > > > > (ALL)
> > >> > > > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests
> > >> ahead
> > >> > of
> > >> > > > > > > > >> LeaderAndISR
> > >> > > > > > > > >> > > will
> > >> > > > > > > > >> > > > be
> > >> > > > > > > > >> > > > > > put into the purgatory,
> > >> > > > > > > > >> > > > > >         and since they'll never be
> replicated
> > >> to
> > >> > > other
> > >> > > > > > > brokers
> > >> > > > > > > > >> > > (because
> > >> > > > > > > > >> > > > > of
> > >> > > > > > > > >> > > > > > the assumption made above), they will
> > >> > > > > > > > >> > > > > >         be completed either when the
> > >> LeaderAndISR
> > >> > > > > request
> > >> > > > > > is
> > >> > > > > > > > >> > > processed
> > >> > > > > > > > >> > > > or
> > >> > > > > > > > >> > > > > > when the timeout happens.
> > >> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will
> immediately
> > >> > > > transition
> > >> > > > > > the
> > >> > > > > > > > >> > > partition
> > >> > > > > > > > >> > > > > > test-0 to become a follower,
> > >> > > > > > > > >> > > > > >         after the current broker sees the
> > >> > > replication
> > >> > > > of
> > >> > > > > > the
> > >> > > > > > > > >> > > remaining
> > >> > > > > > > > >> > > > 19
> > >> > > > > > > > >> > > > > > partitions, it can send a response
> indicating
> > >> that
> > >> > > > > > > > >> > > > > >         it's no longer the leader for the
> > >> > "test-0".
> > >> > > > > > > > >> > > > > >   To see the latency difference between 1.1
> > and
> > >> > 1.2,
> > >> > > > > let's
> > >> > > > > > > say
> > >> > > > > > > > >> > there
> > >> > > > > > > > >> > > > are
> > >> > > > > > > > >> > > > > > 24K produce requests ahead of the
> > LeaderAndISR,
> > >> > and
> > >> > > > > there
> > >> > > > > > > are
> > >> > > > > > > > 8
> > >> > > > > > > > >> io
> > >> > > > > > > > >> > > > > threads,
> > >> > > > > > > > >> > > > > >   so each io thread will process
> > approximately
> > >> > 3000
> > >> > > > > > produce
> > >> > > > > > > > >> > requests.
> > >> > > > > > > > >> > > > Now
> > >> > > > > > > > >> > > > > > let's investigate the io thread that
> finally
> > >> > > processed
> > >> > > > > the
> > >> > > > > > > > >> > > > LeaderAndISR.
> > >> > > > > > > > >> > > > > >   For the 3000 produce requests, if we
> model
> > >> the
> > >> > > time
> > >> > > > > when
> > >> > > > > > > > their
> > >> > > > > > > > >> > > > > remaining
> > >> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999,
> > and
> > >> > the
> > >> > > > > > > > LeaderAndISR
> > >> > > > > > > > >> > > > request
> > >> > > > > > > > >> > > > > is
> > >> > > > > > > > >> > > > > > processed at time t3000.
> > >> > > > > > > > >> > > > > >   Without this KIP, the 1st produce request
> > >> would
> > >> > > have
> > >> > > > > > > waited
> > >> > > > > > > > an
> > >> > > > > > > > >> > > extra
> > >> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd
> an
> > >> extra
> > >> > > > time
> > >> > > > > of
> > >> > > > > > > > >> t3000 -
> > >> > > > > > > > >> > > t1,
> > >> > > > > > > > >> > > > > etc.
> > >> > > > > > > > >> > > > > >   Roughly speaking, the latency difference
> is
> > >> > bigger
> > >> > > > for
> > >> > > > > > the
> > >> > > > > > > > >> > earlier
> > >> > > > > > > > >> > > > > > produce requests than for the later ones.
> For
> > >> the
> > >> > > same
> > >> > > > > > > reason,
> > >> > > > > > > > >> the
> > >> > > > > > > > >> > > more
> > >> > > > > > > > >> > > > > > ProduceRequests queued
> > >> > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger
> benefit
> > >> we
> > >> > get
> > >> > > > > > (capped
> > >> > > > > > > > by
> > >> > > > > > > > >> the
> > >> > > > > > > > >> > > > > > produce timeout).
> > >> > > > > > > > >> > > > > > 2. If the enqueued produce requests have
> > >> acks=0 or
> > >> > > > > acks=1
> > >> > > > > > > > >> > > > > >   There will be no latency differences in
> > this
> > >> > case,
> > >> > > > but
> > >> > > > > > > > >> > > > > >   2.1 without this KIP, the records of
> > >> partition
> > >> > > > test-0
> > >> > > > > in
> > >> > > > > > > the
> > >> > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR
> > will
> > >> be
> > >> > > > > appended
> > >> > > > > > > to
> > >> > > > > > > > >> the
> > >> > > > > > > > >> > > local
> > >> > > > > > > > >> > > > > log,
> > >> > > > > > > > >> > > > > >         and eventually be truncated after
> > >> > processing
> > >> > > > the
> > >> > > > > > > > >> > > LeaderAndISR.
> > >> > > > > > > > >> > > > > > This is what's referred to as
> > >> > > > > > > > >> > > > > >         "some unofficial definition of data
> > >> loss
> > >> > in
> > >> > > > > terms
> > >> > > > > > of
> > >> > > > > > > > >> > messages
> > >> > > > > > > > >> > > > > > beyond the high watermark".
> > >> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the
> > effect
> > >> > > since
> > >> > > > if
> > >> > > > > > the
> > >> > > > > > > > >> > > > LeaderAndISR
> > >> > > > > > > > >> > > > > > is immediately processed, the response to
> > >> > producers
> > >> > > > will
> > >> > > > > > > have
> > >> > > > > > > > >> > > > > >         the NotLeaderForPartition error,
> > >> causing
> > >> > > > > producers
> > >> > > > > > > to
> > >> > > > > > > > >> retry
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > > This explanation above is the benefit for
> > >> reducing
> > >> > > the
> > >> > > > > > > latency
> > >> > > > > > > > >> of a
> > >> > > > > > > > >> > > > > broker
> > >> > > > > > > > >> > > > > > becoming the follower,
> > >> > > > > > > > >> > > > > > closely related is reducing the latency of
> a
> > >> > broker
> > >> > > > > > becoming
> > >> > > > > > > > the
> > >> > > > > > > > >> > > > leader.
> > >> > > > > > > > >> > > > > > In this case, the benefit is even more
> > >> obvious, if
> > >> > > > other
> > >> > > > > > > > brokers
> > >> > > > > > > > >> > have
> > >> > > > > > > > >> > > > > > resigned leadership, and the
> > >> > > > > > > > >> > > > > > current broker should take leadership. Any
> > >> delay
> > >> > in
> > >> > > > > > > processing
> > >> > > > > > > > >> the
> > >> > > > > > > > >> > > > > > LeaderAndISR will be perceived
> > >> > > > > > > > >> > > > > > by clients as unavailability. In extreme
> > cases,
> > >> > this
> > >> > > > can
> > >> > > > > > > cause
> > >> > > > > > > > >> > failed
> > >> > > > > > > > >> > > > > > produce requests if the retries are
> > >> > > > > > > > >> > > > > > exhausted.
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > > Another two types of controller requests
> are
> > >> > > > > > UpdateMetadata
> > >> > > > > > > > and
> > >> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as
> > >> > follows:
> > >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed
> > processing
> > >> > > means
> > >> > > > > > > clients
> > >> > > > > > > > >> > > receiving
> > >> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong
> > leadership
> > >> > info
> > >> > > > > > > > >> > > > > > for certain partitions, and the effect is
> > more
> > >> > > retries
> > >> > > > > or
> > >> > > > > > > even
> > >> > > > > > > > >> > fatal
> > >> > > > > > > > >> > > > > > failure if the retries are exhausted.
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > > For StopReplica requests, a long queuing
> time
> > >> may
> > >> > > > > degrade
> > >> > > > > > > the
> > >> > > > > > > > >> > > > performance
> > >> > > > > > > > >> > > > > > of topic deletion.
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > > Regarding your last question of the delay
> for
> > >> > > > > > > > >> > DescribeLogDirsRequest,
> > >> > > > > > > > >> > > > you
> > >> > > > > > > > >> > > > > > are right
> > >> > > > > > > > >> > > > > > that this KIP cannot help with the latency
> in
> > >> > > getting
> > >> > > > > the
> > >> > > > > > > log
> > >> > > > > > > > >> dirs
> > >> > > > > > > > >> > > > info,
> > >> > > > > > > > >> > > > > > and it's only relevant
> > >> > > > > > > > >> > > > > > when controller requests are involved.
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > > Regards,
> > >> > > > > > > > >> > > > > > Lucas
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > >> > > > > > > lindong28@gmail.com
> > >> > > > > > > > >
> > >> > > > > > > > >> > > wrote:
> > >> > > > > > > > >> > > > > >
> > >> > > > > > > > >> > > > > >> Hey Jun,
> > >> > > > > > > > >> > > > > >>
> > >> > > > > > > > >> > > > > >> Thanks much for the comments. It is a good point.
> > >> > > > > > > > >> > > > > >> So the feature may be useful for the JBOD
> > >> > > > > > > > >> > > > > >> use-case. I have one question below.
> > >> > > > > > > > >> > > > > >>
> > >> > > > > > > > >> > > > > >> Hey Lucas,
> > >> > > > > > > > >> > > > > >>
> > >> > > > > > > > >> > > > > >> Do you think this feature is also useful for a
> > >> > > > > > > > >> > > > > >> non-JBOD setup, or is it only useful for the JBOD
> > >> > > > > > > > >> > > > > >> setup? It may be useful to understand this.
> > >> > > > > > > > >> > > > > >>
> > >> > > > > > > > >> > > > > >> When the broker is set up using JBOD, in order to
> > >> > > > > > > > >> > > > > >> move leaders on the failed disk to other disks,
> > >> > > > > > > > >> > > > > >> the system operator first needs to get the list
> > >> > > > > > > > >> > > > > >> of partitions on the failed disk. This is
> > >> > > > > > > > >> > > > > >> currently achieved using
> > >> > > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends a
> > >> > > > > > > > >> > > > > >> DescribeLogDirsRequest to the broker. If we only
> > >> > > > > > > > >> > > > > >> prioritize the controller requests, then the
> > >> > > > > > > > >> > > > > >> DescribeLogDirsRequest may still take a long time
> > >> > > > > > > > >> > > > > >> to be processed by the broker. So the overall
> > >> > > > > > > > >> > > > > >> time to move leaders away from the failed disk
> > >> > > > > > > > >> > > > > >> may still be long even with this KIP. What do you
> > >> > > > > > > > >> > > > > >> think?
> > >> > > > > > > > >> > > > > >>
> > >> > > > > > > > >> > > > > >> Thanks,
> > >> > > > > > > > >> > > > > >> Dong
> > >> > > > > > > > >> > > > > >>
> > >> > > > > > > > >> > > > > >>
> > >> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas
> Wang <
> > >> > > > > > > > >> lucasatucla@gmail.com
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > > > wrote:
> > >> > > > > > > > >> > > > > >>
> > >> > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > >> > > > > > > > >> > > > > >> >
> > >> > > > > > > > >> > > > > >> > @Dong,
> > >> > > > > > > > >> > > > > >> > Since both of the two comments in your
> > >> previous
> > >> > > > email
> > >> > > > > > are
> > >> > > > > > > > >> about
> > >> > > > > > > > >> > > the
> > >> > > > > > > > >> > > > > >> > benefits of this KIP and whether it's
> > >> useful,
> > >> > > > > > > > >> > > > > >> > in light of Jun's last comment, do you
> > agree
> > >> > that
> > >> > > > > this
> > >> > > > > > > KIP
> > >> > > > > > > > >> can
> > >> > > > > > > > >> > be
> > >> > > > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > >> > > > > > > > >> > > > > >> > Please let me know, thanks!
> > >> > > > > > > > >> > > > > >> >
> > >> > > > > > > > >> > > > > >> > Regards,
> > >> > > > > > > > >> > > > > >> > Lucas
> > >> > > > > > > > >> > > > > >> >
> > >> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao
> <
> > >> > > > > > > jun@confluent.io>
> > >> > > > > > > > >> > wrote:
> > >> > > > > > > > >> > > > > >> >
> > >> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > >> > > > > > > > >> > > > > >> > >
> > >> > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one
> > >> > probably
> > >> > > > > > should
> > >> > > > > > > > just
> > >> > > > > > > > >> > kill
> > >> > > > > > > > >> > > > the
> > >> > > > > > > > >> > > > > >> > > broker. In that case, this KIP may not
> > >> help.
> > >> > If
> > >> > > > > only
> > >> > > > > > > one
> > >> > > > > > > > of
> > >> > > > > > > > >> > the
> > >> > > > > > > > >> > > > > disks
> > >> > > > > > > > >> > > > > >> on
> > >> > > > > > > > >> > > > > >> > a
> > >> > > > > > > > >> > > > > >> > > broker is slow, one may want to fail
> > that
> > >> > disk
> > >> > > > and
> > >> > > > > > move
> > >> > > > > > > > the
> > >> > > > > > > > >> > > > leaders
> > >> > > > > > > > >> > > > > on
> > >> > > > > > > > >> > > > > >> > that
> > >> > > > > > > > >> > > > > >> > > disk to other brokers. In that case,
> > being
> > >> > able
> > >> > > > to
> > >> > > > > > > > process
> > >> > > > > > > > >> the
> > >> > > > > > > > >> > > > > >> > LeaderAndIsr
> > >> > > > > > > > >> > > > > >> > > requests faster will potentially help
> > the
> > >> > > > producers
> > >> > > > > > > > recover
> > >> > > > > > > > >> > > > quicker.
> > >> > > > > > > > >> > > > > >> > >
> > >> > > > > > > > >> > > > > >> > > Thanks,
> > >> > > > > > > > >> > > > > >> > >
> > >> > > > > > > > >> > > > > >> > > Jun
> > >> > > > > > > > >> > > > > >> > >
> > >> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong
> > Lin <
> > >> > > > > > > > >> lindong28@gmail.com
> > >> > > > > > > > >> > >
> > >> > > > > > > > >> > > > > wrote:
> > >> > > > > > > > >> > > > > >> > >
> > >> > > > > > > > >> > > > > >> > > > Hey Lucas,
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up
> > >> > > questions
> > >> > > > > > below.
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers
> > >> > > > > > > > >> > > > > >> > > > 20 partitions that are randomly distributed
> > >> > > > > > > > >> > > > > >> > > > across all partitions, then each
> > >> > > > > > > > >> > > > > >> > > > ProduceRequest will likely cover some
> > >> > > > > > > > >> > > > > >> > > > partitions for which the broker is still
> > >> > > > > > > > >> > > > > >> > > > the leader after it quickly processes the
> > >> > > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. The broker will then
> > >> > > > > > > > >> > > > > >> > > > still be slow in processing these
> > >> > > > > > > > >> > > > > >> > > > ProduceRequests, and their latency will
> > >> > > > > > > > >> > > > > >> > > > still be very high with this KIP. It seems
> > >> > > > > > > > >> > > > > >> > > > that most ProduceRequests will still time
> > >> > > > > > > > >> > > > > >> > > > out after 30 seconds. Is this understanding
> > >> > > > > > > > >> > > > > >> > > > correct?
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will
> > >> > > > > > > > >> > > > > >> > > > still time out after 30 seconds, then it is
> > >> > > > > > > > >> > > > > >> > > > less clear how this KIP reduces average
> > >> > > > > > > > >> > > > > >> > > > produce latency. Can you clarify which
> > >> > > > > > > > >> > > > > >> > > > metrics can be improved by this KIP?
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > > Not sure why a system operator directly
> > >> > > > > > > > >> > > > > >> > > > cares about the number of truncated
> > >> > > > > > > > >> > > > > >> > > > messages. Do you mean this KIP can improve
> > >> > > > > > > > >> > > > > >> > > > average throughput or reduce message
> > >> > > > > > > > >> > > > > >> > > > duplication? It would be good to understand
> > >> > > > > > > > >> > > > > >> > > > this.
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > > Thanks,
> > >> > > > > > > > >> > > > > >> > > > Dong
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas
> > >> Wang <
> > >> > > > > > > > >> > > lucasatucla@gmail.com
> > >> > > > > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > wrote:
> > >> > > > > > > > >> > > > > >> > > >
> > >> > > > > > > > >> > > > > >> > > > > Hi Dong,
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > Thanks for your valuable comments.
> > >> Please
> > >> > > see
> > >> > > > > my
> > >> > > > > > > > reply
> > >> > > > > > > > >> > > below.
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1
> > >> > partition.
> > >> > > > Now
> > >> > > > > > > let's
> > >> > > > > > > > >> > > consider
> > >> > > > > > > > >> > > > a
> > >> > > > > > > > >> > > > > >> more
> > >> > > > > > > > >> > > > > >> > > > common
> > >> > > > > > > > >> > > > > >> > > > > scenario
> > >> > > > > > > > >> > > > > >> > > > > where broker0 is the leader of
> many
> > >> > > > partitions.
> > >> > > > > > And
> > >> > > > > > > > >> let's
> > >> > > > > > > > >> > > say
> > >> > > > > > > > >> > > > > for
> > >> > > > > > > > >> > > > > >> > some
> > >> > > > > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > >> > > > > > > > >> > > > > >> > > > > The number of leader partitions on
> > >> > broker0
> > >> > > is
> > >> > > > > so
> > >> > > > > > > > large,
> > >> > > > > > > > >> > say
> > >> > > > > > > > >> > > > 10K,
> > >> > > > > > > > >> > > > > >> that
> > >> > > > > > > > >> > > > > >> > > the
> > >> > > > > > > > >> > > > > >> > > > > cluster is skewed,
> > >> > > > > > > > >> > > > > >> > > > > and the operator would like to
> shift
> > >> the
> > >> > > > > > leadership
> > >> > > > > > > > >> for a
> > >> > > > > > > > >> > > lot
> > >> > > > > > > > >> > > > of
> > >> > > > > > > > >> > > > > >> > > > > partitions, say 9K, to other
> > brokers,
> > >> > > > > > > > >> > > > > >> > > > > either manually or through some
> > >> service
> > >> > > like
> > >> > > > > > cruise
> > >> > > > > > > > >> > control.
> > >> > > > > > > > >> > > > > >> > > > > With this KIP, not only will the
> > >> > leadership
> > >> > > > > > > > transitions
> > >> > > > > > > > >> > > finish
> > >> > > > > > > > >> > > > > >> more
> > >> > > > > > > > >> > > > > >> > > > > quickly, helping the cluster
> itself
> > >> > > becoming
> > >> > > > > more
> > >> > > > > > > > >> > balanced,
> > >> > > > > > > > >> > > > > >> > > > > but all existing producers
> > >> corresponding
> > >> > to
> > >> > > > the
> > >> > > > > > 9K
> > >> > > > > > > > >> > > partitions
> > >> > > > > > > > >> > > > > will
> > >> > > > > > > > >> > > > > >> > get
> > >> > > > > > > > >> > > > > >> > > > the
> > >> > > > > > > > >> > > > > >> > > > > errors relatively quickly
> > >> > > > > > > > >> > > > > >> > > > > rather than relying on their
> > timeout,
> > >> > > thanks
> > >> > > > to
> > >> > > > > > the
> > >> > > > > > > > >> > batched
> > >> > > > > > > > >> > > > > async
> > >> > > > > > > > >> > > > > >> ZK
> > >> > > > > > > > >> > > > > >> > > > > operations.
> > >> > > > > > > > >> > > > > >> > > > > To me it's a useful feature to
> have
> > >> > during
> > >> > > > such
> > >> > > > > > > > >> > troublesome
> > >> > > > > > > > >> > > > > times.
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google
> Doc
> > >> have
> > >> > > > shown
> > >> > > > > > > that
> > >> > > > > > > > >> with
> > >> > > > > > > > >> > > this
> > >> > > > > > > > >> > > > > KIP
> > >> > > > > > > > >> > > > > >> > many
> > >> > > > > > > > >> > > > > >> > > > > producers
> > >> > > > > > > > >> > > > > >> > > > > receive an explicit error
> > >> > > > > NotLeaderForPartition,
> > >> > > > > > > > based
> > >> > > > > > > > >> on
> > >> > > > > > > > >> > > > which
> > >> > > > > > > > >> > > > > >> they
> > >> > > > > > > > >> > > > > >> > > > retry
> > >> > > > > > > > >> > > > > >> > > > > immediately.
> > >> > > > > > > > >> > > > > >> > > > > Therefore the latency (~14
> > >> seconds+quick
> > >> > > > retry)
> > >> > > > > > for
> > >> > > > > > > > >> their
> > >> > > > > > > > >> > > > single
> > >> > > > > > > > >> > > > > >> > > message
> > >> > > > > > > > >> > > > > >> > > > is
> > >> > > > > > > > >> > > > > >> > > > > much smaller
> > >> > > > > > > > >> > > > > >> > > > > compared with the case of timing
> out
> > >> > > without
> > >> > > > > the
> > >> > > > > > > KIP
> > >> > > > > > > > >> (30
> > >> > > > > > > > >> > > > seconds
> > >> > > > > > > > >> > > > > >> for
> > >> > > > > > > > >> > > > > >> > > > timing
> > >> > > > > > > > >> > > > > >> > > > > out + quick retry).
> > >> > > > > > > > >> > > > > >> > > > > One might argue that reducing the timeout
> > >> > > > > > > > >> > > > > >> > > > > on the producer side can achieve the same
> > >> > > > > > > > >> > > > > >> > > > > result, yet reducing the timeout has its
> > >> > > > > > > > >> > > > > >> > > > > own drawbacks[1].
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to
> > show
> > >> the
> > >> > > > > number
> > >> > > > > > of
> > >> > > > > > > > >> > > truncated
> > >> > > > > > > > >> > > > > >> > messages
> > >> > > > > > > > >> > > > > >> > > on
> > >> > > > > > > > >> > > > > >> > > > > brokers,
> > >> > > > > > > > >> > > > > >> > > > > with the experiments done in the
> > >> Google
> > >> > > Doc,
> > >> > > > it
> > >> > > > > > > > should
> > >> > > > > > > > >> be
> > >> > > > > > > > >> > > easy
> > >> > > > > > > > >> > > > > to
> > >> > > > > > > > >> > > > > >> see
> > >> > > > > > > > >> > > > > >> > > > that
> > >> > > > > > > > >> > > > > >> > > > > a lot fewer messages need
> > >> > > > > > > > >> > > > > >> > > > > to be truncated on broker0 since
> the
> > >> > > > up-to-date
> > >> > > > > > > > >> metadata
> > >> > > > > > > > >> > > > avoids
> > >> > > > > > > > >> > > > > >> > > appending
> > >> > > > > > > > >> > > > > >> > > > > of messages
> > >> > > > > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If
> > we
> > >> > talk
> > >> > > > to a
> > >> > > > > > > > system
> > >> > > > > > > > >> > > > operator
> > >> > > > > > > > >> > > > > >> and
> > >> > > > > > > > >> > > > > >> > ask
> > >> > > > > > > > >> > > > > >> > > > > whether
> > >> > > > > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I
> > bet
> > >> > most
> > >> > > > > likely
> > >> > > > > > > the
> > >> > > > > > > > >> > answer
> > >> > > > > > > > >> > > > is
> > >> > > > > > > > >> > > > > >> yes.
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > 3. To answer your question, I
> think
> > it
> > >> > > might
> > >> > > > be
> > >> > > > > > > > >> helpful to
> > >> > > > > > > > >> > > > > >> construct
> > >> > > > > > > > >> > > > > >> > > some
> > >> > > > > > > > >> > > > > >> > > > > formulas.
> > >> > > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm
> going
> > >> back
> > >> > to
> > >> > > > the
> > >> > > > > > > case
> > >> > > > > > > > >> where
> > >> > > > > > > > >> > > > there
> > >> > > > > > > > >> > > > > >> is
> > >> > > > > > > > >> > > > > >> > > only
> > >> > > > > > > > >> > > > > >> > > > > ONE partition involved.
> > >> > > > > > > > >> > > > > >> > > > > Following the experiments in the
> > >> Google
> > >> > > Doc,
> > >> > > > > > let's
> > >> > > > > > > > say
> > >> > > > > > > > >> > > broker0
> > >> > > > > > > > >> > > > > >> > becomes
> > >> > > > > > > > >> > > > > >> > > > the
> > >> > > > > > > > >> > > > > >> > > > > follower at time t0,
> > >> > > > > > > > >> > > > > >> > > > > and after t0 there were still N
> > >> produce
> > >> > > > > requests
> > >> > > > > > in
> > >> > > > > > > > its
> > >> > > > > > > > >> > > > request
> > >> > > > > > > > >> > > > > >> > queue.
> > >> > > > > > > > >> > > > > >> > > > > With the up-to-date metadata
> brought
> > >> by
> > >> > > this
> > >> > > > > KIP,
> > >> > > > > > > > >> broker0
> > >> > > > > > > > >> > > can
> > >> > > > > > > > >> > > > > >> reply
> > >> > > > > > > > >> > > > > >> > > with
> > >> > > > > > > > >> > > > > >> > > > an
> > >> > > > > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > >> > > > > > > > >> > > > > >> > > > > let's use M1 to denote the average
> > >> > > processing
> > >> > > > > > time
> > >> > > > > > > of
> > >> > > > > > > > >> > > replying
> > >> > > > > > > > >> > > > > >> with
> > >> > > > > > > > >> > > > > >> > > such
> > >> > > > > > > > >> > > > > >> > > > an
> > >> > > > > > > > >> > > > > >> > > > > error message.
> > >> > > > > > > > >> > > > > >> > > > > Without this KIP, the broker will
> > >> need to
> > >> > > > > append
> > >> > > > > > > > >> messages
> > >> > > > > > > > >> > to
> > >> > > > > > > > >> > > > > >> > segments,
> > >> > > > > > > > >> > > > > >> > > > > which may trigger a flush to disk,
> > >> > > > > > > > >> > > > > >> > > > > let's use M2 to denote the average
> > >> > > processing
> > >> > > > > > time
> > >> > > > > > > > for
> > >> > > > > > > > >> > such
> > >> > > > > > > > >> > > > > logic.
> > >> > > > > > > > >> > > > > >> > > > > Then the average extra latency
> > >> incurred
> > >> > > > without
> > >> > > > > > > this
> > >> > > > > > > > >> KIP
> > >> > > > > > > > >> > is
> > >> > > > > > > > >> > > N
> > >> > > > > > > > >> > > > *
> > >> > > > > > > > >> > > > > >> (M2 -
> > >> > > > > > > > >> > > > > >> > > > M1) /
> > >> > > > > > > > >> > > > > >> > > > > 2.
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > In practice, M2 should always be
> > >> larger
> > >> > > than
> > >> > > > > M1,
> > >> > > > > > > > which
> > >> > > > > > > > >> > means
> > >> > > > > > > > >> > > > as
> > >> > > > > > > > >> > > > > >> long
> > >> > > > > > > > >> > > > > >> > > as N
> > >> > > > > > > > >> > > > > >> > > > > is positive,
> > >> > > > > > > > >> > > > > >> > > > > we would see improvements on the
> > >> average
> > >> > > > > latency.
> > >> > > > > > > > >> > > > > >> > > > > There does not need to be
> > significant
> > >> > > backlog
> > >> > > > > of
> > >> > > > > > > > >> requests
> > >> > > > > > > > >> > in
> > >> > > > > > > > >> > > > the
> > >> > > > > > > > >> > > > > >> > > request
> > >> > > > > > > > >> > > > > >> > > > > queue,
> > >> > > > > > > > >> > > > > >> > > > > or severe degradation of disk
> > >> performance
> > >> > > to
> > >> > > > > have
> > >> > > > > > > the
> > >> > > > > > > > >> > > > > improvement.
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > Regards,
> > >> > > > > > > > >> > > > > >> > > > > Lucas
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing the
> > >> timeout on
> > >> > > the
> > >> > > > > > > > producer
> > >> > > > > > > > >> > side
> > >> > > > > > > > >> > > > can
> > >> > > > > > > > >> > > > > >> > trigger
> > >> > > > > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > >> > > > > > > > >> > > > > >> > > > > when the corresponding leader
> broker
> > >> is
> > >> > > > > > overloaded,
> > >> > > > > > > > >> > > > exacerbating
> > >> > > > > > > > >> > > > > >> the
> > >> > > > > > > > >> > > > > >> > > > > situation.
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM,
> Dong
> > >> Lin
> > >> > <
> > >> > > > > > > > >> > > lindong28@gmail.com
> > >> > > > > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > wrote:
> > >> > > > > > > > >> > > > > >> > > > >
> > >> > > > > > > > >> > > > > >> > > > > > Hey Lucas,
> > >> > > > > > > > >> > > > > >> > > > > >
> > >> > > > > > > > >> > > > > >> > > > > > Thanks much for the detailed
> > >> > > documentation
> > >> > > > of
> > >> > > > > > the
> > >> > > > > > > > >> > > > experiment.
> > >> > > > > > > > >> > > > > >> > > > > >
> > >> > > > > > > > >> > > > > >> > > > > > Initially I also think having a
> > >> > separate
> > >> > > > > queue
> > >> > > > > > > for
> > >> > > > > > > > >> > > > controller
> > >> > > > > > > > >> > > > > >> > > requests
> > >> > > > > > > > >> > > > > >> > > > is
> > >> > > > > > > > >> > > > > >> > > > > > useful because, as you mentioned
> > in
> > >> the
> > >> > > > > summary
> > >> > > > > > > > >> section
> > >> > > > > > > > >> > of
> > >> > > > > > > > >> > > > the
> > >> > > > > > > > >> > > > > >> > Google
> > >> > > > > > > > >> > > > > >> > > > > doc,
> > >> > > > > > > > >> > > > > >> > > > > > controller requests are
> generally
> > >> more
> > >> > >



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
@Dong,
Great example and explanation, thanks!

@All
Regarding the example given by Dong, it seems even if we use a queue, and a
dedicated controller request handling thread,
the same result can still happen because R1_a will be sent on one
connection, and R1_b & R2 will be sent on a different connection,
and there is no ordering between different connections on the broker side.
I was discussing with Mayuresh offline, and it seems the correlation id within
the same NetworkClient object is monotonically increasing and never reset,
hence a broker can leverage that to properly reject obsolete requests.
Thoughts?
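
To make the idea concrete, here is a rough, purely illustrative sketch of how a
broker could reject obsolete controller requests by tracking the highest
correlation id seen so far. This is not actual Kafka broker code; the class and
method names are hypothetical, and it assumes correlation ids are monotonically
increasing within a controller session:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch (not actual Kafka code): a per-broker gate that
 * accepts a controller request only if its correlation id is higher than
 * any id accepted so far, rejecting stale requests that arrive late.
 */
public class ControllerRequestGate {
    // Highest correlation id accepted so far from the active controller.
    private final AtomicLong maxCorrelationId = new AtomicLong(-1L);

    /**
     * Returns true if the request should be processed, false if it is
     * older than one already accepted (e.g. R1 arriving after R2).
     */
    public boolean tryAccept(long correlationId) {
        while (true) {
            long current = maxCorrelationId.get();
            if (correlationId <= current) {
                return false; // obsolete request, reject
            }
            if (maxCorrelationId.compareAndSet(current, correlationId)) {
                return true;
            }
            // Another handler thread advanced the id concurrently; retry.
        }
    }

    /** Reset when a new controller (higher epoch) takes over. */
    public void onNewControllerEpoch() {
        maxCorrelationId.set(-1L);
    }
}
```

A real implementation would also need to key this state by controller id and
epoch, since correlation ids may not survive a reconnect, as discussed
elsewhere in this thread.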

Thanks,
Lucas

On Thu, Jul 19, 2018 at 12:11 PM, Mayuresh Gharat <
gharatmayuresh15@gmail.com> wrote:

> Actually nvm, correlationId is reset in case of connection loss, I think.
>
> Thanks,
>
> Mayuresh
>
> On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <
> gharatmayuresh15@gmail.com>
> wrote:
>
> > I agree with Dong that out-of-order processing can happen even with 2
> > separate queues, and it can happen today as well.
> > Can we use the correlationId in the request from the controller to the
> > broker to handle ordering ?
> >
> > Thanks,
> >
> > Mayuresh
> >
> >
> > On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <be...@gmail.com> wrote:
> >
> >> Good point, Joel. I agree that a dedicated controller request handling
> >> thread would provide better isolation. It also solves the reordering issue.
> >>
> >> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jj...@gmail.com>
> wrote:
> >>
> >> > Good example. I think this scenario can occur in the current code as
> >> well
> >> > but with even lower probability given that there are other
> >> non-controller
> >> > requests interleaved. It is still sketchy though and I think a safer
> >> > approach would be separate queues and pinning controller request
> >> handling
> >> > to one handler thread.
> >> >
> >> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <li...@gmail.com>
> wrote:
> >> >
> >> > > Hey Becket,
> >> > >
> >> > > I think you are right that there may be out-of-order processing.
> >> However,
> >> > > it seems that out-of-order processing may also happen even if we
> use a
> >> > > separate queue.
> >> > >
> >> > > Here is the example:
> >> > >
> >> > > - Controller sends R1 and got disconnected before receiving
> response.
> >> > Then
> >> > > it reconnects and sends R2. Both requests now stay in the controller
> >> > > request queue in the order they are sent.
> >> > > - thread1 takes R1_a from the request queue and then thread2 takes
> R2
> >> > from
> >> > > the request queue almost at the same time.
> >> > > - So R1_a and R2 are processed in parallel. There is chance that
> R2's
> >> > > processing is completed before R1.
> >> > >
> >> > > If out-of-order processing can happen for both approaches with very
> >> low
> >> > > probability, it may not be worthwhile to add the extra queue. What
> do
> >> you
> >> > > think?
> >> > >
> >> > > Thanks,
> >> > > Dong
> >> > >
> >> > >
> >> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <be...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Hi Mayuresh/Joel,
> >> > > >
> >> > > > Using the request channel as a deque was brought up some time ago when
> >> > > > we were initially thinking of prioritizing the requests. The concern was
> >> > > > that the controller requests are supposed to be processed in order. If
> >> > > > we can ensure
> >> > > > that there is one controller request in the request channel, the
> >> order
> >> > is
> >> > > > not a concern. But in cases that there are more than one
> controller
> >> > > request
> >> > > > inserted into the queue, the controller request order may change
> and
> >> > > cause
> >> > > > a problem. For example, think about the following sequence:
> >> > > > 1. Controller successfully sent a request R1 to broker
> >> > > > 2. Broker receives R1 and put the request to the head of the
> request
> >> > > queue.
> >> > > > 3. Controller to broker connection failed and the controller
> >> > reconnected
> >> > > to
> >> > > > the broker.
> >> > > > 4. Controller sends a request R2 to the broker
> >> > > > 5. Broker receives R2 and add it to the head of the request queue.
> >> > > > Now on the broker side, R2 will be processed before R1 is
> processed,
> >> > > which
> >> > > > may cause problems.
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Jiangjie (Becket) Qin
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com>
> >> > wrote:
> >> > > >
> >> > > > > @Mayuresh - I like your idea. It appears to be a simpler less
> >> > invasive
> >> > > > > alternative and it should work. Jun/Becket/others, do you see
> any
> >> > > > pitfalls
> >> > > > > with this approach?
> >> > > > >
> >> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> >> lucasatucla@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > @Mayuresh,
> >> > > > > > That's a very interesting idea that I haven't thought before.
> >> > > > > > It seems to solve our problem at hand pretty well, and also
> >> > > > > > avoids the need to have a new size metric and capacity config
> >> > > > > > for the controller request queue. In fact, if we were to adopt
> >> > > > > > this design, there is no public interface change, and we
> >> > > > > > probably don't need a KIP.
> >> > > > > > Also implementation-wise, it seems
> >> > > > > > the Java class LinkedBlockingDeque can readily satisfy the
> >> > > requirement
> >> > > > > > by supporting a capacity, and also allowing inserting at both
> >> ends.
> >> > > > > >
> >> > > > > > My only concern is that this design is tied to the coincidence
> >> that
> >> > > > > > we have two request priorities and there are two ends to a
> >> deque.
> >> > > > > > Hence by using the proposed design, it seems the network layer
> >> is
> >> > > > > > more tightly coupled with upper layer logic, e.g. if we were
> to
> >> add
> >> > > > > > an extra priority level in the future for some reason, we
> would
> >> > > > probably
> >> > > > > > need to go back to the design of separate queues, one for each
> >> > > priority
> >> > > > > > level.
> >> > > > > >
> >> > > > > > In summary, I'm ok with both designs and lean toward your
> >> suggested
> >> > > > > > approach.
> >> > > > > > Let's hear what others think.
> >> > > > > >
> >> > > > > > @Becket,
> >> > > > > > In light of Mayuresh's suggested new design, I'm answering
> your
> >> > > > question
> >> > > > > > only in the context
> >> > > > > > of the current KIP design: I think your suggestion makes
> sense,
> >> and
> >> > > I'm
> >> > > > > ok
> >> > > > > > with removing the capacity config and
> >> > > > > > just relying on the default value of 20 being sufficient
> enough.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Lucas
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> >> > > > > > gharatmayuresh15@gmail.com
> >> > > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi Lucas,
> >> > > > > > >
> >> > > > > > > Seems like the main intent here is to prioritize controller
> >> > > > > > > requests over any other requests.
> >> > > > > > > In that case, we can change the request queue to a deque,
> >> where
> >> > > you
> >> > > > > > > always insert the normal requests (produce, consume,..etc)
> to
> >> the
> >> > > end
> >> > > > > of
> >> > > > > > > the deque, but if it's a controller request, you insert it
> to
> >> > the
> >> > > > head
> >> > > > > > of
> >> > > > > > > the queue. This ensures that the controller request will be
> >> given
> >> > > > > higher
> >> > > > > > > priority over other requests.
> >> > > > > > >
> >> > > > > > > Also since we only read one request from the socket and mute
> >> it
> >> > and
> >> > > > > only
> >> > > > > > > unmute it after handling the request, this would ensure that
> >> we
> >> > > don't
> >> > > > > > > handle controller requests out of order.
> >> > > > > > >
> >> > > > > > > With this approach we can avoid the second queue and the
> >> > additional
> >> > > > > > config
> >> > > > > > > for the size of the queue.
> >> > > > > > >
> >> > > > > > > What do you think ?
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > >
> >> > > > > > > Mayuresh
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> >> becket.qin@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hey Joel,
> >> > > > > > > >
> >> > > > > > > > Thank for the detail explanation. I agree the current
> design
> >> > > makes
> >> > > > > > sense.
> >> > > > > > > > My confusion is about whether the new config for the
> >> controller
> >> > > > queue
> >> > > > > > > > capacity is necessary. I cannot think of a case in which
> >> users
> >> > > > would
> >> > > > > > > change
> >> > > > > > > > it.
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > >
> >> > > > > > > > Jiangjie (Becket) Qin
> >> > > > > > > >
> >> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> >> > > becket.qin@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi Lucas,
> >> > > > > > > > >
> >> > > > > > > > > I guess my question can be rephrased to "do we expect
> >> users to
> >> > > > ever
> >> > > > > > > change
> >> > > > > > > > > the controller request queue capacity"? If we agree that
> >> 20
> >> > is
> >> > > > > > already
> >> > > > > > > a
> >> > > > > > > > > very generous default number and we do not expect users
> to
> >> > > change
> >> > > > > it,
> >> > > > > > is
> >> > > > > > > > it
> >> > > > > > > > > still necessary to expose this as a config?
> >> > > > > > > > >
> >> > > > > > > > > Thanks,
> >> > > > > > > > >
> >> > > > > > > > > Jiangjie (Becket) Qin
> >> > > > > > > > >
> >> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> >> > > > lucasatucla@gmail.com
> >> > > > > >
> >> > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > >> @Becket
> >> > > > > > > > >> 1. Thanks for the comment. You are right that normally
> >> there
> >> > > > > should
> >> > > > > > be
> >> > > > > > > > >> just
> >> > > > > > > > >> one controller request because of muting,
> >> > > > > > > > >> and I had NOT intended to say there would be many
> >> enqueued
> >> > > > > > controller
> >> > > > > > > > >> requests.
> >> > > > > > > > >> I went through the KIP again, and I'm not sure which
> part
> >> > > > conveys
> >> > > > > > that
> >> > > > > > > > >> info.
> >> > > > > > > > >> I'd be happy to revise if you point out the section.
> >> > > > > > > > >>
> >> > > > > > > > >> 2. Though it should not happen in normal conditions,
> the
> >> > > current
> >> > > > > > > design
> >> > > > > > > > >> does not preclude multiple controllers running
> >> > > > > > > > >> at the same time, hence if we don't have the controller
> >> > queue
> >> > > > > > capacity
> >> > > > > > > > >> config and simply set its capacity to 1,
> >> > > > > > > > >> network threads handling requests from different
> >> controllers
> >> > > > will
> >> > > > > be
> >> > > > > > > > >> blocked during those troublesome times,
> >> > > > > > > > >> which is probably not what we want. On the other hand,
> >> > adding
> >> > > > the
> >> > > > > > > extra
> >> > > > > > > > >> config with a default value, say 20, guards us from
> >> issues
> >> > in
> >> > > > > those
> >> > > > > > > > >> troublesome times, and IMO there isn't much downside of
> >> > adding
> >> > > > the
> >> > > > > > > extra
> >> > > > > > > > >> config.
> >> > > > > > > > >>
> >> > > > > > > > >> @Mayuresh
> >> > > > > > > > >> Good catch, this sentence is an obsolete statement
> based
> >> on
> >> > a
> >> > > > > > previous
> >> > > > > > > > >> design. I've revised the wording in the KIP.
> >> > > > > > > > >>
> >> > > > > > > > >> Thanks,
> >> > > > > > > > >> Lucas
> >> > > > > > > > >>
> >> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> >> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> >> > > > > > > > >>
> >> > > > > > > > >> > Hi Lucas,
> >> > > > > > > > >> >
> >> > > > > > > > >> > Thanks for the KIP.
> >> > > > > > > > >> > I am trying to understand why you think "The memory
> >> > > > consumption
> >> > > > > > can
> >> > > > > > > > rise
> >> > > > > > > > >> > given the total number of queued requests can go up
> to
> >> 2x"
> >> > > in
> >> > > > > the
> >> > > > > > > > impact
> >> > > > > > > > >> > section. Normally the requests from controller to a
> >> Broker
> >> > > are
> >> > > > > not
> >> > > > > > > > high
> >> > > > > > > > >> > volume, right ?
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> > Thanks,
> >> > > > > > > > >> >
> >> > > > > > > > >> > Mayuresh
> >> > > > > > > > >> >
> >> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> >> > > > > becket.qin@gmail.com>
> >> > > > > > > > >> wrote:
> >> > > > > > > > >> >
> >> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control
> >> plane
> >> > > from
> >> > > > > the
> >> > > > > > > > data
> >> > > > > > > > >> > plane
> >> > > > > > > > >> > > makes a lot of sense.
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > In the KIP you mentioned that the controller
> request
> >> > queue
> >> > > > may
> >> > > > > > > have
> >> > > > > > > > >> many
> >> > > > > > > > >> > > requests in it. Will this be a common case? The
> >> > controller
> >> > > > > > > requests
> >> > > > > > > > >> still
> >> > > > > > > > >> > > go through the SocketServer. The SocketServer
> will
> >> > mute
> >> > > > the
> >> > > > > > > > channel
> >> > > > > > > > >> > once
> >> > > > > > > > >> > > a request is read and put into the request channel.
> >> So
> >> > > > > assuming
> >> > > > > > > > there
> >> > > > > > > > >> is
> >> > > > > > > > >> > > only one connection between controller and each
> >> broker,
> >> > on
> >> > > > the
> >> > > > > > > > broker
> >> > > > > > > > >> > side,
> >> > > > > > > > >> > > there should be only one controller request in the
> >> > > > controller
> >> > > > > > > > request
> >> > > > > > > > >> > queue
> >> > > > > > > > >> > > at any given time. If that is the case, do we need
> a
> >> > > > separate
> >> > > > > > > > >> controller
> >> > > > > > > > >> > > request queue capacity config? The default value 20
> >> > means
> >> > > > that
> >> > > > > > we
> >> > > > > > > > >> expect
> >> > > > > > > > >> > > there are 20 controller switches to happen in a
> short
> >> > > period
> >> > > > > of
> >> > > > > > > > time.
> >> > > > > > > > >> I
> >> > > > > > > > >> > am
> >> > > > > > > > >> > > not sure whether someone should increase the
> >> controller
> >> > > > > request
> >> > > > > > > > queue
> >> > > > > > > > >> > > capacity to handle such case, as it seems
> indicating
> >> > > > something
> >> > > > > > > very
> >> > > > > > > > >> wrong
> >> > > > > > > > >> > > has happened.
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > Thanks,
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > Jiangjie (Becket) Qin
> >> > > > > > > > >> > >
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> >> > > > > lindong28@gmail.com>
> >> > > > > > > > >> wrote:
> >> > > > > > > > >> > >
> >> > > > > > > > >> > > > Thanks for the update Lucas.
> >> > > > > > > > >> > > >
> >> > > > > > > > >> > > > I think the motivation section is intuitive. It
> >> will
> >> > be
> >> > > > good
> >> > > > > > to
> >> > > > > > > > >> learn
> >> > > > > > > > >> > > more
> >> > > > > > > > >> > > > about the comments from other reviewers.
> >> > > > > > > > >> > > >
> >> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> >> > > > > > > > lucasatucla@gmail.com>
> >> > > > > > > > >> > > wrote:
> >> > > > > > > > >> > > >
> >> > > > > > > > >> > > > > Hi Dong,
> >> > > > > > > > >> > > > >
> >> > > > > > > > >> > > > > I've updated the motivation section of the KIP
> by
> >> > > > > explaining
> >> > > > > > > the
> >> > > > > > > > >> > cases
> >> > > > > > > > >> > > > that
> >> > > > > > > > >> > > > > would have user impacts.
> >> > > > > > > > >> > > > > Please take a look at let me know your
> comments.
> >> > > > > > > > >> > > > >
> >> > > > > > > > >> > > > > Thanks,
> >> > > > > > > > >> > > > > Lucas
> >> > > > > > > > >> > > > >
> >> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> >> > > > > > > > lucasatucla@gmail.com
> >> > > > > > > > >> >
> >> > > > > > > > >> > > > wrote:
> >> > > > > > > > >> > > > >
> >> > > > > > > > >> > > > > > Hi Dong,
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > The simulation of disk being slow is merely
> >> for me
> >> > > to
> >> > > > > > easily
> >> > > > > > > > >> > > construct
> >> > > > > > > > >> > > > a
> >> > > > > > > > >> > > > > > testing scenario
> >> > > > > > > > >> > > > > > with a backlog of produce requests. In
> >> production,
> >> > > > other
> >> > > > > > > than
> >> > > > > > > > >> the
> >> > > > > > > > >> > > disk
> >> > > > > > > > >> > > > > > being slow, a backlog of
> >> > > > > > > > >> > > > > > produce requests may also be caused by high
> >> > produce
> >> > > > QPS.
> >> > > > > > > > >> > > > > > In that case, we may not want to kill the
> >> broker
> >> > and
> >> > > > > > that's
> >> > > > > > > > when
> >> > > > > > > > >> > this
> >> > > > > > > > >> > > > KIP
> >> > > > > > > > >> > > > > > can be useful, both for JBOD
> >> > > > > > > > >> > > > > > and non-JBOD setup.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > Going back to your previous question about
> each
> >> > > > > > > ProduceRequest
> >> > > > > > > > >> > > covering
> >> > > > > > > > >> > > > > 20
> >> > > > > > > > >> > > > > > partitions that are randomly
> >> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr request
> >> is
> >> > > > > enqueued
> >> > > > > > > that
> >> > > > > > > > >> > tries
> >> > > > > > > > >> > > to
> >> > > > > > > > >> > > > > > switch the current broker, say broker0, from
> >> > leader
> >> > > to
> >> > > > > > > > follower
> >> > > > > > > > >> > > > > > *for one of the partitions*, say *test-0*.
> For
> >> the
> >> > > > sake
> >> > > > > of
> >> > > > > > > > >> > argument,
> >> > > > > > > > >> > > > > > let's also assume the other brokers, say
> >> broker1,
> >> > > have
> >> > > > > > > > *stopped*
> >> > > > > > > > >> > > > fetching
> >> > > > > > > > >> > > > > > from
> >> > > > > > > > >> > > > > > the current broker, i.e. broker0.
> >> > > > > > > > >> > > > > > 1. If the enqueued produce requests have
> acks =
> >> > -1
> >> > > > > (ALL)
> >> > > > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests
> >> ahead
> >> > of
> >> > > > > > > > >> LeaderAndISR
> >> > > > > > > > >> > > will
> >> > > > > > > > >> > > > be
> >> > > > > > > > >> > > > > > put into the purgatory,
> >> > > > > > > > >> > > > > >         and since they'll never be replicated
> >> to
> >> > > other
> >> > > > > > > brokers
> >> > > > > > > > >> > > (because
> >> > > > > > > > >> > > > > of
> >> > > > > > > > >> > > > > > the assumption made above), they will
> >> > > > > > > > >> > > > > >         be completed either when the
> >> LeaderAndISR
> >> > > > > request
> >> > > > > > is
> >> > > > > > > > >> > > processed
> >> > > > > > > > >> > > > or
> >> > > > > > > > >> > > > > > when the timeout happens.
> >> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> >> > > > transition
> >> > > > > > the
> >> > > > > > > > >> > > partition
> >> > > > > > > > >> > > > > > test-0 to become a follower,
> >> > > > > > > > >> > > > > >         after the current broker sees the
> >> > > replication
> >> > > > of
> >> > > > > > the
> >> > > > > > > > >> > > remaining
> >> > > > > > > > >> > > > 19
> >> > > > > > > > >> > > > > > partitions, it can send a response indicating
> >> that
> >> > > > > > > > >> > > > > >         it's no longer the leader for the
> >> > "test-0".
> >> > > > > > > > >> > > > > >   To see the latency difference between 1.1
> and
> >> > 1.2,
> >> > > > > let's
> >> > > > > > > say
> >> > > > > > > > >> > there
> >> > > > > > > > >> > > > are
> >> > > > > > > > >> > > > > > 24K produce requests ahead of the
> LeaderAndISR,
> >> > and
> >> > > > > there
> >> > > > > > > are
> >> > > > > > > > 8
> >> > > > > > > > >> io
> >> > > > > > > > >> > > > > threads,
> >> > > > > > > > >> > > > > >   so each io thread will process
> approximately
> >> > 3000
> >> > > > > > produce
> >> > > > > > > > >> > requests.
> >> > > > > > > > >> > > > Now
> >> > > > > > > > >> > > > > > let's investigate the io thread that finally
> >> > > processed
> >> > > > > the
> >> > > > > > > > >> > > > LeaderAndISR.
> >> > > > > > > > >> > > > > >   For the 3000 produce requests, if we model
> >> the
> >> > > time
> >> > > > > when
> >> > > > > > > > their
> >> > > > > > > > >> > > > > remaining
> >> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999,
> and
> >> > the
> >> > > > > > > > LeaderAndISR
> >> > > > > > > > >> > > > request
> >> > > > > > > > >> > > > > is
> >> > > > > > > > >> > > > > > processed at time t3000.
> >> > > > > > > > >> > > > > >   Without this KIP, the 1st produce request
> >> would
> >> > > have
> >> > > > > > > waited
> >> > > > > > > > an
> >> > > > > > > > >> > > extra
> >> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an
> >> extra
> >> > > > time
> >> > > > > of
> >> > > > > > > > >> t3000 -
> >> > > > > > > > >> > > t1,
> >> > > > > > > > >> > > > > etc.
> >> > > > > > > > >> > > > > >   Roughly speaking, the latency difference is
> >> > bigger
> >> > > > for
> >> > > > > > the
> >> > > > > > > > >> > earlier
> >> > > > > > > > >> > > > > > produce requests than for the later ones. For
> >> the
> >> > > same
> >> > > > > > > reason,
> >> > > > > > > > >> the
> >> > > > > > > > >> > > more
> >> > > > > > > > >> > > > > > ProduceRequests queued
> >> > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit
> >> we
> >> > get
> >> > > > > > (capped
> >> > > > > > > > by
> >> > > > > > > > >> the
> >> > > > > > > > >> > > > > > produce timeout).
> >> > > > > > > > >> > > > > > 2. If the enqueued produce requests have
> >> acks=0 or
> >> > > > > acks=1
> >> > > > > > > > >> > > > > >   There will be no latency differences in
> this
> >> > case,
> >> > > > but
> >> > > > > > > > >> > > > > >   2.1 without this KIP, the records of
> >> partition
> >> > > > test-0
> >> > > > > in
> >> > > > > > > the
> >> > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR
> will
> >> be
> >> > > > > appended
> >> > > > > > > to
> >> > > > > > > > >> the
> >> > > > > > > > >> > > local
> >> > > > > > > > >> > > > > log,
> >> > > > > > > > >> > > > > >         and eventually be truncated after
> >> > processing
> >> > > > the
> >> > > > > > > > >> > > LeaderAndISR.
> >> > > > > > > > >> > > > > > This is what's referred to as
> >> > > > > > > > >> > > > > >         "some unofficial definition of data
> >> loss
> >> > in
> >> > > > > terms
> >> > > > > > of
> >> > > > > > > > >> > messages
> >> > > > > > > > >> > > > > > beyond the high watermark".
> >> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the
> effect
> >> > > since
> >> > > > if
> >> > > > > > the
> >> > > > > > > > >> > > > LeaderAndISR
> >> > > > > > > > >> > > > > > is immediately processed, the response to
> >> > producers
> >> > > > will
> >> > > > > > > have
> >> > > > > > > > >> > > > > >         the NotLeaderForPartition error,
> >> causing
> >> > > > > producers
> >> > > > > > > to
> >> > > > > > > > >> retry
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > This explanation above is the benefit for
> >> reducing
> >> > > the
> >> > > > > > > latency
> >> > > > > > > > >> of a
> >> > > > > > > > >> > > > > broker
> >> > > > > > > > >> > > > > > becoming the follower,
> >> > > > > > > > >> > > > > > closely related is reducing the latency of a
> >> > broker
> >> > > > > > becoming
> >> > > > > > > > the
> >> > > > > > > > >> > > > leader.
> >> > > > > > > > >> > > > > > In this case, the benefit is even more
> >> obvious, if
> >> > > > other
> >> > > > > > > > brokers
> >> > > > > > > > >> > have
> >> > > > > > > > >> > > > > > resigned leadership, and the
> >> > > > > > > > >> > > > > > current broker should take leadership. Any
> >> delay
> >> > in
> >> > > > > > > processing
> >> > > > > > > > >> the
> >> > > > > > > > >> > > > > > LeaderAndISR will be perceived
> >> > > > > > > > >> > > > > > by clients as unavailability. In extreme
> cases,
> >> > this
> >> > > > can
> >> > > > > > > cause
> >> > > > > > > > >> > failed
> >> > > > > > > > >> > > > > > produce requests if the retries are
> >> > > > > > > > >> > > > > > exhausted.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > Another two types of controller requests are
> >> > > > > > UpdateMetadata
> >> > > > > > > > and
> >> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as
> >> > follows:
> >> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed
> processing
> >> > > means
> >> > > > > > > clients
> >> > > > > > > > >> > > receiving
> >> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong
> leadership
> >> > info
> >> > > > > > > > >> > > > > > for certain partitions, and the effect is
> more
> >> > > retries
> >> > > > > or
> >> > > > > > > even
> >> > > > > > > > >> > fatal
> >> > > > > > > > >> > > > > > failure if the retries are exhausted.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > For StopReplica requests, a long queuing time
> >> may
> >> > > > > degrade
> >> > > > > > > the
> >> > > > > > > > >> > > > performance
> >> > > > > > > > >> > > > > > of topic deletion.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > Regarding your last question of the delay for
> >> > > > > > > > >> > DescribeLogDirsRequest,
> >> > > > > > > > >> > > > you
> >> > > > > > > > >> > > > > > are right
> >> > > > > > > > >> > > > > > that this KIP cannot help with the latency in
> >> > > getting
> >> > > > > the
> >> > > > > > > log
> >> > > > > > > > >> dirs
> >> > > > > > > > >> > > > info,
> >> > > > > > > > >> > > > > > and it's only relevant
> >> > > > > > > > >> > > > > > when controller requests are involved.
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > Regards,
> >> > > > > > > > >> > > > > > Lucas
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> >> > > > > > > lindong28@gmail.com
> >> > > > > > > > >
> >> > > > > > > > >> > > wrote:
> >> > > > > > > > >> > > > > >
> >> > > > > > > > >> > > > > >> Hey Jun,
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> Thanks much for the comments. It is good
> >> point.
> >> > So
> >> > > > the
> >> > > > > > > > feature
> >> > > > > > > > >> may
> >> > > > > > > > >> > > be
> >> > > > > > > > >> > > > > >> useful for JBOD use-case. I have one
> question
> >> > > below.
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> Hey Lucas,
> >> > > > > > > > >> > > > > >>
> >> > > > > > > > >> > > > > >> Do you think this feature is also useful for
> >> > > non-JBOD
> >> > > > > > setup
> >> > > > > > > > or
> >> > > > > > > > >> it
> >> > > > > > > > >> > is
> > only useful for the JBOD setup? It may be useful to understand this.
> >
> > When the broker is set up using JBOD, in order to move leaders on the
> > failed disk to other disks, the system operator first needs to get the
> > list of partitions on the failed disk. This is currently achieved using
> > AdminClient.describeLogDirs(), which sends a DescribeLogDirsRequest to
> > the broker. If we only prioritize the controller requests, then the
> > DescribeLogDirsRequest may still take a long time to be processed by
> > the broker. So the overall time to move leaders away from the failed
> > disk may still be long even with this KIP. What do you think?
> >
> > Thanks,
> > Dong
> >
> > On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >
> > Thanks for the insightful comment, Jun.
> >
> > @Dong,
> > Since both of the two comments in your previous email are about the
> > benefits of this KIP and whether it's useful, in light of Jun's last
> > comment, do you agree that this KIP can be beneficial in the case
> > mentioned by Jun? Please let me know, thanks!
> >
> > Regards,
> > Lucas
> >
> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:
> >
> > Hi, Lucas, Dong,
> >
> > If all disks on a broker are slow, one probably should just kill the
> > broker. In that case, this KIP may not help. If only one of the disks
> > on a broker is slow, one may want to fail that disk and move the
> > leaders on that disk to other brokers. In that case, being able to
> > process the LeaderAndIsr requests faster will potentially help the
> > producers recover quicker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> >
> > Hey Lucas,
> >
> > Thanks for the reply. Some follow-up questions below.
> >
> > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > randomly distributed across all partitions, then each ProduceRequest
> > will likely cover some partitions for which the broker is still the
> > leader after it quickly processes the LeaderAndIsrRequest. Then the
> > broker will still be slow in processing these ProduceRequests, and the
> > request time will still be very high with this KIP. It seems that most
> > ProduceRequests will still time out after 30 seconds. Is this
> > understanding correct?
> >
> > Regarding 2, if most ProduceRequests will still time out after 30
> > seconds, then it is less clear how this KIP reduces average produce
> > latency. Can you clarify what metrics can be improved by this KIP?
> >
> > Not sure why a system operator directly cares about the number of
> > truncated messages. Do you mean this KIP can improve average throughput
> > or reduce message duplication? It will be good to understand this.
> >
> > Thanks,
> > Dong
> > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> >
> > Hi Dong,
> >
> > Thanks for your valuable comments. Please see my reply below.
> >
> > 1. The Google doc showed only 1 partition. Now let's consider a more
> > common scenario where broker0 is the leader of many partitions, and
> > let's say for some reason its IO becomes slow. The number of leader
> > partitions on broker0 is so large, say 10K, that the cluster is skewed,
> > and the operator would like to shift the leadership for a lot of
> > partitions, say 9K, to other brokers, either manually or through some
> > service like Cruise Control. With this KIP, not only will the
> > leadership transitions finish more quickly, helping the cluster itself
> > become more balanced, but all existing producers corresponding to the
> > 9K partitions will get the errors relatively quickly, rather than
> > relying on their timeouts, thanks to the batched async ZK operations.
> > To me it's a useful feature to have during such troublesome times.
> >
> > 2. The experiments in the Google doc have shown that with this KIP many
> > producers receive an explicit NotLeaderForPartition error, based on
> > which they retry immediately. Therefore the latency (~14 seconds +
> > quick retry) for their single message is much smaller compared with the
> > case of timing out without the KIP (30 seconds for timing out + quick
> > retry). One might argue that reducing the timeout on the producer side
> > can achieve the same result, yet reducing the timeout has its own
> > drawbacks [1].
> >
> > Also *IF* there were a metric to show the number of truncated messages
> > on brokers, with the experiments done in the Google doc, it should be
> > easy to see that a lot fewer messages need to be truncated on broker0,
> > since the up-to-date metadata avoids appending of messages in
> > subsequent PRODUCE requests. If we talk to a system operator and ask
> > whether they prefer fewer wasteful IOs, I bet most likely the answer is
> > yes.
> >
> > 3. To answer your question, I think it might be helpful to construct
> > some formulas. To simplify the modeling, I'm going back to the case
> > where there is only ONE partition involved. Following the experiments
> > in the Google doc, let's say broker0 becomes the follower at time t0,
> > and after t0 there were still N produce requests in its request queue.
> > With the up-to-date metadata brought by this KIP, broker0 can reply
> > with a NotLeaderForPartition exception; let's use M1 to denote the
> > average processing time of replying with such an error message. Without
> > this KIP, the broker will need to append messages to segments, which
> > may trigger a flush to disk; let's use M2 to denote the average
> > processing time for such logic. Then the average extra latency incurred
> > without this KIP is N * (M2 - M1) / 2.
> >
> > In practice, M2 should always be larger than M1, which means that as
> > long as N is positive, we would see improvements in the average
> > latency. There does not need to be a significant backlog of requests in
> > the request queue, or severe degradation of disk performance, to get
> > the improvement.
> >
> > Regards,
> > Lucas
> >
> > [1] For instance, reducing the timeout on the producer side can trigger
> > unnecessary duplicate requests when the corresponding leader broker is
> > overloaded, exacerbating the situation.
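The latency model above (N backlogged requests, M1 for a quick error reply, M2 for an append that may flush to disk) can be sanity-checked with a short sketch. The class name and sample numbers below are illustrative, not taken from the KIP or the experiments:

```java
// Models the extra queueing latency described above: the i-th backlogged
// request waits behind i earlier requests, each costing M2 instead of M1,
// so its extra wait is i * (M2 - M1). The average over N requests is
// (N - 1) * (M2 - M1) / 2, i.e. roughly N * (M2 - M1) / 2 for large N.
public class ExtraLatencyModel {
    static double averageExtraLatencyMs(int n, double m1Ms, double m2Ms) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += i * (m2Ms - m1Ms); // extra wait for the i-th request
        }
        return total / n;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1000 backlogged requests, 1 ms error reply
        // (M1) vs. 21 ms append + flush (M2).
        System.out.println(averageExtraLatencyMs(1000, 1.0, 21.0)); // 9990.0
    }
}
```

With these hypothetical numbers, the backlog alone adds roughly 10 seconds of average latency, which matches the intuition in point 3.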
> >
> > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:
> >
> > Hey Lucas,
> >
> > Thanks much for the detailed documentation of the experiment.
> >
> > Initially I also thought that having a separate queue for controller
> > requests is useful because, as you mentioned in the summary section of
> > the Google doc, controller requests are generally more important than
> > data requests and we probably want controller requests to be processed
> > sooner. But then Eno had two very good questions which I am not sure
> > the Google doc has answered explicitly. Could you help with the
> > following questions?
> >
> > 1) It is not very clear what the actual benefit of KIP-291 is to users.
> > The experiment setup in the Google doc simulates the scenario where the
> > broker is very slow in handling ProduceRequests due to e.g. a slow
> > disk. It currently assumes that there is only 1 partition. But in the
> > common scenario, it is probably reasonable to assume that there are
> > many other partitions that are also actively produced to, and
> > ProduceRequests to these partitions also take e.g. 2 seconds to be
> > processed. So even if broker0 can become follower for partition 0 soon,
> > it probably still needs to slowly process the ProduceRequests in the
> > queue, because these ProduceRequests cover other partitions. Thus most
> > ProduceRequests will still time out after 30 seconds, and most clients
> > will still likely time out after 30 seconds. Then it is not obvious
> > what the benefit to the client is, since the client will time out after
> > 30 seconds before possibly re-connecting to broker1, with or without
> > KIP-291. Did I miss something here?
> >
> > 2) I guess Eno is asking for the specific benefits of this KIP to the
> > user or system administrator, e.g. whether this KIP decreases average
> > latency, 999th percentile latency, the probability of exceptions
> > exposed to the client, etc. It is probably useful to clarify this.
> >
> > 3) Does this KIP help improve user experience only when there is an
> > issue with a broker, e.g. a significant backlog in the request queue
> > due to a slow disk, as described in the Google doc? Or is this KIP also
> > useful when there is no ongoing issue in the cluster? It might be
> > helpful to clarify this to understand the benefit of this KIP.
> >
> > Thanks much,
> > Dong
> >
> > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >
> > Hi Eno,
> >
> > Sorry for the delay in getting the experiment results. Here is a link
> > to the positive impact achieved by implementing the proposed change:
> > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > Please take a look when you have time and let me know your feedback.
> >
> > Regards,
> > Lucas
> >
> > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <kafka@harsha.io> wrote:
> >
> > Thanks for the pointer. Will take a look; it might suit our
> > requirements better.
> >
> > Thanks,
> > Harsha
> >
> > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatucla@gmail.com>
> > wrote:
> >
> > Hi Harsha,
> >
> > If I understand correctly, the replication quota mechanism proposed in
> > KIP-73 can be helpful in that scenario.
> > Have you tried it out?
> >
> > Thanks,
> > Lucas
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Mayuresh Gharat <gh...@gmail.com>.
Actually nvm, correlationId is reset in case of connection loss, I think.

Thanks,

Mayuresh

On Thu, Jul 19, 2018 at 11:11 AM Mayuresh Gharat <gh...@gmail.com>
wrote:

> I agree with Dong that out-of-order processing can happen even with 2
> separate queues, and it can even happen today.
> Can we use the correlationId in the request from the controller to the
> broker to handle ordering?
>
> Thanks,
>
> Mayuresh
>
>
> On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <be...@gmail.com> wrote:
>
>> Good point, Joel. I agree that a dedicated controller request handling
>> thread would be a better isolation. It also solves the reordering issue.
>>
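A dedicated controller-request handler thread, as suggested above, can be sketched as follows (illustrative names only; this is not Kafka's actual request-handling code). A FIFO queue drained by a single thread processes controller requests strictly in arrival order, no matter how many data-plane handler threads run in parallel:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ControllerThreadSketch {
    // Drain all controller requests with ONE thread: submission order is
    // processing order, so no reordering between requests is possible.
    public static List<String> processInOrder(List<String> controllerRequests)
            throws InterruptedException {
        BlockingQueue<String> controllerQueue =
                new LinkedBlockingQueue<>(controllerRequests);
        List<String> processed = new ArrayList<>();
        ExecutorService controllerHandler = Executors.newSingleThreadExecutor();
        for (int i = 0; i < controllerRequests.size(); i++) {
            controllerHandler.submit(() -> processed.add(controllerQueue.poll()));
        }
        controllerHandler.shutdown();
        controllerHandler.awaitTermination(5, TimeUnit.SECONDS);
        return processed; // same order as enqueued
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processInOrder(List.of("R1", "R2", "R3")));
        // prints [R1, R2, R3] -- arrival order is preserved
    }
}
```

This also illustrates why the single-thread approach sidesteps the two-handlers-race scenario Dong describes below.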
>> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jj...@gmail.com> wrote:
>>
>> > Good example. I think this scenario can occur in the current code as
>> > well, but with even lower probability, given that there are other
>> > non-controller requests interleaved. It is still sketchy, though, and
>> > I think a safer approach would be separate queues and pinning
>> > controller request handling to one handler thread.
>> >
>> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <li...@gmail.com> wrote:
>> >
>> > > Hey Becket,
>> > >
>> > > I think you are right that there may be out-of-order processing.
>> > > However, it seems that out-of-order processing may also happen even
>> > > if we use a separate queue.
>> > >
>> > > Here is the example:
>> > >
>> > > - Controller sends R1 and gets disconnected before receiving the
>> > > response. Then it reconnects and sends R2. Both requests now stay in
>> > > the controller request queue in the order they were sent.
>> > > - thread1 takes R1 from the request queue, and then thread2 takes R2
>> > > from the request queue almost at the same time.
>> > > - So R1 and R2 are processed in parallel. There is a chance that
>> > > R2's processing is completed before R1's.
>> > >
>> > > If out-of-order processing can happen for both approaches with very
>> > > low probability, it may not be worthwhile to add the extra queue.
>> > > What do you think?
>> > >
>> > > Thanks,
>> > > Dong
>> > >
>> > >
>> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <be...@gmail.com>
>> > wrote:
>> > >
>> > > > Hi Mayuresh/Joel,
>> > > >
>> > > > Using the request channel as a dequeue was bright up some time ago
>> when
>> > > we
>> > > > initially thinking of prioritizing the request. The concern was that
>> > the
>> > > > controller requests are supposed to be processed in order. If we can
>> > > ensure
>> > > > that there is one controller request in the request channel, the
>> order
>> > is
>> > > > not a concern. But in cases that there are more than one controller
>> > > request
>> > > > inserted into the queue, the controller request order may change and
>> > > cause
>> > > > problem. For example, think about the following sequence:
>> > > > 1. Controller successfully sends a request R1 to the broker.
>> > > > 2. Broker receives R1 and puts the request at the head of the
>> > > > request queue.
>> > > > 3. The controller-to-broker connection fails and the controller
>> > > > reconnects to the broker.
>> > > > 4. Controller sends a request R2 to the broker.
>> > > > 5. Broker receives R2 and adds it to the head of the request queue.
>> > > > Now on the broker side, R2 will be processed before R1, which may
>> > > > cause problems.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
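The inversion in steps 1-5 above can be reproduced directly with java.util.concurrent.LinkedBlockingDeque (a standalone sketch, not Kafka code): if every controller request is inserted at the head of a shared deque, two controller requests sent in order R1, R2 come out in order R2, R1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

// Sketch of the hazard described above (illustrative, not Kafka code): if
// every controller request is inserted at the head of a shared deque, two
// controller requests sent in order R1, R2 (e.g. across a controller
// reconnect) are handed to the handlers in the order R2, R1.
public class HeadInsertionReordering {
    // Inserts each arriving request at the head, then drains from the head.
    public static List<String> processOrder(List<String> arrivals) throws InterruptedException {
        LinkedBlockingDeque<String> deque = new LinkedBlockingDeque<>();
        for (String req : arrivals) {
            deque.putFirst(req); // every controller request jumps to the head
        }
        List<String> order = new ArrayList<>();
        while (!deque.isEmpty()) {
            order.add(deque.takeFirst());
        }
        return order;
    }

    public static void main(String[] args) throws InterruptedException {
        // R1 sent before the reconnect, R2 after: handlers see R2 first.
        System.out.println(processOrder(List.of("R1", "R2"))); // prints [R2, R1]
    }
}
```

The sketch compresses the timeline (both requests are enqueued before any is handled), which is exactly the window in which the inversion can occur.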
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com>
>> > wrote:
>> > > >
>> > > > > @Mayuresh - I like your idea. It appears to be a simpler, less
>> > > > > invasive alternative, and it should work. Jun/Becket/others, do
>> > > > > you see any pitfalls with this approach?
>> > > > >
>> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
>> lucasatucla@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > @Mayuresh,
>> > > > > > That's a very interesting idea that I hadn't thought of before.
>> > > > > > It seems to solve our problem at hand pretty well, and also
>> > > > > > avoids the need to have a new size metric and capacity config
>> > > > > > for the controller request queue. In fact, if we were to adopt
>> > > > > > this design, there is no public interface change, and we
>> > > > > > probably don't need a KIP.
>> > > > > > Also implementation wise, it seems
>> > > > > > the java class LinkedBlockingDeque can readily satisfy the
>> > > > > > requirement by supporting a capacity and by allowing insertion
>> > > > > > at both ends.
>> > > > > >
>> > > > > > My only concern is that this design is tied to the coincidence
>> that
>> > > > > > we have two request priorities and there are two ends to a
>> deque.
>> > > > > > Hence by using the proposed design, it seems the network layer
>> is
>> > > > > > more tightly coupled with upper layer logic, e.g. if we were to
>> add
>> > > > > > an extra priority level in the future for some reason, we would
>> > > > probably
>> > > > > > need to go back to the design of separate queues, one for each
>> > > priority
>> > > > > > level.
>> > > > > >
>> > > > > > In summary, I'm ok with both designs and lean toward your
>> suggested
>> > > > > > approach.
>> > > > > > Let's hear what others think.
>> > > > > >
>> > > > > > @Becket,
>> > > > > > In light of Mayuresh's suggested new design, I'm answering your
>> > > > question
>> > > > > > only in the context
>> > > > > > of the current KIP design: I think your suggestion makes sense,
>> and
>> > > I'm
>> > > > > ok
>> > > > > > with removing the capacity config and
>> > > > > > just relying on the default value of 20 being sufficient enough.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Lucas
>> > > > > >
>> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
>> > > > > > gharatmayuresh15@gmail.com
>> > > > > > > wrote:
>> > > > > >
>> > > > > > > Hi Lucas,
>> > > > > > >
>> > > > > > > Seems like the main intent here is to prioritize the
>> controller
>> > > > request
>> > > > > > > over any other requests.
>> > > > > > > In that case, we can change the request queue to a deque, where
>> > > > > > > you always insert the normal requests (produce, consume, etc.)
>> > > > > > > at the tail of the deque, but if it's a controller request, you
>> > > > > > > insert it at the head of the deque. This ensures that the
>> > > > > > > controller request will be given higher priority than other
>> > > > > > > requests.
>> > > > > > >
>> > > > > > > Also, since we only read one request from the socket, mute the
>> > > > > > > channel, and unmute it only after handling the request, this
>> > > > > > > would ensure that we don't handle controller requests out of
>> > > > > > > order.
>> > > > > > >
>> > > > > > > With this approach we can avoid the second queue and the
>> > additional
>> > > > > > config
>> > > > > > > for the size of the queue.
>> > > > > > >
>> > > > > > > What do you think ?
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Mayuresh
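A minimal sketch of this suggestion (illustrative names, not Kafka's actual RequestChannel): one bounded deque shared by both planes, with data requests appended at the tail and controller requests inserted at the head. With channel muting there is at most one controller request in the deque at a time, so head insertion cannot reorder controller requests among themselves.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Illustrative sketch of a single bounded request deque with two priorities.
public class PrioritizedRequestDeque {
    private final LinkedBlockingDeque<String> deque;

    public PrioritizedRequestDeque(int capacity) {
        // capacity plays the role of queued.max.requests
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    public void send(String request, boolean fromController) throws InterruptedException {
        if (fromController) {
            deque.putFirst(request); // controller request jumps the line
        } else {
            deque.putLast(request);  // normal FIFO path for data requests
        }
    }

    public String receive() throws InterruptedException {
        return deque.takeFirst();    // handler threads always take from the head
    }

    public static void main(String[] args) throws InterruptedException {
        PrioritizedRequestDeque q = new PrioritizedRequestDeque(500);
        q.send("Produce-1", false);
        q.send("Produce-2", false);
        q.send("LeaderAndIsr", true);
        System.out.println(q.receive()); // prints LeaderAndIsr
        System.out.println(q.receive()); // prints Produce-1
        System.out.println(q.receive()); // prints Produce-2
    }
}
```

One caveat of the sketch: putFirst also blocks when the deque is at capacity, so a completely full queue would still delay a controller request until a slot frees up.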
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
>> becket.qin@gmail.com
>> > >
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Hey Joel,
>> > > > > > > >
>> > > > > > > > Thanks for the detailed explanation. I agree the current
>> > > > > > > > design makes sense.
>> > > > > > > > My confusion is about whether the new config for the
>> controller
>> > > > queue
>> > > > > > > > capacity is necessary. I cannot think of a case in which
>> users
>> > > > would
>> > > > > > > change
>> > > > > > > > it.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > >
>> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
>> > > becket.qin@gmail.com>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Lucas,
>> > > > > > > > >
>> > > > > > > > > I guess my question can be rephrased as "do we expect users
>> > > > > > > > > to ever change the controller request queue capacity"? If
>> > > > > > > > > we agree that 20 is already a very generous default and we
>> > > > > > > > > do not expect users to change it, is it still necessary to
>> > > > > > > > > expose this as a config?
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > > >
>> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
>> > > > lucasatucla@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >> @Becket
>> > > > > > > > >> 1. Thanks for the comment. You are right that normally
>> there
>> > > > > should
>> > > > > > be
>> > > > > > > > >> just
>> > > > > > > > >> one controller request because of muting,
>> > > > > > > > >> and I had NOT intended to say there would be many
>> enqueued
>> > > > > > controller
>> > > > > > > > >> requests.
>> > > > > > > > >> I went through the KIP again, and I'm not sure which part
>> > > > conveys
>> > > > > > that
>> > > > > > > > >> info.
>> > > > > > > > >> I'd be happy to revise if you point out that section.
>> > > > > > > > >>
>> > > > > > > > >> 2. Though it should not happen in normal conditions, the
>> > > current
>> > > > > > > design
>> > > > > > > > >> does not preclude multiple controllers running
>> > > > > > > > >> at the same time, hence if we don't have the controller
>> > > > > > > > >> queue capacity config and simply set its capacity to 1,
>> > > > > > > > >> network threads handling requests from different
>> controllers
>> > > > will
>> > > > > be
>> > > > > > > > >> blocked during those troublesome times,
>> > > > > > > > >> which is probably not what we want. On the other hand,
>> > adding
>> > > > the
>> > > > > > > extra
>> > > > > > > > >> config with a default value, say 20, guards us from
>> issues
>> > in
>> > > > > those
>> > > > > > > > >> troublesome times, and IMO there isn't much downside of
>> > adding
>> > > > the
>> > > > > > > extra
>> > > > > > > > >> config.
>> > > > > > > > >>
>> > > > > > > > >> @Mayuresh
>> > > > > > > > >> Good catch, this sentence is an obsolete statement based
>> on
>> > a
>> > > > > > previous
>> > > > > > > > >> design. I've revised the wording in the KIP.
>> > > > > > > > >>
>> > > > > > > > >> Thanks,
>> > > > > > > > >> Lucas
>> > > > > > > > >>
>> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
>> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Hi Lucas,
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks for the KIP.
>> > > > > > > > >> > I am trying to understand why you think "The memory
>> > > > consumption
>> > > > > > can
>> > > > > > > > rise
>> > > > > > > > >> > given the total number of queued requests can go up to
>> 2x"
>> > > in
>> > > > > the
>> > > > > > > > impact
>> > > > > > > > >> > section. Normally the requests from controller to a
>> Broker
>> > > are
>> > > > > not
>> > > > > > > > high
>> > > > > > > > >> > volume, right ?
>> > > > > > > > >> >
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks,
>> > > > > > > > >> >
>> > > > > > > > >> > Mayuresh
>> > > > > > > > >> >
>> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
>> > > > > becket.qin@gmail.com>
>> > > > > > > > >> wrote:
>> > > > > > > > >> >
>> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control
>> plane
>> > > from
>> > > > > the
>> > > > > > > > data
>> > > > > > > > >> > plane
>> > > > > > > > >> > > makes a lot of sense.
>> > > > > > > > >> > >
>> > > > > > > > >> > > In the KIP you mentioned that the controller request
>> > queue
>> > > > may
>> > > > > > > have
>> > > > > > > > >> many
>> > > > > > > > >> > > requests in it. Will this be a common case? The
>> > > > > > > > >> > > controller requests still
>> > > > > > > > >> > > go through the SocketServer. The SocketServer will
>> > mute
>> > > > the
>> > > > > > > > channel
>> > > > > > > > >> > once
>> > > > > > > > >> > > a request is read and put into the request channel.
>> So
>> > > > > assuming
>> > > > > > > > there
>> > > > > > > > >> is
>> > > > > > > > >> > > only one connection between controller and each
>> broker,
>> > on
>> > > > the
>> > > > > > > > broker
>> > > > > > > > >> > side,
>> > > > > > > > >> > > there should be only one controller request in the
>> > > > controller
>> > > > > > > > request
>> > > > > > > > >> > queue
>> > > > > > > > >> > > at any given time. If that is the case, do we need a
>> > > > separate
>> > > > > > > > >> controller
>> > > > > > > > >> > > request queue capacity config? The default value 20
>> > means
>> > > > that
>> > > > > > we
>> > > > > > > > >> expect
>> > > > > > > > >> > > there are 20 controller switches to happen in a short
>> > > period
>> > > > > of
>> > > > > > > > time.
>> > > > > > > > >> I
>> > > > > > > > >> > am
>> > > > > > > > >> > > not sure whether someone should increase the
>> controller
>> > > > > request
>> > > > > > > > queue
>> > > > > > > > >> > > capacity to handle such case, as it seems indicating
>> > > > something
>> > > > > > > very
>> > > > > > > > >> wrong
>> > > > > > > > >> > > has happened.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Thanks,
>> > > > > > > > >> > >
>> > > > > > > > >> > > Jiangjie (Becket) Qin
>> > > > > > > > >> > >
>> > > > > > > > >> > >
>> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
>> > > > > lindong28@gmail.com>
>> > > > > > > > >> wrote:
>> > > > > > > > >> > >
>> > > > > > > > >> > > > Thanks for the update Lucas.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > I think the motivation section is intuitive. It
>> will
>> > be
>> > > > good
>> > > > > > to
>> > > > > > > > >> learn
>> > > > > > > > >> > > more
>> > > > > > > > >> > > > about the comments from other reviewers.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
>> > > > > > > > lucasatucla@gmail.com>
>> > > > > > > > >> > > wrote:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > > Hi Dong,
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > I've updated the motivation section of the KIP by
>> > > > > explaining
>> > > > > > > the
>> > > > > > > > >> > cases
>> > > > > > > > >> > > > that
>> > > > > > > > >> > > > > would have user impacts.
>> > > > > > > > >> > > > > Please take a look and let me know your comments.
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > Thanks,
>> > > > > > > > >> > > > > Lucas
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
>> > > > > > > > lucasatucla@gmail.com
>> > > > > > > > >> >
>> > > > > > > > >> > > > wrote:
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > > Hi Dong,
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > The simulation of disk being slow is merely
>> for me
>> > > to
>> > > > > > easily
>> > > > > > > > >> > > construct
>> > > > > > > > >> > > > a
>> > > > > > > > >> > > > > > testing scenario
>> > > > > > > > >> > > > > > with a backlog of produce requests. In
>> production,
>> > > > other
>> > > > > > > than
>> > > > > > > > >> the
>> > > > > > > > >> > > disk
>> > > > > > > > >> > > > > > being slow, a backlog of
>> > > > > > > > >> > > > > > produce requests may also be caused by high
>> > produce
>> > > > QPS.
>> > > > > > > > >> > > > > > In that case, we may not want to kill the
>> broker
>> > and
>> > > > > > that's
>> > > > > > > > when
>> > > > > > > > >> > this
>> > > > > > > > >> > > > KIP
>> > > > > > > > >> > > > > > can be useful, both for JBOD
>> > > > > > > > >> > > > > > and non-JBOD setup.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Going back to your previous question about each
>> > > > > > > ProduceRequest
>> > > > > > > > >> > > covering
>> > > > > > > > >> > > > > 20
>> > > > > > > > >> > > > > > partitions that are randomly
>> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr request
>> is
>> > > > > enqueued
>> > > > > > > that
>> > > > > > > > >> > tries
>> > > > > > > > >> > > to
>> > > > > > > > >> > > > > > switch the current broker, say broker0, from
>> > leader
>> > > to
>> > > > > > > > follower
>> > > > > > > > >> > > > > > *for one of the partitions*, say *test-0*. For
>> the
>> > > > sake
>> > > > > of
>> > > > > > > > >> > argument,
>> > > > > > > > >> > > > > > let's also assume the other brokers, say
>> broker1,
>> > > have
>> > > > > > > > *stopped*
>> > > > > > > > >> > > > fetching
>> > > > > > > > >> > > > > > from
>> > > > > > > > >> > > > > > the current broker, i.e. broker0.
>> > > > > > > > >> > > > > > 1. If the enqueued produce requests have acks =
>> > -1
>> > > > > (ALL)
>> > > > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests
>> ahead
>> > of
>> > > > > > > > >> LeaderAndISR
>> > > > > > > > >> > > will
>> > > > > > > > >> > > > be
>> > > > > > > > >> > > > > > put into the purgatory,
>> > > > > > > > >> > > > > >         and since they'll never be replicated
>> to
>> > > other
>> > > > > > > brokers
>> > > > > > > > >> > > (because
>> > > > > > > > >> > > > > of
>> > > > > > > > >> > > > > > the assumption made above), they will
>> > > > > > > > >> > > > > >         be completed either when the
>> LeaderAndISR
>> > > > > request
>> > > > > > is
>> > > > > > > > >> > > processed
>> > > > > > > > >> > > > or
>> > > > > > > > >> > > > > > when the timeout happens.
>> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
>> > > > transition
>> > > > > > the
>> > > > > > > > >> > > partition
>> > > > > > > > >> > > > > > test-0 to become a follower,
>> > > > > > > > >> > > > > >         after the current broker sees the
>> > > replication
>> > > > of
>> > > > > > the
>> > > > > > > > >> > > remaining
>> > > > > > > > >> > > > 19
>> > > > > > > > >> > > > > > partitions, it can send a response indicating
>> that
>> > > > > > > > >> > > > > >         it's no longer the leader for the
>> > "test-0".
>> > > > > > > > >> > > > > >   To see the latency difference between 1.1 and
>> > 1.2,
>> > > > > let's
>> > > > > > > say
>> > > > > > > > >> > there
>> > > > > > > > >> > > > are
>> > > > > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR,
>> > and
>> > > > > there
>> > > > > > > are
>> > > > > > > > 8
>> > > > > > > > >> io
>> > > > > > > > >> > > > > threads,
>> > > > > > > > >> > > > > >   so each io thread will process approximately
>> > 3000
>> > > > > > produce
>> > > > > > > > >> > requests.
>> > > > > > > > >> > > > Now
>> > > > > > > > >> > > > > > let's investigate the io thread that finally
>> > > processed
>> > > > > the
>> > > > > > > > >> > > > LeaderAndISR.
>> > > > > > > > >> > > > > >   For the 3000 produce requests, if we model
>> the
>> > > time
>> > > > > when
>> > > > > > > > their
>> > > > > > > > >> > > > > remaining
>> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and
>> > the
>> > > > > > > > LeaderAndISR
>> > > > > > > > >> > > > request
>> > > > > > > > >> > > > > is
>> > > > > > > > >> > > > > > processed at time t3000.
>> > > > > > > > >> > > > > >   Without this KIP, the 1st produce request
>> would
>> > > have
>> > > > > > > waited
>> > > > > > > > an
>> > > > > > > > >> > > extra
>> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an
>> extra
>> > > > time
>> > > > > of
>> > > > > > > > >> t3000 -
>> > > > > > > > >> > > t1,
>> > > > > > > > >> > > > > etc.
>> > > > > > > > >> > > > > >   Roughly speaking, the latency difference is
>> > bigger
>> > > > for
>> > > > > > the
>> > > > > > > > >> > earlier
>> > > > > > > > >> > > > > > produce requests than for the later ones. For
>> the
>> > > same
>> > > > > > > reason,
>> > > > > > > > >> the
>> > > > > > > > >> > > more
>> > > > > > > > >> > > > > > ProduceRequests queued
>> > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit
>> we
>> > get
>> > > > > > (capped
>> > > > > > > > by
>> > > > > > > > >> the
>> > > > > > > > >> > > > > > produce timeout).
>> > > > > > > > >> > > > > > 2. If the enqueued produce requests have
>> acks=0 or
>> > > > > acks=1
>> > > > > > > > >> > > > > >   There will be no latency differences in this
>> > case,
>> > > > but
>> > > > > > > > >> > > > > >   2.1 without this KIP, the records of
>> partition
>> > > > test-0
>> > > > > in
>> > > > > > > the
>> > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will
>> be
>> > > > > appended
>> > > > > > > to
>> > > > > > > > >> the
>> > > > > > > > >> > > local
>> > > > > > > > >> > > > > log,
>> > > > > > > > >> > > > > >         and eventually be truncated after
>> > processing
>> > > > the
>> > > > > > > > >> > > LeaderAndISR.
>> > > > > > > > >> > > > > > This is what's referred to as
>> > > > > > > > >> > > > > >         "some unofficial definition of data
>> loss
>> > in
>> > > > > terms
>> > > > > > of
>> > > > > > > > >> > messages
>> > > > > > > > >> > > > > > beyond the high watermark".
>> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect
>> > > since
>> > > > if
>> > > > > > the
>> > > > > > > > >> > > > LeaderAndISR
>> > > > > > > > >> > > > > > is immediately processed, the response to
>> > producers
>> > > > will
>> > > > > > > have
>> > > > > > > > >> > > > > >         the NotLeaderForPartition error,
>> causing
>> > > > > producers
>> > > > > > > to
>> > > > > > > > >> retry
>> > > > > > > > >> > > > > >
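The latency arithmetic in point 1 can be put into a back-of-the-envelope sketch, under the purely illustrative assumption that the catch-up times t0, t1, ... t2999 are evenly spaced over a fixed window and the LeaderAndISR request is processed at the end of that window:

```java
// Back-of-the-envelope sketch for point 1 above, under an illustrative
// assumption: the catch-up times t0..t(n-1) are evenly spaced over a window
// of windowMs, and the LeaderAndISR is processed at t(n) = windowMs.
// Without the KIP, the i-th produce request waits an extra (t(n) - t(i)) in
// the purgatory, so the average extra wait is roughly windowMs / 2
// (capped in practice by the produce timeout).
public class ExtraPurgatoryWait {
    public static double averageExtraWaitMs(int n, double windowMs) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            double ti = windowMs * i / n; // evenly spaced catch-up time
            total += windowMs - ti;       // extra time the i-th request waits
        }
        return total / n;
    }

    public static void main(String[] args) {
        // 3000 requests over a 14-second window: average extra wait ~7 seconds
        System.out.println(averageExtraWaitMs(3000, 14000.0));
    }
}
```

This also shows why the earlier requests benefit most: the first request saves nearly the whole window, while the last saves almost nothing.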
>> > > > > > > > >> > > > > > The explanation above covers the benefit of
>> > > > > > > > >> > > > > > reducing the latency of a broker becoming a
>> > > > > > > > >> > > > > > follower; closely related is reducing the
>> > > > > > > > >> > > > > > latency of a broker becoming the leader.
>> > > > > > > > >> > > > > > In this case, the benefit is even more
>> obvious, if
>> > > > other
>> > > > > > > > brokers
>> > > > > > > > >> > have
>> > > > > > > > >> > > > > > resigned leadership, and the
>> > > > > > > > >> > > > > > current broker should take leadership. Any
>> delay
>> > in
>> > > > > > > processing
>> > > > > > > > >> the
>> > > > > > > > >> > > > > > LeaderAndISR will be perceived
>> > > > > > > > >> > > > > > by clients as unavailability. In extreme cases,
>> > this
>> > > > can
>> > > > > > > cause
>> > > > > > > > >> > failed
>> > > > > > > > >> > > > > > produce requests if the retries are
>> > > > > > > > >> > > > > > exhausted.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Another two types of controller requests are
>> > > > > > UpdateMetadata
>> > > > > > > > and
>> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as
>> > follows:
>> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing
>> > > means
>> > > > > > > clients
>> > > > > > > > >> > > receiving
>> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong leadership
>> > info
>> > > > > > > > >> > > > > > for certain partitions, and the effect is more
>> > > retries
>> > > > > or
>> > > > > > > even
>> > > > > > > > >> > fatal
>> > > > > > > > >> > > > > > failure if the retries are exhausted.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > For StopReplica requests, a long queuing time
>> may
>> > > > > degrade
>> > > > > > > the
>> > > > > > > > >> > > > performance
>> > > > > > > > >> > > > > > of topic deletion.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Regarding your last question of the delay for
>> > > > > > > > >> > DescribeLogDirsRequest,
>> > > > > > > > >> > > > you
>> > > > > > > > >> > > > > > are right
>> > > > > > > > >> > > > > > that this KIP cannot help with the latency in
>> > > getting
>> > > > > the
>> > > > > > > log
>> > > > > > > > >> dirs
>> > > > > > > > >> > > > info,
>> > > > > > > > >> > > > > > and it's only relevant
>> > > > > > > > >> > > > > > when controller requests are involved.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Regards,
>> > > > > > > > >> > > > > > Lucas
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
>> > > > > > > lindong28@gmail.com
>> > > > > > > > >
>> > > > > > > > >> > > wrote:
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > >> Hey Jun,
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> Thanks much for the comments. It is good
>> point.
>> > So
>> > > > the
>> > > > > > > > feature
>> > > > > > > > >> may
>> > > > > > > > >> > > be
>> > > > > > > > >> > > > > >> useful for JBOD use-case. I have one question
>> > > below.
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> Hey Lucas,
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> Do you think this feature is also useful for
>> > > non-JBOD
>> > > > > > setup
>> > > > > > > > or
>> > > > > > > > >> it
>> > > > > > > > >> > is
>> > > > > > > > >> > > > > only
>> > > > > > > > >> > > > > >> useful for the JBOD setup? It may be useful to
>> > > > > understand
>> > > > > > > > this.
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> When the broker is setup using JBOD, in order
>> to
>> > > move
>> > > > > > > leaders
>> > > > > > > > >> on
>> > > > > > > > >> > the
>> > > > > > > > >> > > > > >> failed
>> > > > > > > > >> > > > > >> disk to other disks, the system operator first
>> > > needs
>> > > > to
>> > > > > > get
>> > > > > > > > the
>> > > > > > > > >> > list
>> > > > > > > > >> > > > of
>> > > > > > > > >> > > > > >> partitions on the failed disk. This is
>> currently
>> > > > > achieved
>> > > > > > > > using
>> > > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
>> > > > > > > > >> DescribeLogDirsRequest
>> > > > > > > > >> > to
>> > > > > > > > >> > > > the
>> > > > > > > > >> > > > > >> broker. If we only prioritize the controller
>> > > > requests,
>> > > > > > then
>> > > > > > > > the
>> > > > > > > > >> > > > > >> DescribeLogDirsRequest
>> > > > > > > > >> > > > > >> may still take a long time to be processed by
>> the
>> > > > > broker.
>> > > > > > > So
>> > > > > > > > >> the
>> > > > > > > > >> > > > overall
>> > > > > > > > >> > > > > >> time to move leaders away from the failed disk
>> > may
>> > > > > still
>> > > > > > be
>> > > > > > > > >> long
>> > > > > > > > >> > > even
>> > > > > > > > >> > > > > with
>> > > > > > > > >> > > > > >> this KIP. What do you think?
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> Thanks,
>> > > > > > > > >> > > > > >> Dong
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
>> > > > > > > > >> lucasatucla@gmail.com
>> > > > > > > > >> > >
>> > > > > > > > >> > > > > wrote:
>> > > > > > > > >> > > > > >>
>> > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
>> > > > > > > > >> > > > > >> >
>> > > > > > > > >> > > > > >> > @Dong,
>> > > > > > > > >> > > > > >> > Since both of the two comments in your
>> previous
>> > > > email
>> > > > > > are
>> > > > > > > > >> about
>> > > > > > > > >> > > the
>> > > > > > > > >> > > > > >> > benefits of this KIP and whether it's
>> useful,
>> > > > > > > > >> > > > > >> > in light of Jun's last comment, do you agree
>> > that
>> > > > > this
>> > > > > > > KIP
>> > > > > > > > >> can
>> > > > > > > > >> > be
>> > > > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
>> > > > > > > > >> > > > > >> > Please let me know, thanks!
>> > > > > > > > >> > > > > >> >
>> > > > > > > > >> > > > > >> > Regards,
>> > > > > > > > >> > > > > >> > Lucas
>> > > > > > > > >> > > > > >> >
>> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
>> > > > > > > jun@confluent.io>
>> > > > > > > > >> > wrote:
>> > > > > > > > >> > > > > >> >
>> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one
>> > probably
>> > > > > > should
>> > > > > > > > just
>> > > > > > > > >> > kill
>> > > > > > > > >> > > > the
>> > > > > > > > >> > > > > >> > > broker. In that case, this KIP may not
>> help.
>> > If
>> > > > > only
>> > > > > > > one
>> > > > > > > > of
>> > > > > > > > >> > the
>> > > > > > > > >> > > > > disks
>> > > > > > > > >> > > > > >> on
>> > > > > > > > >> > > > > >> > a
>> > > > > > > > >> > > > > >> > > broker is slow, one may want to fail that
>> > disk
>> > > > and
>> > > > > > move
>> > > > > > > > the
>> > > > > > > > >> > > > leaders
>> > > > > > > > >> > > > > on
>> > > > > > > > >> > > > > >> > that
>> > > > > > > > >> > > > > >> > > disk to other brokers. In that case, being
>> > able
>> > > > to
>> > > > > > > > process
>> > > > > > > > >> the
>> > > > > > > > >> > > > > >> > LeaderAndIsr
>> > > > > > > > >> > > > > >> > > requests faster will potentially help the
>> > > > producers
>> > > > > > > > recover
>> > > > > > > > >> > > > quicker.
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > Thanks,
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > Jun
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
>> > > > > > > > >> lindong28@gmail.com
>> > > > > > > > >> > >
>> > > > > > > > >> > > > > wrote:
>> > > > > > > > >> > > > > >> > >
>> > > > > > > > >> > > > > >> > > > Hey Lucas,
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up
>> > > questions
>> > > > > > below.
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest
>> covers
>> > 20
>> > > > > > > > partitions
>> > > > > > > > >> > that
>> > > > > > > > >> > > > are
>> > > > > > > > >> > > > > >> > > randomly
>> > > > > > > > >> > > > > >> > > > distributed across all partitions, then
>> > each
>> > > > > > > > >> ProduceRequest
>> > > > > > > > >> > > will
>> > > > > > > > >> > > > > >> likely
>> > > > > > > > >> > > > > >> > > > cover some partitions for which the
>> broker
>> > is
>> > > > > still
>> > > > > > > > >> leader
>> > > > > > > > >> > > after
>> > > > > > > > >> > > > > it
>> > > > > > > > >> > > > > >> > > quickly
>> > > > > > > > >> > > > > >> > > > processes the
>> > > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will
>> still
>> > > be
>> > > > > slow
>> > > > > > > in
>> > > > > > > > >> > > > processing
>> > > > > > > > >> > > > > >> these
>> > > > > > > > >> > > > > >> > > > ProduceRequest and request will still be
>> > very
>> > > > > high
>> > > > > > > with
>> > > > > > > > >> this
>> > > > > > > > >> > > > KIP.
>> > > > > > > > >> > > > > It
>> > > > > > > > >> > > > > >> > > seems
>> > > > > > > > >> > > > > >> > > > that most ProduceRequest will still
>> timeout
>> > > > after
>> > > > > > 30
>> > > > > > > > >> > seconds.
>> > > > > > > > >> > > Is
>> > > > > > > > >> > > > > >> this
>> > > > > > > > >> > > > > >> > > > understanding correct?
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will
>> > > still
>> > > > > > > timeout
>> > > > > > > > >> after
>> > > > > > > > >> > > 30
>> > > > > > > > >> > > > > >> > seconds,
>> > > > > > > > >> > > > > >> > > > then it is less clear how this KIP
>> reduces
>> > > > > average
>> > > > > > > > >> produce
>> > > > > > > > >> > > > > latency.
>> > > > > > > > >> > > > > >> Can
>> > > > > > > > >> > > > > >> > > you
>> > > > > > > > >> > > > > >> > > > clarify what metrics can be improved by
>> > this
>> > > > KIP?
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Not sure why system operator directly
>> cares
>> > > > > number
>> > > > > > of
>> > > > > > > > >> > > truncated
>> > > > > > > > >> > > > > >> > messages.
>> > > > > > > > >> > > > > >> > > > Do you mean this KIP can improve average
>> > > > > throughput
>> > > > > > > or
>> > > > > > > > >> > reduce
>> > > > > > > > >> > > > > >> message
>> > > > > > > > >> > > > > >> > > > duplication? It will be good to
>> understand
>> > > > this.
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > Thanks,
>> > > > > > > > >> > > > > >> > > > Dong
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > >
>> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas
>> Wang <
>> > > > > > > > >> > > lucasatucla@gmail.com
>> > > > > > > > >> > > > >
> wrote:
>
> > Hi Dong,
> >
> > Thanks for your valuable comments. Please see my reply below.
> >
> > 1. The Google doc showed only 1 partition. Now let's consider a more
> > common scenario where broker0 is the leader of many partitions, and
> > let's say for some reason its IO becomes slow. The number of leader
> > partitions on broker0 is so large, say 10K, that the cluster is
> > skewed, and the operator would like to shift the leadership for a lot
> > of partitions, say 9K, to other brokers, either manually or through
> > some service like cruise control. With this KIP, not only will the
> > leadership transitions finish more quickly, helping the cluster
> > itself become more balanced, but all existing producers corresponding
> > to the 9K partitions will get the errors relatively quickly rather
> > than relying on their timeout, thanks to the batched async ZK
> > operations. To me it's a useful feature to have during such
> > troublesome times.
> >
> > 2. The experiments in the Google doc have shown that with this KIP
> > many producers receive an explicit NotLeaderForPartition error, based
> > on which they retry immediately. Therefore the latency (~14 seconds +
> > a quick retry) for their single message is much smaller compared with
> > the case of timing out without the KIP (30 seconds for timing out + a
> > quick retry). One might argue that reducing the timeout on the
> > producer side can achieve the same result, yet reducing the timeout
> > has its own drawbacks[1].
> >
> > Also *IF* there were a metric to show the number of truncated
> > messages on brokers, with the experiments done in the Google doc, it
> > should be easy to see that a lot fewer messages need to be truncated
> > on broker0, since the up-to-date metadata avoids appending of
> > messages in subsequent PRODUCE requests. If we talk to a system
> > operator and ask whether they prefer fewer wasteful IOs, I bet most
> > likely the answer is yes.
> >
> > 3. To answer your question, I think it might be helpful to construct
> > some formulas. To simplify the modeling, I'm going back to the case
> > where there is only ONE partition involved. Following the experiments
> > in the Google doc, let's say broker0 becomes the follower at time t0,
> > and after t0 there were still N produce requests in its request
> > queue. With the up-to-date metadata brought by this KIP, broker0 can
> > reply with a NotLeaderForPartition exception; let's use M1 to denote
> > the average processing time of replying with such an error message.
> > Without this KIP, the broker will need to append messages to
> > segments, which may trigger a flush to disk; let's use M2 to denote
> > the average processing time for such logic. Then the average extra
> > latency incurred without this KIP is N * (M2 - M1) / 2.
> >
> > In practice, M2 should always be larger than M1, which means as long
> > as N is positive, we would see improvements on the average latency.
> > There does not need to be a significant backlog of requests in the
> > request queue, or severe degradation of disk performance, to see the
> > improvement.
> >
> > Regards,
> > Lucas
> >
> > [1] For instance, reducing the timeout on the producer side can
> > trigger unnecessary duplicate requests when the corresponding leader
> > broker is overloaded, exacerbating the situation.
> >
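The back-of-envelope model quoted above can be made concrete with a small
sketch. The class and numbers below are illustrative only (not part of the
KIP or the Kafka codebase); it simply evaluates the N * (M2 - M1) / 2
expression from Lucas's point 3:

```java
// A minimal sketch of the latency model quoted above: N queued
// ProduceRequests, M1 = avg time to answer with NotLeaderForPartition,
// M2 = avg time to append to segments (and possibly flush to disk).
public final class ExtraLatencyModel {

    /**
     * Average extra latency per request incurred without the KIP.
     * Each of the N queued requests waits, on average, behind half the
     * queue, and each queued item costs (M2 - M1) extra: N * (M2 - M1) / 2.
     */
    public static double avgExtraLatencyMs(int n, double m1Ms, double m2Ms) {
        return n * (m2Ms - m1Ms) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1000 queued requests, a 1 ms error reply,
        // a 20 ms append + flush.
        System.out.println(avgExtraLatencyMs(1000, 1.0, 20.0)); // 9500.0
    }
}
```

Under those assumed numbers the model predicts about 9.5 seconds of average
extra latency without the KIP, purely from draining the stale queue.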
> > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:
> >
> > > Hey Lucas,
> > >
> > > Thanks much for the detailed documentation of the experiment.
> > >
> > > Initially I also thought having a separate queue for controller
> > > requests is useful because, as you mentioned in the summary section
> > > of the Google doc, controller requests are generally more important
> > > than data requests and we probably want controller requests to be
> > > processed sooner. But then Eno has two very good questions which I
> > > am not sure the Google doc has answered explicitly. Could you help
> > > with the following questions?
> > >
> > > 1) It is not very clear what the actual benefit of KIP-291 is to
> > > users. The experiment setup in the Google doc simulates the scenario
> > > that a broker is very slow handling ProduceRequest due to e.g. a
> > > slow disk. It currently assumes that there is only 1 partition. But
> > > in the common scenario, it is probably reasonable to assume that
> > > there are many other partitions that are also actively produced to,
> > > and ProduceRequests to these partitions also take e.g. 2 seconds to
> > > be processed. So even if broker0 can become follower for partition 0
> > > soon, it probably still needs to slowly process the ProduceRequests
> > > in the queue because these ProduceRequests cover other partitions.
> > > Thus most ProduceRequests will still time out after 30 seconds and
> > > most clients will still likely time out after 30 seconds. Then it is
> > > not obvious what the benefit to the client is, since the client will
> > > time out after 30 seconds before possibly re-connecting to broker1,
> > > with or without KIP-291. Did I miss something here?
> > >
> > > 2) I guess Eno is asking for the specific benefits of this KIP to
> > > the user or system administrator, e.g. whether this KIP decreases
> > > average latency, 999th percentile latency, probability of exceptions
> > > exposed to the client, etc. It is probably useful to clarify this.
> > >
> > > 3) Does this KIP help improve user experience only when there is an
> > > issue with a broker, e.g. a significant backlog in the request queue
> > > due to a slow disk as described in the Google doc? Or is this KIP
> > > also useful when there is no ongoing issue in the cluster? It might
> > > be helpful to clarify this to understand the benefit of this KIP.
> > >
> > > Thanks much,
> > > Dong
> > >
> > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatucla@gmail.com>
> > > wrote:
> > >
> > > > Hi Eno,
> > > >
> > > > Sorry for the delay in getting the experiment results.
> > > > Here is a link to the positive impact achieved by implementing the
> > > > proposed change:
> > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > Please take a look when you have time and let me know your
> > > > feedback.
> > > >
> > > > Regards,
> > > > Lucas
> > > >
> > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <kafka@harsha.io> wrote:
> > > >
> > > > > Thanks for the pointer. Will take a look; it might suit our
> > > > > requirements better.
> > > > >
> > > > > Thanks,
> > > > > Harsha
> > > > >
> > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang
> > > > > <lucasatucla@gmail.com> wrote:
> > > > >
> > > > > > Hi Harsha,
> > > > > >
> > > > > > If I understand correctly, the replication quota mechanism
> > > > > > proposed in KIP-73 can be helpful in that scenario.
> > > > > > Have you tried it out?
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Mayuresh Gharat <gh...@gmail.com>.
I agree with Dong that out-of-order processing can happen with two
separate queues as well, and it can even happen today.
Can we use the correlationId in the requests from the controller to the
broker to handle ordering?
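A minimal sketch of what such a correlationId-based guard could look like.
This is hypothetical code, not from the Kafka codebase; a real
implementation would also need to handle controller failover, where a
reconnecting controller resets its correlation ids (e.g. by comparing
controller epochs as well):

```java
// Hypothetical ordering guard: accept a controller request only if its
// correlationId is newer than anything processed so far, so a stale,
// reordered request is discarded instead of being applied out of order.
public final class ControllerRequestOrderGuard {
    private int lastProcessed = -1;

    /** Returns true if the request is newer than anything seen so far. */
    public synchronized boolean tryAccept(int correlationId) {
        if (correlationId <= lastProcessed) {
            return false; // stale or reordered; caller should discard it
        }
        lastProcessed = correlationId;
        return true;
    }
}
```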

Thanks,

Mayuresh


On Thu, Jul 19, 2018 at 6:41 AM Becket Qin <be...@gmail.com> wrote:

> Good point, Joel. I agree that a dedicated controller request handling
> thread would be a better isolation. It also solves the reordering issue.
>
> On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jj...@gmail.com> wrote:
>
> > Good example. I think this scenario can occur in the current code as well
> > but with even lower probability given that there are other non-controller
> > requests interleaved. It is still sketchy though and I think a safer
> > approach would be separate queues and pinning controller request handling
> > to one handler thread.
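The single-handler-thread idea can be sketched as follows; the names are
illustrative, not from the Kafka codebase. A single-threaded executor runs
tasks strictly in submission order, which is what rules out the reordering
scenarios discussed in this thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of pinning controller request handling to one dedicated thread:
// a single-threaded executor processes controller requests sequentially,
// in the exact order they were enqueued, so no two controller requests
// can be handled concurrently or reordered.
public final class ControllerRequestHandler implements AutoCloseable {
    private final ExecutorService controllerThread =
            Executors.newSingleThreadExecutor();
    private final List<String> processed = new ArrayList<>();

    public void submit(String request) {
        // Tasks submitted to a single-threaded executor run one at a
        // time, in submission order.
        controllerThread.execute(() -> processed.add(request));
    }

    /** Drains the thread and returns what was processed, in order. */
    public List<String> processedSoFar() {
        controllerThread.shutdown();
        try {
            controllerThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    @Override
    public void close() {
        controllerThread.shutdownNow();
    }
}
```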
> >
> > On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <li...@gmail.com> wrote:
> >
> > > Hey Becket,
> > >
> > > I think you are right that there may be out-of-order processing.
> > > However, it seems that out-of-order processing may also happen even
> > > if we use a separate queue.
> > >
> > > Here is the example:
> > >
> > > - Controller sends R1 and gets disconnected before receiving the
> > > response. Then it reconnects and sends R2. Both requests now stay in
> > > the controller request queue in the order they were sent.
> > > - thread1 takes R1 from the request queue and then thread2 takes R2
> > > from the request queue almost at the same time.
> > > - So R1 and R2 are processed in parallel. There is a chance that
> > > R2's processing is completed before R1's.
> > >
> > > If out-of-order processing can happen for both approaches with very
> > > low probability, it may not be worthwhile to add the extra queue.
> > > What do you think?
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <be...@gmail.com>
> > wrote:
> > >
> > > > Hi Mayuresh/Joel,
> > > >
> > > > Using the request channel as a deque was brought up some time ago
> > > > when we were initially thinking of prioritizing the requests. The
> > > > concern was that the controller requests are supposed to be
> > > > processed in order. If we can ensure that there is at most one
> > > > controller request in the request channel, the order is not a
> > > > concern. But in cases where more than one controller request is
> > > > inserted into the queue, the controller request order may change
> > > > and cause problems. For example, think about the following
> > > > sequence:
> > > > 1. Controller successfully sends a request R1 to the broker.
> > > > 2. Broker receives R1 and puts the request at the head of the
> > > > request queue.
> > > > 3. The controller-to-broker connection fails and the controller
> > > > reconnects to the broker.
> > > > 4. Controller sends a request R2 to the broker.
> > > > 5. Broker receives R2 and adds it to the head of the request
> > > > queue.
> > > > Now on the broker side, R2 will be processed before R1, which may
> > > > cause problems.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > >
> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com>
> > wrote:
> > > >
> > > > > @Mayuresh - I like your idea. It appears to be a simpler, less
> > > > > invasive alternative, and it should work. Jun/Becket/others, do
> > > > > you see any pitfalls with this approach?
> > > > >
> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > @Mayuresh,
> > > > > > That's a very interesting idea that I hadn't thought of
> > > > > > before. It seems to solve our problem at hand pretty well, and
> > > > > > also avoids the need to have a new size metric and capacity
> > > > > > config for the controller request queue. In fact, if we were
> > > > > > to adopt this design, there is no public interface change, and
> > > > > > we probably don't need a KIP.
> > > > > > Also implementation-wise, it seems the java class
> > > > > > LinkedBlockingDeque can readily satisfy the requirement by
> > > > > > supporting a capacity, and also allowing inserting at both
> > > > > > ends.
> > > > > >
> > > > > > My only concern is that this design is tied to the coincidence
> > > > > > that we have two request priorities and there are two ends to
> > > > > > a deque. Hence by using the proposed design, the network layer
> > > > > > is more tightly coupled with upper-layer logic; e.g. if we
> > > > > > were to add an extra priority level in the future for some
> > > > > > reason, we would probably need to go back to the design of
> > > > > > separate queues, one for each priority level.
> > > > > >
> > > > > > In summary, I'm ok with both designs and lean toward your
> > > > > > suggested approach.
> > > > > > Let's hear what others think.
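The deque approach discussed above can be sketched with
java.util.concurrent.LinkedBlockingDeque, the double-ended,
capacity-bounded variant of LinkedBlockingQueue. This sketch is
illustrative only; the types and names are placeholders, not the actual
SocketServer code:

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

// Two-priority request channel backed by a bounded deque: controller
// requests jump to the head, data requests go to the tail, and handler
// threads always take from the head.
public final class TwoPriorityRequestChannel {
    private final BlockingDeque<String> deque;

    public TwoPriorityRequestChannel(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    /** Returns false if the deque is full and the request was rejected. */
    public boolean send(String request, boolean isControllerRequest) {
        return isControllerRequest
                ? deque.offerFirst(request)  // controller: head of the deque
                : deque.offerLast(request);  // data: tail of the deque
    }

    /** Returns the next request to handle, or null if the deque is empty. */
    public String receive() {
        // Handler threads would normally block with takeFirst(); pollFirst()
        // keeps this sketch free of checked exceptions.
        return deque.pollFirst();
    }
}
```

Note that offerFirst returns false when the deque is at capacity, so
callers still need a policy for a full channel.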
> > > > > >
> > > > > > @Becket,
> > > > > > In light of Mayuresh's suggested new design, I'm answering
> > > > > > your question only in the context of the current KIP design:
> > > > > > I think your suggestion makes sense, and I'm ok with removing
> > > > > > the capacity config and just relying on the default value of
> > > > > > 20 being sufficient.
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > > gharatmayuresh15@gmail.com
> > > > > > > wrote:
> > > > > >
> > > > > > > Hi Lucas,
> > > > > > >
> > > > > > > Seems like the main intent here is to prioritize the
> > > > > > > controller requests over any other requests.
> > > > > > > In that case, we can change the request queue to a deque,
> > > > > > > where you always insert the normal requests (produce,
> > > > > > > consume, etc.) at the end of the deque, but if it's a
> > > > > > > controller request, you insert it at the head of the deque.
> > > > > > > This ensures that the controller requests will be given
> > > > > > > higher priority than other requests.
> > > > > > >
> > > > > > > Also, since we only read one request from the socket and
> > > > > > > mute it and only unmute it after handling the request, this
> > > > > > > ensures that we don't handle controller requests out of
> > > > > > > order.
> > > > > > >
> > > > > > > With this approach we can avoid the second queue and the
> > > > > > > additional config for the size of the queue.
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Mayuresh
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> becket.qin@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hey Joel,
> > > > > > > >
> > > > > > > > Thanks for the detailed explanation. I agree the current
> > > > > > > > design makes sense. My confusion is about whether the new
> > > > > > > > config for the controller queue capacity is necessary. I
> > > > > > > > cannot think of a case in which users would change it.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > becket.qin@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Lucas,
> > > > > > > > >
> > > > > > > > > I guess my question can be rephrased as "do we expect
> > > > > > > > > users to ever change the controller request queue
> > > > > > > > > capacity"? If we agree that 20 is already a very
> > > > > > > > > generous default number and we do not expect users to
> > > > > > > > > change it, is it still necessary to expose this as a
> > > > > > > > > config?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > lucasatucla@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> @Becket
> > > > > > > > >> 1. Thanks for the comment. You are right that normally
> > > > > > > > >> there should be just one controller request because of
> > > > > > > > >> muting, and I had NOT intended to say there would be
> > > > > > > > >> many enqueued controller requests. I went through the
> > > > > > > > >> KIP again, and I'm not sure which part conveys that
> > > > > > > > >> info. I'd be happy to revise if you point out the
> > > > > > > > >> section.
> > > > > > > > >>
> > > > > > > > >> 2. Though it should not happen in normal conditions,
> > > > > > > > >> the current design does not preclude multiple
> > > > > > > > >> controllers running at the same time. Hence if we don't
> > > > > > > > >> have the controller queue capacity config and simply
> > > > > > > > >> set its capacity to 1, network threads handling
> > > > > > > > >> requests from different controllers will be blocked
> > > > > > > > >> during those troublesome times, which is probably not
> > > > > > > > >> what we want. On the other hand, adding the extra
> > > > > > > > >> config with a default value, say 20, guards us from
> > > > > > > > >> issues in those troublesome times, and IMO there isn't
> > > > > > > > >> much downside to adding the extra config.
> > > > > > > > >>
> > > > > > > > >> @Mayuresh
> > > > > > > > >> Good catch, this sentence is an obsolete statement
> > > > > > > > >> based on a previous design. I've revised the wording in
> > > > > > > > >> the KIP.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >> Lucas
> > > > > > > > >>
> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Lucas,
> > > > > > > > >> >
> > > > > > > > >> > Thanks for the KIP.
> > > > > > > > >> > I am trying to understand why you think "The memory
> > > > > > > > >> > consumption can rise given the total number of queued
> > > > > > > > >> > requests can go up to 2x" in the impact section.
> > > > > > > > >> > Normally the requests from the controller to a broker
> > > > > > > > >> > are not high volume, right?
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> >
> > > > > > > > >> > Mayuresh
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > > becket.qin@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control
> plane
> > > from
> > > > > the
> > > > > > > > data
> > > > > > > > >> > plane
> > > > > > > > >> > > makes a lot of sense.
> > > > > > > > >> > >
> > > > > > > > >> > > In the KIP you mentioned that the controller request
> > queue
> > > > may
> > > > > > > have
> > > > > > > > >> many
> > > > > > > > >> > > requests in it. Will this be a common case? The
> > controller
> > > > > > > requests
> > > > > > > > >> still
> > > > > > > > >> > > goes through the SocketServer. The SocketServer will
> > mute
> > > > the
> > > > > > > > channel
> > > > > > > > >> > once
> > > > > > > > >> > > a request is read and put into the request channel. So
> > > > > assuming
> > > > > > > > there
> > > > > > > > >> is
> > > > > > > > >> > > only one connection between controller and each
> broker,
> > on
> > > > the
> > > > > > > > broker
> > > > > > > > >> > side,
> > > > > > > > >> > > there should be only one controller request in the
> > > > controller
> > > > > > > > request
> > > > > > > > >> > queue
> > > > > > > > >> > > at any given time. If that is the case, do we need a
> > > > separate
> > > > > > > > >> controller
> > > > > > > > >> > > request queue capacity config? The default value 20
> > means
> > > > that
> > > > > > we
> > > > > > > > >> expect
> > > > > > > > >> > > 20 controller switches to happen in a short
> > > period
> > > > > of
> > > > > > > > time.
> > > > > > > > >> I
> > > > > > > > >> > am
> > > > > > > > >> > > not sure whether someone should increase the
> controller
> > > > > request
> > > > > > > > queue
> > > > > > > > >> > > capacity to handle such a case, as it seems to indicate
> > > > something
> > > > > > > very
> > > > > > > > >> wrong
> > > > > > > > >> > > has happened.
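The mute/unmute behavior described above can be sketched as a toy model (this is illustrative Python, not Kafka code; the names and structure are assumptions made for the sketch): the SocketServer mutes a connection after reading one request and only unmutes it once the response is sent, so a single controller connection can contribute at most one request to the queue at a time.

```python
# Toy model (not Kafka code) of the mute/unmute argument above: with one
# connection between the controller and a broker, at most one controller
# request can sit in the request queue at any given time.
from collections import deque

class Connection:
    def __init__(self, name):
        self.name = name
        self.muted = False

request_queue = deque()

def try_read(conn, request):
    """Enqueue a request only if the connection is unmuted, then mute it."""
    if conn.muted:
        return False          # nothing new is enqueued until the response goes out
    request_queue.append((conn, request))
    conn.muted = True
    return True

def send_response(conn):
    conn.muted = False        # unmute: the next request can now be read

ctrl = Connection("controller")
assert try_read(ctrl, "LeaderAndIsr-1")
assert not try_read(ctrl, "LeaderAndIsr-2")   # muted: queue still holds one request
assert len(request_queue) == 1

done_conn, _ = request_queue.popleft()
send_response(done_conn)
assert try_read(ctrl, "LeaderAndIsr-2")       # unmuted after the response
```

Under this model, a queue capacity of 20 is only reached if 20 distinct controller connections (i.e. controller failovers) pile up, which matches the point that such a backlog would signal something very wrong.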
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks,
> > > > > > > > >> > >
> > > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > > lindong28@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Thanks for the update Lucas.
> > > > > > > > >> > > >
> > > > > > > > >> > > > I think the motivation section is intuitive. It will
> > be
> > > > good
> > > > > > to
> > > > > > > > >> learn
> > > > > > > > >> > > more
> > > > > > > > >> > > > about the comments from other reviewers.
> > > > > > > > >> > > >
> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > > > > lucasatucla@gmail.com>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > > Hi Dong,
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > I've updated the motivation section of the KIP by
> > > > > explaining
> > > > > > > the
> > > > > > > > >> > cases
> > > > > > > > >> > > > that
> > > > > > > > >> > > > > would have user impacts.
> > > > > > > > >> > > > > Please take a look and let me know your comments.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Thanks,
> > > > > > > > >> > > > > Lucas
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > > > > lucasatucla@gmail.com
> > > > > > > > >> >
> > > > > > > > >> > > > wrote:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > > Hi Dong,
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > The simulation of disk being slow is merely for
> me
> > > to
> > > > > > easily
> > > > > > > > >> > > construct
> > > > > > > > >> > > > a
> > > > > > > > >> > > > > > testing scenario
> > > > > > > > >> > > > > > with a backlog of produce requests. In
> production,
> > > > other
> > > > > > > than
> > > > > > > > >> the
> > > > > > > > >> > > disk
> > > > > > > > >> > > > > > being slow, a backlog of
> > > > > > > > >> > > > > > produce requests may also be caused by high
> > produce
> > > > QPS.
> > > > > > > > >> > > > > > In that case, we may not want to kill the broker
> > and
> > > > > > that's
> > > > > > > > when
> > > > > > > > >> > this
> > > > > > > > >> > > > KIP
> > > > > > > > >> > > > > > can be useful, both for JBOD
> > > > > > > > >> > > > > > and non-JBOD setup.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Going back to your previous question about each
> > > > > > > ProduceRequest
> > > > > > > > >> > > covering
> > > > > > > > >> > > > > 20
> > > > > > > > >> > > > > > partitions that are randomly
> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is
> > > > > enqueued
> > > > > > > that
> > > > > > > > >> > tries
> > > > > > > > >> > > to
> > > > > > > > >> > > > > > switch the current broker, say broker0, from
> > leader
> > > to
> > > > > > > > follower
> > > > > > > > >> > > > > > *for one of the partitions*, say *test-0*. For
> the
> > > > sake
> > > > > of
> > > > > > > > >> > argument,
> > > > > > > > >> > > > > > let's also assume the other brokers, say
> broker1,
> > > have
> > > > > > > > *stopped*
> > > > > > > > >> > > > fetching
> > > > > > > > >> > > > > > from
> > > > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > > > > >> > > > > > 1. If the enqueued produce requests have acks =
> > -1
> > > > > (ALL)
> > > > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests
> ahead
> > of
> > > > > > > > >> LeaderAndISR
> > > > > > > > >> > > will
> > > > > > > > >> > > > be
> > > > > > > > >> > > > > > put into the purgatory,
> > > > > > > > >> > > > > >         and since they'll never be replicated to
> > > other
> > > > > > > brokers
> > > > > > > > >> > > (because
> > > > > > > > >> > > > > of
> > > > > > > > >> > > > > > the assumption made above), they will
> > > > > > > > >> > > > > >         be completed either when the
> LeaderAndISR
> > > > > request
> > > > > > is
> > > > > > > > >> > > processed
> > > > > > > > >> > > > or
> > > > > > > > >> > > > > > when the timeout happens.
> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> > > > transition
> > > > > > the
> > > > > > > > >> > > partition
> > > > > > > > >> > > > > > test-0 to become a follower,
> > > > > > > > >> > > > > >         after the current broker sees the
> > > replication
> > > > of
> > > > > > the
> > > > > > > > >> > > remaining
> > > > > > > > >> > > > 19
> > > > > > > > >> > > > > > partitions, it can send a response indicating
> that
> > > > > > > > >> > > > > >         it's no longer the leader for the
> > "test-0".
> > > > > > > > >> > > > > >   To see the latency difference between 1.1 and
> > 1.2,
> > > > > let's
> > > > > > > say
> > > > > > > > >> > there
> > > > > > > > >> > > > are
> > > > > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR,
> > and
> > > > > there
> > > > > > > are
> > > > > > > > 8
> > > > > > > > >> io
> > > > > > > > >> > > > > threads,
> > > > > > > > >> > > > > >   so each io thread will process approximately
> > 3000
> > > > > > produce
> > > > > > > > >> > requests.
> > > > > > > > >> > > > Now
> > > > > > > > >> > > > > > let's investigate the io thread that finally
> > > processed
> > > > > the
> > > > > > > > >> > > > LeaderAndISR.
> > > > > > > > >> > > > > >   For the 3000 produce requests, if we model the
> > > time
> > > > > when
> > > > > > > > their
> > > > > > > > >> > > > > remaining
> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and
> > the
> > > > > > > > LeaderAndISR
> > > > > > > > >> > > > request
> > > > > > > > >> > > > > is
> > > > > > > > >> > > > > > processed at time t3000.
> > > > > > > > >> > > > > >   Without this KIP, the 1st produce request
> would
> > > have
> > > > > > > waited
> > > > > > > > an
> > > > > > > > >> > > extra
> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an
> extra
> > > > time
> > > > > of
> > > > > > > > >> t3000 -
> > > > > > > > >> > > t1,
> > > > > > > > >> > > > > etc.
> > > > > > > > >> > > > > >   Roughly speaking, the latency difference is
> > bigger
> > > > for
> > > > > > the
> > > > > > > > >> > earlier
> > > > > > > > >> > > > > > produce requests than for the later ones. For
> the
> > > same
> > > > > > > reason,
> > > > > > > > >> the
> > > > > > > > >> > > more
> > > > > > > > >> > > > > > ProduceRequests queued
> > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we
> > get
> > > > > > (capped
> > > > > > > > by
> > > > > > > > >> the
> > > > > > > > >> > > > > > produce timeout).
> > > > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0
> or
> > > > > acks=1
> > > > > > > > >> > > > > >   There will be no latency differences in this
> > case,
> > > > but
> > > > > > > > >> > > > > >   2.1 without this KIP, the records of partition
> > > > test-0
> > > > > in
> > > > > > > the
> > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will
> be
> > > > > appended
> > > > > > > to
> > > > > > > > >> the
> > > > > > > > >> > > local
> > > > > > > > >> > > > > log,
> > > > > > > > >> > > > > >         and eventually be truncated after
> > processing
> > > > the
> > > > > > > > >> > > LeaderAndISR.
> > > > > > > > >> > > > > > This is what's referred to as
> > > > > > > > >> > > > > >         "some unofficial definition of data loss
> > in
> > > > > terms
> > > > > > of
> > > > > > > > >> > messages
> > > > > > > > >> > > > > > beyond the high watermark".
> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect
> > > since
> > > > if
> > > > > > the
> > > > > > > > >> > > > LeaderAndISR
> > > > > > > > >> > > > > > is immediately processed, the response to
> > producers
> > > > will
> > > > > > > have
> > > > > > > > >> > > > > >         the NotLeaderForPartition error, causing
> > > > > producers
> > > > > > > to
> > > > > > > > >> retry
> > > > > > > > >> > > > > >
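The waiting-time arithmetic in point 1 above can be checked with a toy calculation (the catch-up spacing and timeout below are made-up assumptions, not measurements):

```python
# Toy model of the purgatory-wait argument in point 1 above. Assume the
# remaining 19 partitions of produce request i catch up at time t_i = i * delta,
# and the LeaderAndISR request is processed at t_N. Without the KIP, request i
# waits an extra (t_N - t_i) in the purgatory, capped by the produce timeout;
# with the KIP it can complete at t_i. delta and timeout are illustrative.
N = 3000         # produce requests handled by one io thread
delta = 0.005    # assumed spacing between catch-up times, in seconds
timeout = 30.0   # produce timeout, in seconds

t = [i * delta for i in range(N + 1)]                   # t[0] .. t[N]
extra_waits = [min(t[N] - t[i], timeout) for i in range(N)]

print(f"extra wait of the 1st request:  {extra_waits[0]:.3f}s")
print(f"extra wait of the last request: {extra_waits[-1]:.3f}s")
print(f"average extra wait:             {sum(extra_waits) / N:.3f}s")
```

As stated in the prose, the earlier produce requests see the bigger improvement, and the overall benefit grows with the number of queued ProduceRequests, capped by the produce timeout.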
> > > > > > > > >> > > > > > This explanation above is the benefit for
> reducing
> > > the
> > > > > > > latency
> > > > > > > > >> of a
> > > > > > > > >> > > > > broker
> > > > > > > > >> > > > > > becoming the follower,
> > > > > > > > >> > > > > > closely related is reducing the latency of a
> > broker
> > > > > > becoming
> > > > > > > > the
> > > > > > > > >> > > > leader.
> > > > > > > > >> > > > > > In this case, the benefit is even more obvious,
> if
> > > > other
> > > > > > > > brokers
> > > > > > > > >> > have
> > > > > > > > >> > > > > > resigned leadership, and the
> > > > > > > > >> > > > > > current broker should take leadership. Any delay
> > in
> > > > > > > processing
> > > > > > > > >> the
> > > > > > > > >> > > > > > LeaderAndISR will be perceived
> > > > > > > > >> > > > > > by clients as unavailability. In extreme cases,
> > this
> > > > can
> > > > > > > cause
> > > > > > > > >> > failed
> > > > > > > > >> > > > > > produce requests if the retries are
> > > > > > > > >> > > > > > exhausted.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Another two types of controller requests are
> > > > > > UpdateMetadata
> > > > > > > > and
> > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as
> > follows:
> > > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing
> > > means
> > > > > > > clients
> > > > > > > > >> > > receiving
> > > > > > > > >> > > > > > stale metadata, e.g. with the wrong leadership
> > info
> > > > > > > > >> > > > > > for certain partitions, and the effect is more
> > > retries
> > > > > or
> > > > > > > even
> > > > > > > > >> > fatal
> > > > > > > > >> > > > > > failure if the retries are exhausted.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > For StopReplica requests, a long queuing time
> may
> > > > > degrade
> > > > > > > the
> > > > > > > > >> > > > performance
> > > > > > > > >> > > > > > of topic deletion.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Regarding your last question of the delay for
> > > > > > > > >> > DescribeLogDirsRequest,
> > > > > > > > >> > > > you
> > > > > > > > >> > > > > > are right
> > > > > > > > >> > > > > > that this KIP cannot help with the latency in
> > > getting
> > > > > the
> > > > > > > log
> > > > > > > > >> dirs
> > > > > > > > >> > > > info,
> > > > > > > > >> > > > > > and it's only relevant
> > > > > > > > >> > > > > > when controller requests are involved.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Regards,
> > > > > > > > >> > > > > > Lucas
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > > > > > lindong28@gmail.com
> > > > > > > > >
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > >> Hey Jun,
> > > > > > > > >> > > > > >>
> > > > > > > > >> > > > > >> Thanks much for the comments. It is good point.
> > So
> > > > the
> > > > > > > > feature
> > > > > > > > >> may
> > > > > > > > >> > > be
> > > > > > > > >> > > > > >> useful for JBOD use-case. I have one question
> > > below.
> > > > > > > > >> > > > > >>
> > > > > > > > >> > > > > >> Hey Lucas,
> > > > > > > > >> > > > > >>
> > > > > > > > >> > > > > >> Do you think this feature is also useful for
> > > non-JBOD
> > > > > > setup
> > > > > > > > or
> > > > > > > > >> it
> > > > > > > > >> > is
> > > > > > > > >> > > > > only
> > > > > > > > >> > > > > >> useful for the JBOD setup? It may be useful to
> > > > > understand
> > > > > > > > this.
> > > > > > > > >> > > > > >>
> > > > > > > > >> > > > > >> When the broker is setup using JBOD, in order
> to
> > > move
> > > > > > > leaders
> > > > > > > > >> on
> > > > > > > > >> > the
> > > > > > > > >> > > > > >> failed
> > > > > > > > >> > > > > >> disk to other disks, the system operator first
> > > needs
> > > > to
> > > > > > get
> > > > > > > > the
> > > > > > > > >> > list
> > > > > > > > >> > > > of
> > > > > > > > >> > > > > >> partitions on the failed disk. This is
> currently
> > > > > achieved
> > > > > > > > using
> > > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > > > > > >> DescribeLogDirsRequest
> > > > > > > > >> > to
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > >> broker. If we only prioritize the controller
> > > > requests,
> > > > > > then
> > > > > > > > the
> > > > > > > > >> > > > > >> DescribeLogDirsRequest
> > > > > > > > >> > > > > >> may still take a long time to be processed by
> the
> > > > > broker.
> > > > > > > So
> > > > > > > > >> the
> > > > > > > > >> > > > overall
> > > > > > > > >> > > > > >> time to move leaders away from the failed disk
> > may
> > > > > still
> > > > > > be
> > > > > > > > >> long
> > > > > > > > >> > > even
> > > > > > > > >> > > > > with
> > > > > > > > >> > > > > >> this KIP. What do you think?
> > > > > > > > >> > > > > >>
> > > > > > > > >> > > > > >> Thanks,
> > > > > > > > >> > > > > >> Dong
> > > > > > > > >> > > > > >>
> > > > > > > > >> > > > > >>
> > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > > > > > >> lucasatucla@gmail.com
> > > > > > > > >> > >
> > > > > > > > >> > > > > wrote:
> > > > > > > > >> > > > > >>
> > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > > > > >> > > > > >> >
> > > > > > > > >> > > > > >> > @Dong,
> > > > > > > > >> > > > > >> > Since both of the two comments in your
> previous
> > > > email
> > > > > > are
> > > > > > > > >> about
> > > > > > > > >> > > the
> > > > > > > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > > > > > > >> > > > > >> > in light of Jun's last comment, do you agree
> > that
> > > > > this
> > > > > > > KIP
> > > > > > > > >> can
> > > > > > > > >> > be
> > > > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > > > > >> > > > > >> >
> > > > > > > > >> > > > > >> > Regards,
> > > > > > > > >> > > > > >> > Lucas
> > > > > > > > >> > > > > >> >
> > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > > > > > jun@confluent.io>
> > > > > > > > >> > wrote:
> > > > > > > > >> > > > > >> >
> > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > > > > >> > > > > >> > >
> > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one
> > probably
> > > > > > should
> > > > > > > > just
> > > > > > > > >> > kill
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > >> > > broker. In that case, this KIP may not
> help.
> > If
> > > > > only
> > > > > > > one
> > > > > > > > of
> > > > > > > > >> > the
> > > > > > > > >> > > > > disks
> > > > > > > > >> > > > > >> on
> > > > > > > > >> > > > > >> > a
> > > > > > > > >> > > > > >> > > broker is slow, one may want to fail that
> > disk
> > > > and
> > > > > > move
> > > > > > > > the
> > > > > > > > >> > > > leaders
> > > > > > > > >> > > > > on
> > > > > > > > >> > > > > >> > that
> > > > > > > > >> > > > > >> > > disk to other brokers. In that case, being
> > able
> > > > to
> > > > > > > > process
> > > > > > > > >> the
> > > > > > > > >> > > > > >> > LeaderAndIsr
> > > > > > > > >> > > > > >> > > requests faster will potentially help the
> > > > producers
> > > > > > > > recover
> > > > > > > > >> > > > quicker.
> > > > > > > > >> > > > > >> > >
> > > > > > > > >> > > > > >> > > Thanks,
> > > > > > > > >> > > > > >> > >
> > > > > > > > >> > > > > >> > > Jun
> > > > > > > > >> > > > > >> > >
> > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > > > > > > >> lindong28@gmail.com
> > > > > > > > >> > >
> > > > > > > > >> > > > > wrote:
> > > > > > > > >> > > > > >> > >
> > > > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up
> > > questions
> > > > > > below.
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest
> covers
> > 20
> > > > > > > > partitions
> > > > > > > > >> > that
> > > > > > > > >> > > > are
> > > > > > > > >> > > > > >> > > randomly
> > > > > > > > >> > > > > >> > > > distributed across all partitions, then
> > each
> > > > > > > > >> ProduceRequest
> > > > > > > > >> > > will
> > > > > > > > >> > > > > >> likely
> > > > > > > > >> > > > > >> > > > cover some partitions for which the
> broker
> > is
> > > > > still
> > > > > > > > >> leader
> > > > > > > > >> > > after
> > > > > > > > >> > > > > it
> > > > > > > > >> > > > > >> > > quickly
> > > > > > > > >> > > > > >> > > > processes the
> > > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will
> still
> > > be
> > > > > slow
> > > > > > > in
> > > > > > > > >> > > > processing
> > > > > > > > >> > > > > >> these
> > > > > > > > >> > > > > >> > > > ProduceRequests, and the request latency will still be
> > very
> > > > > high
> > > > > > > with
> > > > > > > > >> this
> > > > > > > > >> > > > KIP.
> > > > > > > > >> > > > > It
> > > > > > > > >> > > > > >> > > seems
> > > > > > > > >> > > > > >> > > > that most ProduceRequests will still
> timeout
> > > > after
> > > > > > 30
> > > > > > > > >> > seconds.
> > > > > > > > >> > > Is
> > > > > > > > >> > > > > >> this
> > > > > > > > >> > > > > >> > > > understanding correct?
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will
> > > still
> > > > > > > timeout
> > > > > > > > >> after
> > > > > > > > >> > > 30
> > > > > > > > >> > > > > >> > seconds,
> > > > > > > > >> > > > > >> > > > then it is less clear how this KIP
> reduces
> > > > > average
> > > > > > > > >> produce
> > > > > > > > >> > > > > latency.
> > > > > > > > >> > > > > >> Can
> > > > > > > > >> > > > > >> > > you
> > > > > > > > >> > > > > >> > > > clarify what metrics can be improved by
> > this
> > > > KIP?
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > > Not sure why system operator directly
> cares
> > > > > number
> > > > > > of
> > > > > > > > >> > > truncated
> > > > > > > > >> > > > > >> > messages.
> > > > > > > > >> > > > > >> > > > Do you mean this KIP can improve average
> > > > > throughput
> > > > > > > or
> > > > > > > > >> > reduce
> > > > > > > > >> > > > > >> message
> > > > > > > > >> > > > > >> > > > duplication? It will be good to
> understand
> > > > this.
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > > Thanks,
> > > > > > > > >> > > > > >> > > > Dong
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang
> <
> > > > > > > > >> > > lucasatucla@gmail.com
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > >> > wrote:
> > > > > > > > >> > > > > >> > > >
> > > > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > Thanks for your valuable comments.
> Please
> > > see
> > > > > my
> > > > > > > > reply
> > > > > > > > >> > > below.
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1
> > partition.
> > > > Now
> > > > > > > let's
> > > > > > > > >> > > consider
> > > > > > > > >> > > > a
> > > > > > > > >> > > > > >> more
> > > > > > > > >> > > > > >> > > > common
> > > > > > > > >> > > > > >> > > > > scenario
> > > > > > > > >> > > > > >> > > > > where broker0 is the leader of many
> > > > partitions.
> > > > > > And
> > > > > > > > >> let's
> > > > > > > > >> > > say
> > > > > > > > >> > > > > for
> > > > > > > > >> > > > > >> > some
> > > > > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > > > > > > >> > > > > >> > > > > The number of leader partitions on
> > broker0
> > > is
> > > > > so
> > > > > > > > large,
> > > > > > > > >> > say
> > > > > > > > >> > > > 10K,
> > > > > > > > >> > > > > >> that
> > > > > > > > >> > > > > >> > > the
> > > > > > > > >> > > > > >> > > > > cluster is skewed,
> > > > > > > > >> > > > > >> > > > > and the operator would like to shift
> the
> > > > > > leadership
> > > > > > > > >> for a
> > > > > > > > >> > > lot
> > > > > > > > >> > > > of
> > > > > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > > > > > >> > > > > >> > > > > either manually or through some service
> > > like
> > > > > > cruise
> > > > > > > > >> > control.
> > > > > > > > >> > > > > >> > > > > With this KIP, not only will the
> > leadership
> > > > > > > > transitions
> > > > > > > > >> > > finish
> > > > > > > > >> > > > > >> more
> > > > > > > > >> > > > > >> > > > > quickly, helping the cluster itself
> > > becoming
> > > > > more
> > > > > > > > >> > balanced,
> > > > > > > > >> > > > > >> > > > > but all existing producers
> corresponding
> > to
> > > > the
> > > > > > 9K
> > > > > > > > >> > > partitions
> > > > > > > > >> > > > > will
> > > > > > > > >> > > > > >> > get
> > > > > > > > >> > > > > >> > > > the
> > > > > > > > >> > > > > >> > > > > errors relatively quickly
> > > > > > > > >> > > > > >> > > > > rather than relying on their timeout,
> > > thanks
> > > > to
> > > > > > the
> > > > > > > > >> > batched
> > > > > > > > >> > > > > async
> > > > > > > > >> > > > > >> ZK
> > > > > > > > >> > > > > >> > > > > operations.
> > > > > > > > >> > > > > >> > > > > To me it's a useful feature to have
> > during
> > > > such
> > > > > > > > >> > troublesome
> > > > > > > > >> > > > > times.
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc
> have
> > > > shown
> > > > > > > that
> > > > > > > > >> with
> > > > > > > > >> > > this
> > > > > > > > >> > > > > KIP
> > > > > > > > >> > > > > >> > many
> > > > > > > > >> > > > > >> > > > > producers
> > > > > > > > >> > > > > >> > > > > receive an explicit error
> > > > > NotLeaderForPartition,
> > > > > > > > based
> > > > > > > > >> on
> > > > > > > > >> > > > which
> > > > > > > > >> > > > > >> they
> > > > > > > > >> > > > > >> > > > retry
> > > > > > > > >> > > > > >> > > > > immediately.
> > > > > > > > >> > > > > >> > > > > Therefore the latency (~14
> seconds+quick
> > > > retry)
> > > > > > for
> > > > > > > > >> their
> > > > > > > > >> > > > single
> > > > > > > > >> > > > > >> > > message
> > > > > > > > >> > > > > >> > > > is
> > > > > > > > >> > > > > >> > > > > much smaller
> > > > > > > > >> > > > > >> > > > > compared with the case of timing out
> > > without
> > > > > the
> > > > > > > KIP
> > > > > > > > >> (30
> > > > > > > > >> > > > seconds
> > > > > > > > >> > > > > >> for
> > > > > > > > >> > > > > >> > > > timing
> > > > > > > > >> > > > > >> > > > > out + quick retry).
> > > > > > > > >> > > > > >> > > > > One might argue that reducing the
> timing
> > > out
> > > > on
> > > > > > the
> > > > > > > > >> > producer
> > > > > > > > >> > > > > side
> > > > > > > > >> > > > > >> can
> > > > > > > > >> > > > > >> > > > > achieve the same result,
> > > > > > > > >> > > > > >> > > > > yet reducing the timeout has its own
> > > > > > drawbacks[1].
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to show
> the
> > > > > number
> > > > > > of
> > > > > > > > >> > > truncated
> > > > > > > > >> > > > > >> > messages
> > > > > > > > >> > > > > >> > > on
> > > > > > > > >> > > > > >> > > > > brokers,
> > > > > > > > >> > > > > >> > > > > with the experiments done in the Google
> > > Doc,
> > > > it
> > > > > > > > should
> > > > > > > > >> be
> > > > > > > > >> > > easy
> > > > > > > > >> > > > > to
> > > > > > > > >> > > > > >> see
> > > > > > > > >> > > > > >> > > > that
> > > > > > > > >> > > > > >> > > > > a lot fewer messages need
> > > > > > > > >> > > > > >> > > > > to be truncated on broker0 since the
> > > > up-to-date
> > > > > > > > >> metadata
> > > > > > > > >> > > > avoids
> > > > > > > > >> > > > > >> > > appending
> > > > > > > > >> > > > > >> > > > > of messages
> > > > > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we
> > talk
> > > > to a
> > > > > > > > system
> > > > > > > > >> > > > operator
> > > > > > > > >> > > > > >> and
> > > > > > > > >> > > > > >> > ask
> > > > > > > > >> > > > > >> > > > > whether
> > > > > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet
> > most
> > > > > likely
> > > > > > > the
> > > > > > > > >> > answer
> > > > > > > > >> > > > is
> > > > > > > > >> > > > > >> yes.
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > 3. To answer your question, I think it
> > > might
> > > > be
> > > > > > > > >> helpful to
> > > > > > > > >> > > > > >> construct
> > > > > > > > >> > > > > >> > > some
> > > > > > > > >> > > > > >> > > > > formulas.
> > > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm going
> back
> > to
> > > > the
> > > > > > > case
> > > > > > > > >> where
> > > > > > > > >> > > > there
> > > > > > > > >> > > > > >> is
> > > > > > > > >> > > > > >> > > only
> > > > > > > > >> > > > > >> > > > > ONE partition involved.
> > > > > > > > >> > > > > >> > > > > Following the experiments in the Google
> > > Doc,
> > > > > > let's
> > > > > > > > say
> > > > > > > > >> > > broker0
> > > > > > > > >> > > > > >> > becomes
> > > > > > > > >> > > > > >> > > > the
> > > > > > > > >> > > > > >> > > > > follower at time t0,
> > > > > > > > >> > > > > >> > > > > and after t0 there were still N produce
> > > > > requests
> > > > > > in
> > > > > > > > its
> > > > > > > > >> > > > request
> > > > > > > > >> > > > > >> > queue.
> > > > > > > > >> > > > > >> > > > > With the up-to-date metadata brought by
> > > this
> > > > > KIP,
> > > > > > > > >> broker0
> > > > > > > > >> > > can
> > > > > > > > >> > > > > >> reply
> > > > > > > > >> > > > > >> > > with
> > > > > > > > >> > > > > >> > > > an
> > > > > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > > > > > > >> > > > > >> > > > > let's use M1 to denote the average
> > > processing
> > > > > > time
> > > > > > > of
> > > > > > > > >> > > replying
> > > > > > > > >> > > > > >> with
> > > > > > > > >> > > > > >> > > such
> > > > > > > > >> > > > > >> > > > an
> > > > > > > > >> > > > > >> > > > > error message.
> > > > > > > > >> > > > > >> > > > > Without this KIP, the broker will need
> to
> > > > > append
> > > > > > > > >> messages
> > > > > > > > >> > to
> > > > > > > > >> > > > > >> > segments,
> > > > > > > > >> > > > > >> > > > > which may trigger a flush to disk,
> > > > > > > > >> > > > > >> > > > > let's use M2 to denote the average
> > > processing
> > > > > > time
> > > > > > > > for
> > > > > > > > >> > such
> > > > > > > > >> > > > > logic.
> > > > > > > > >> > > > > >> > > > > Then the average extra latency incurred
> > > > without
> > > > > > > this
> > > > > > > > >> KIP
> > > > > > > > >> > is
> > > > > > > > >> > > N
> > > > > > > > >> > > > *
> > > > > > > > >> > > > > >> (M2 -
> > > > > > > > >> > > > > >> > > > M1) /
> > > > > > > > >> > > > > >> > > > > 2.
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > In practice, M2 should always be larger
> > > than
> > > > > M1,
> > > > > > > > which
> > > > > > > > >> > means
> > > > > > > > >> > > > as
> > > > > > > > >> > > > > >> long
> > > > > > > > >> > > > > >> > > as N
> > > > > > > > >> > > > > >> > > > > is positive,
> > > > > > > > >> > > > > >> > > > > we would see improvements on the
> average
> > > > > latency.
> > > > > > > > >> > > > > >> > > > > There does not need to be significant
> > > backlog
> > > > > of
> > > > > > > > >> requests
> > > > > > > > >> > in
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > >> > > request
> > > > > > > > >> > > > > >> > > > > queue,
> > > > > > > > >> > > > > >> > > > > or severe degradation of disk
> performance
> > > to
> > > > > have
> > > > > > > the
> > > > > > > > >> > > > > improvement.
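As a numeric sanity check of the N * (M2 - M1) / 2 estimate above (N, M1, and M2 below are illustrative assumptions, not measurements):

```python
# Numeric check of the average-extra-latency estimate above. Request i
# (i = 0..N-1) waits behind the i requests ahead of it, so without the KIP its
# extra latency is roughly i * (M2 - M1); averaging over all N requests gives
# (N - 1) * (M2 - M1) / 2, i.e. about N * (M2 - M1) / 2. Values are illustrative.
N = 1000        # produce requests queued ahead of the LeaderAndISR
M1 = 0.0005     # assumed avg time to reply with NotLeaderForPartition, in seconds
M2 = 0.0100     # assumed avg time to append messages (and maybe flush), in seconds

extra = [i * (M2 - M1) for i in range(N)]
avg_extra = sum(extra) / N
estimate = N * (M2 - M1) / 2

print(f"simulated average extra latency: {avg_extra:.4f}s")
print(f"N * (M2 - M1) / 2 estimate:      {estimate:.4f}s")
```

The simulated average and the closed-form estimate differ only by (M2 - M1) / 2, so any positive N with M2 > M1 yields an improvement, as argued above.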
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > Regards,
> > > > > > > > >> > > > > >> > > > > Lucas
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout
> on
> > > the
> > > > > > > > producer
> > > > > > > > >> > side
> > > > > > > > >> > > > can
> > > > > > > > >> > > > > >> > trigger
> > > > > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > > > > > > >> > > > > >> > > > > when the corresponding leader broker is
> > > > > > overloaded,
> > > > > > > > >> > > > exacerbating
> > > > > > > > >> > > > > >> the
> > > > > > > > >> > > > > >> > > > > situation.
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong
> Lin
> > <
> > > > > > > > >> > > lindong28@gmail.com
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > >> > wrote:
> > > > > > > > >> > > > > >> > > > >
> > > > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > Thanks much for the detailed
> > > documentation
> > > > of
> > > > > > the
> > > > > > > > >> > > > experiment.
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > Initially I also think having a
> > separate
> > > > > queue
> > > > > > > for
> > > > > > > > >> > > > controller
> > > > > > > > >> > > > > >> > > requests
> > > > > > > > >> > > > > >> > > > is
> > > > > > > > >> > > > > >> > > > > > useful because, as you mentioned in
> the
> > > > > summary
> > > > > > > > >> section
> > > > > > > > >> > of
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > >> > Google
> > > > > > > > >> > > > > >> > > > > doc,
> > > > > > > > >> > > > > >> > > > > > controller requests are generally
> more
> > > > > > important
> > > > > > > > than
> > > > > > > > >> > data
> > > > > > > > >> > > > > >> requests
> > > > > > > > >> > > > > >> > > and
> > > > > > > > >> > > > > >> > > > > we
> > > > > > > > >> > > > > >> > > > > > probably want controller requests to
> be
> > > > > > processed
> > > > > > > > >> > sooner.
> > > > > > > > >> > > > But
> > > > > > > > >> > > > > >> then
> > > > > > > > >> > > > > >> > > Eno
> > > > > > > > >> > > > > >> > > > > has
> > > > > > > > >> > > > > >> > > > > > two very good questions which I am
> not
> > > sure
> > > > > the
> > > > > > > > >> Google
> > > > > > > > >> > doc
> > > > > > > > >> > > > has
> > > > > > > > >> > > > > >> > > answered
> > > > > > > > >> > > > > >> > > > > > explicitly. Could you help with the
> > > > following
> > > > > > > > >> questions?
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > 1) It is not very clear what is the
> > > actual
> > > > > > > benefit
> > > > > > > > of
> > > > > > > > >> > > > KIP-291
> > > > > > > > >> > > > > to
> > > > > > > > >> > > > > >> > > users.
> > > > > > > > >> > > > > >> > > > > The
> > > > > > > > >> > > > > >> > > > > > experiment setup in the Google doc
> > > > simulates
> > > > > > the
> > > > > > > > >> > scenario
> > > > > > > > >> > > > that
> > > > > > > > >> > > > > >> > broker
> > > > > > > > >> > > > > >> > > > is
> > > > > > > > >> > > > > >> > > > > > very slow handling ProduceRequest due
> > to
> > > > e.g.
> > > > > > > slow
> > > > > > > > >> disk.
> > > > > > > > >> > > It
> > > > > > > > >> > > > > >> > currently
> > > > > > > > >> > > > > >> > > > > > assumes that there is only 1
> partition.
> > > But
> > > > > in
> > > > > > > the
> > > > > > > > >> > common
> > > > > > > > >> > > > > >> scenario,
> > > > > > > > >> > > > > >> > > it
> > > > > > > > >> > > > > >> > > > is
> > > > > > > > >> > > > > >> > > > > > probably reasonable to assume that
> > there
> > > > are
> > > > > > many
> > > > > > > > >> other
> > > > > > > > >> > > > > >> partitions
> > > > > > > > >> > > > > >> > > that
> > > > > > > > >> > > > > >> > > > > are
> > > > > > > > >> > > > > >> > > > > > also actively produced to and
> > > > ProduceRequest
> > > > > to
> > > > > > > > these
> > > > > > > > >> > > > > partitions
> > > > > > > > >> > > > > >> > also
> > > > > > > > >> > > > > >> > > > > takes
> > > > > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So
> even
> > > if
> > > > > > > broker0
> > > > > > > > >> can
> > > > > > > > >> > > > become
> > > > > > > > >> > > > > >> > > follower
> > > > > > > > >> > > > > >> > > > > for
> > > > > > > > >> > > > > >> > > > > > the partition 0 soon, it probably
> still
> > > > needs
> > > > > > to
> > > > > > > > >> process
> > > > > > > > >> > > the
> > > > > > > > >> > > > > >> > > > > ProduceRequests
> > > > > > > > >> > > > > >> > > > > > sitting in the queue because these
> > > > > > > ProduceRequests
> > > > > > > > >> > cover
> > > > > > > > >> > > > > other
> > > > > > > > >> > > > > >> > > > > partitions.
> > > > > > > > >> > > > > >> > > > > > Thus most ProduceRequest will still
> > > timeout
> > > > > > after
> > > > > > > > 30
> > > > > > > > >> > > seconds
> > > > > > > > >> > > > > and
> > > > > > > > >> > > > > >> > most
> > > > > > > > >> > > > > >> > > > > > clients will still likely timeout
> after
> > > 30
> > > > > > > seconds.
> > > > > > > > >> Then
> > > > > > > > >> > > it
> > > > > > > > >> > > > is
> > > > > > > > >> > > > > >> not
> > > > > > > > >> > > > > >> > > > > > obvious what is the benefit to
> client
> > > > since
> > > > > > > > client
> > > > > > > > >> > will
> > > > > > > > >> > > > > >> timeout
> > > > > > > > >> > > > > >> > > after
> > > > > > > > >> > > > > >> > > > > 30
> > > > > > > > >> > > > > >> > > > > > seconds before possibly re-connecting
> > to
> > > > > > broker1,
> > > > > > > > >> with
> > > > > > > > >> > or
> > > > > > > > >> > > > > >> without
> > > > > > > > >> > > > > >> > > > > KIP-291.
> > > > > > > > >> > > > > >> > > > > > Did I miss something here?
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the
> > > specific
> > > > > > > > benefits
> > > > > > > > >> of
> > > > > > > > >> > > this
> > > > > > > > >> > > > > >> KIP to
> > > > > > > > >> > > > > >> > > > user
> > > > > > > > >> > > > > >> > > > > or
> > > > > > > > >> > > > > >> > > > > > system administrator, e.g. whether
> this
> > > KIP
> > > > > > > > decreases
> > > > > > > > >> > > > average
> > > > > > > > >> > > > > >> > > latency,
> > > > > > > > >> > > > > >> > > > > > 999th percentile latency, probability of
> > > > > exception
> > > > > > > > >> exposed
> > > > > > > > >> > to
> > > > > > > > >> > > > > >> client
> > > > > > > > >> > > > > >> > > etc.
> > > > > > > > >> > > > > >> > > > It
> > > > > > > > >> > > > > >> > > > > > is probably useful to clarify this.
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
> > > > experience
> > > > > > > only
> > > > > > > > >> when
> > > > > > > > >> > > > there
> > > > > > > > >> > > > > is
> > > > > > > > >> > > > > >> > > issue
> > > > > > > > >> > > > > >> > > > > with
> > > > > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in
> the
> > > > > request
> > > > > > > > queue
> > > > > > > > >> > due
> > > > > > > > >> > > to
> > > > > > > > >> > > > > >> slow
> > > > > > > > >> > > > > >> > > disk
> > > > > > > > >> > > > > >> > > > as
> > > > > > > > >> > > > > >> > > > > > described in the Google doc? Or is
> this
> > > KIP
> > > > > > also
> > > > > > > > >> useful
> > > > > > > > >> > > when
> > > > > > > > >> > > > > >> there
> > > > > > > > >> > > > > >> > is
> > > > > > > > >> > > > > >> > > > no
> > > > > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It
> might
> > be
> > > > > > helpful
> > > > > > > > to
> > > > > > > > >> > > clarify
> > > > > > > > >> > > > > >> this
> > > > > > > > >> > > > > >> > to
> > > > > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > Thanks much,
> > > > > > > > >> > > > > >> > > > > > Dong
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM,
> Lucas
> > > > Wang <
> > > > > > > > >> > > > > >> lucasatucla@gmail.com
> > > > > > > > >> > > > > >> > >
> > > > > > > > >> > > > > >> > > > > wrote:
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > > Hi Eno,
> > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
> > > > > experiment
> > > > > > > > >> results.
> > > > > > > > >> > > > > >> > > > > > > Here is a link to the positive
> impact
> > > > > > achieved
> > > > > > > by
> > > > > > > > >> > > > > implementing
> > > > > > > > >> > > > > >> > the
> > > > > > > > >> > > > > >> > > > > > proposed
> > > > > > > > >> > > > > >> > > > > > > change:
> > > > > > > > >> > > > > >> > > > > > >
> https://docs.google.com/document/d/
> > > > > > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > > > > > > >> > > > > >> > > > > > >
> FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > > > > >> > > > > >> > > > > > > Please take a look when you have
> time
> > > and
> > > > > let
> > > > > > > me
> > > > > > > > >> know
> > > > > > > > >> > > your
> > > > > > > > >> > > > > >> > > feedback.
> > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > >> > > > > >> > > > > > > Regards,
> > > > > > > > >> > > > > >> > > > > > > Lucas
> > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM,
> > > Harsha <
> > > > > > > > >> > > kafka@harsha.io>
> > > > > > > > >> > > > > >> wrote:
> > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will
> take a
> > > > look
> > > > > > > might
> > > > > > > > >> suit
> > > > > > > > >> > > our
> > > > > > > > >> > > > > >> > > > requirements
> > > > > > > > >> > > > > >> > > > > > > > better.
> > > > > > > > >> > > > > >> > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > Thanks,
> > > > > > > > >> > > > > >> > > > > > > > Harsha
> > > > > > > > >> > > > > >> > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52
> PM,
> > > > Lucas
> > > > > > > Wang <
> > > > > > > > >> > > > > >> > > > lucasatucla@gmail.com
> > > > > > > > >> > > > > >> > > > > >
> > > > > > > > >> > > > > >> > > > > > > > wrote:
> > > > > > > > >> > > > > >> > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > If I understand correctly, the
> > > > > > replication
> > > > > > > > >> quota
> > > > > > > > >> > > > > mechanism
> > > > > > > > >> > > > > >> > > > proposed
> > > > > > > > >> > > > > >> > > > > > in
> > > > > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
> > > > scenario.
> > > > > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > Thanks,
> > > > > > > > >> > > > > >> > > > > > > > > Lucas
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > >
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Good point, Joel. I agree that a dedicated controller request handling
thread would provide better isolation. It also solves the reordering issue.
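
A minimal sketch of that design, using hypothetical class and queue names rather than Kafka's actual request-handling code: a bounded controller queue drained by exactly one pinned thread both isolates controller requests from any data-request backlog and preserves their arrival order.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch (not Kafka's actual classes): controller requests
// get their own bounded queue drained by exactly one handler thread, so
// they bypass the data-request backlog and are processed in arrival order.
public class ControllerQueueSketch {
    // Capacity 20 mirrors the default discussed in this thread.
    static final BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(20);

    // Drain n controller requests with a single pinned thread and return
    // the order in which they were processed.
    static String drainInOrder(int n) {
        StringBuilder processed = new StringBuilder();
        Thread controllerHandler = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    // Single consumer => requests come out in enqueue order.
                    if (i > 0) processed.append(' ');
                    processed.append(controllerQueue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        controllerHandler.start();
        try {
            controllerHandler.join(); // join() makes 'processed' safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.toString();
    }

    public static void main(String[] args) {
        controllerQueue.offer("LeaderAndIsr-R1");
        controllerQueue.offer("LeaderAndIsr-R2");
        System.out.println(drainInOrder(2));
    }
}
```

Because only one thread ever takes from the controller queue, two controller requests can never be processed concurrently, which is what removes the reordering risk discussed below.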

On Thu, Jul 19, 2018 at 2:23 PM, Joel Koshy <jj...@gmail.com> wrote:

> Good example. I think this scenario can occur in the current code as well
> but with even lower probability given that there are other non-controller
> requests interleaved. It is still sketchy though and I think a safer
> approach would be separate queues and pinning controller request handling
> to one handler thread.
>
> On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Hey Becket,
> >
> > I think you are right that there may be out-of-order processing. However,
> > it seems that out-of-order processing may also happen even if we use a
> > separate queue.
> >
> > Here is the example:
> >
> > - Controller sends R1 and got disconnected before receiving response.
> Then
> > it reconnects and sends R2. Both requests now stay in the controller
> > request queue in the order they are sent.
> > - thread1 takes R1 from the request queue and then thread2 takes R2
> from
> > the request queue almost at the same time.
> > - So R1 and R2 are processed in parallel. There is a chance that R2's
> > processing is completed before R1.
> >
> > If out-of-order processing can happen for both approaches with very low
> > probability, it may not be worthwhile to add the extra queue. What do you
> > think?
> >
> > Thanks,
> > Dong
> >
> >
> > On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <be...@gmail.com>
> wrote:
> >
> > > Hi Mayuresh/Joel,
> > >
> > > Using the request channel as a deque was brought up some time ago when
> > we
> > > were initially thinking of prioritizing requests. The concern was that
> the
> > > controller requests are supposed to be processed in order. If we can
> > ensure
> > > that there is one controller request in the request channel, the order
> is
> > > not a concern. But in cases that there are more than one controller
> > request
> > > inserted into the queue, the controller request order may change and
> > cause
> > > problem. For example, think about the following sequence:
> > > 1. Controller successfully sent a request R1 to broker
> > > 2. Broker receives R1 and puts the request at the head of the request
> > queue.
> > > 3. Controller to broker connection failed and the controller
> reconnected
> > to
> > > the broker.
> > > 4. Controller sends a request R2 to the broker
> > > 5. Broker receives R2 and adds it at the head of the request queue.
> > > Now on the broker side, R2 will be processed before R1 is processed,
> > which
> > > may cause problems.
> > >
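
The reordering hazard in steps 1-5 above can be reproduced directly with a deque; here `java.util.concurrent.LinkedBlockingDeque` stands in for the request channel (an illustrative sketch, not Kafka code): if R1 is inserted at the head and then, after a reconnect, R2 is also inserted at the head, R2 comes out first.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Demonstrates the ordering hazard described above: if every controller
// request is inserted at the HEAD of the deque, a second request (R2)
// arriving while the first (R1) is still queued ends up in front of it.
public class DequeReorderDemo {
    static String processOrder() {
        LinkedBlockingDeque<String> requestChannel = new LinkedBlockingDeque<>(500);
        requestChannel.offerFirst("R1");    // controller request R1 queued at the head
        requestChannel.offerFirst("R2");    // after reconnect: R2 also jumps to the head
        // A handler draining from the head now sees R2 before R1.
        return requestChannel.pollFirst() + " " + requestChannel.pollFirst();
    }

    public static void main(String[] args) {
        System.out.println(processOrder()); // R2 is processed before R1
    }
}
```
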
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > >
> > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com>
> wrote:
> > >
> > > > @Mayuresh - I like your idea. It appears to be a simpler less
> invasive
> > > > alternative and it should work. Jun/Becket/others, do you see any
> > > pitfalls
> > > > with this approach?
> > > >
> > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > >
> > > > > @Mayuresh,
> > > > > That's a very interesting idea that I haven't thought before.
> > > > > It seems to solve our problem at hand pretty well, and also
> > > > > avoids the need to have a new size metric and capacity config
> > > > > for the controller request queue. In fact, if we were to adopt
> > > > > this design, there is no public interface change, and we
> > > > > probably don't need a KIP.
> > > > > Also implementation wise, it seems
> > > > > the java class LinkedBlockingDeque can readily satisfy the
> > requirement
> > > > > by supporting a capacity, and also allowing inserting at both ends.
> > > > >
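
A rough illustration of that point, assuming `java.util.concurrent.LinkedBlockingDeque` as a stand-in for the request channel (illustrative only, not Kafka's actual implementation): normal requests are appended at the tail, a controller request is inserted at the head, and the constructor capacity plays the role of queued.max.requests.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Sketch of the suggested prioritization: data requests go to the tail,
// a controller request goes to the head, so a handler draining from the
// head sees the controller request first.
public class PriorityDequeSketch {
    static String drainAll() {
        // The constructor capacity plays the role of queued.max.requests.
        LinkedBlockingDeque<String> requestChannel = new LinkedBlockingDeque<>(500);
        requestChannel.offerLast("Produce-1");      // normal request -> tail
        requestChannel.offerLast("Fetch-1");        // normal request -> tail
        requestChannel.offerFirst("LeaderAndIsr");  // controller request -> head
        StringBuilder order = new StringBuilder();
        String req;
        while ((req = requestChannel.pollFirst()) != null) {
            order.append(req).append(' ');
        }
        return order.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(drainAll());
    }
}
```
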
> > > > > My only concern is that this design is tied to the coincidence that
> > > > > we have two request priorities and there are two ends to a deque.
> > > > > Hence by using the proposed design, it seems the network layer is
> > > > > more tightly coupled with upper layer logic, e.g. if we were to add
> > > > > an extra priority level in the future for some reason, we would
> > > probably
> > > > > need to go back to the design of separate queues, one for each
> > priority
> > > > > level.
> > > > >
> > > > > In summary, I'm ok with both designs and lean toward your suggested
> > > > > approach.
> > > > > Let's hear what others think.
> > > > >
> > > > > @Becket,
> > > > > In light of Mayuresh's suggested new design, I'm answering your
> > > question
> > > > > only in the context
> > > > > of the current KIP design: I think your suggestion makes sense, and
> > I'm
> > > > ok
> > > > > with removing the capacity config and
> > > > > just relying on the default value of 20 being sufficient enough.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > gharatmayuresh15@gmail.com
> > > > > > wrote:
> > > > >
> > > > > > Hi Lucas,
> > > > > >
> > > > > > Seems like the main intent here is to prioritize the controller
> > > request
> > > > > > over any other requests.
> > > > > > In that case, we can change the request queue to a deque, where
> > you
> > > > > > always insert the normal requests (produce, consume, etc.) at the
> > end
> > > > of
> > > > > > the deque, but if it's a controller request, you insert it at
> the
> > > head
> > > > > of
> > > > > > the queue. This ensures that the controller request will be given
> > > > higher
> > > > > > priority over other requests.
> > > > > >
> > > > > > Also since we only read one request from the socket and mute it
> and
> > > > only
> > > > > > unmute it after handling the request, this would ensure that we
> > don't
> > > > > > handle controller requests out of order.
> > > > > >
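
The mute/unmute rule referred to above can be sketched as follows (hypothetical names, not Kafka's actual SocketServer code): once a request is read off a connection, the connection is muted, so at most one request per connection can be in the queue until the response is sent.

```java
// Hypothetical sketch of the mute/unmute rule: a muted connection yields
// no further requests until the current one is responded to, so at most
// one request per connection is ever in the request queue.
public class MutedConnectionSketch {
    private boolean muted = false;
    private int pending = 2;   // requests waiting on the socket
    int inFlight = 0;          // requests read but not yet responded to

    // Read one request off the socket, if allowed, and mute the connection.
    boolean tryReadRequest() {
        if (muted || pending == 0) return false;
        pending--;
        inFlight++;
        muted = true;          // no more reads until we respond
        return true;
    }

    // Send the response and unmute, allowing the next read.
    void sendResponse() {
        inFlight--;
        muted = false;
    }

    public static void main(String[] args) {
        MutedConnectionSketch conn = new MutedConnectionSketch();
        boolean first = conn.tryReadRequest();   // succeeds, mutes the channel
        boolean second = conn.tryReadRequest();  // refused: channel is muted
        System.out.println(first + " " + second + " inFlight=" + conn.inFlight);
        conn.sendResponse();                     // unmute
        System.out.println(conn.tryReadRequest());
    }
}
```
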
> > > > > > With this approach we can avoid the second queue and the
> additional
> > > > > config
> > > > > > for the size of the queue.
> > > > > >
> > > > > > What do you think ?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Hey Joel,
> > > > > > >
> > > > > > > Thanks for the detailed explanation. I agree the current design
> > makes
> > > > > sense.
> > > > > > > My confusion is about whether the new config for the controller
> > > queue
> > > > > > > capacity is necessary. I cannot think of a case in which users
> > > would
> > > > > > change
> > > > > > > it.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > becket.qin@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Lucas,
> > > > > > > >
> > > > > > > > I guess my question can be rephrased to "do we expect users to
> > > ever
> > > > > > change
> > > > > > > > the controller request queue capacity"? If we agree that 20
> is
> > > > > already
> > > > > > a
> > > > > > > > very generous default number and we do not expect users to
> > change
> > > > it,
> > > > > is
> > > > > > > it
> > > > > > > > still necessary to expose this as a config?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > lucasatucla@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> @Becket
> > > > > > > >> 1. Thanks for the comment. You are right that normally there
> > > > should
> > > > > be
> > > > > > > >> just
> > > > > > > >> one controller request because of muting,
> > > > > > > >> and I had NOT intended to say there would be many enqueued
> > > > > controller
> > > > > > > >> requests.
> > > > > > > >> I went through the KIP again, and I'm not sure which part
> > > conveys
> > > > > that
> > > > > > > >> info.
> > > > > > > >> I'd be happy to revise if you point out the section.
> > > > > > > >>
> > > > > > > >> 2. Though it should not happen in normal conditions, the
> > current
> > > > > > design
> > > > > > > >> does not preclude multiple controllers running
> > > > > > > >> at the same time, hence if we don't have the controller
> queue
> > > > > capacity
> > > > > > > >> config and simply make its capacity to be 1,
> > > > > > > >> network threads handling requests from different controllers
> > > will
> > > > be
> > > > > > > >> blocked during those troublesome times,
> > > > > > > >> which is probably not what we want. On the other hand,
> adding
> > > the
> > > > > > extra
> > > > > > > >> config with a default value, say 20, guards us from issues
> in
> > > > those
> > > > > > > >> troublesome times, and IMO there isn't much downside of
> adding
> > > the
> > > > > > extra
> > > > > > > >> config.
> > > > > > > >>
> > > > > > > >> @Mayuresh
> > > > > > > >> Good catch, this sentence is an obsolete statement based on
> a
> > > > > previous
> > > > > > > >> design. I've revised the wording in the KIP.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Lucas
> > > > > > > >>
> > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Lucas,
> > > > > > > >> >
> > > > > > > >> > Thanks for the KIP.
> > > > > > > >> > I am trying to understand why you think "The memory
> > > consumption
> > > > > can
> > > > > > > rise
> > > > > > > >> > given the total number of queued requests can go up to 2x"
> > in
> > > > the
> > > > > > > impact
> > > > > > > >> > section. Normally the requests from controller to a Broker
> > are
> > > > not
> > > > > > > high
> > > > > > > >> > volume, right ?
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> >
> > > > > > > >> > Mayuresh
> > > > > > > >> >
> > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > becket.qin@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >> >
> > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane
> > from
> > > > the
> > > > > > > data
> > > > > > > >> > plane
> > > > > > > >> > > makes a lot of sense.
> > > > > > > >> > >
> > > > > > > >> > > In the KIP you mentioned that the controller request
> queue
> > > may
> > > > > > have
> > > > > > > >> many
> > > > > > > >> > > requests in it. Will this be a common case? The
> controller
> > > > > > requests
> > > > > > > >> still
> > > > > > > >> > > go through the SocketServer. The SocketServer will
> mute
> > > the
> > > > > > > channel
> > > > > > > >> > once
> > > > > > > >> > > a request is read and put into the request channel. So
> > > > assuming
> > > > > > > there
> > > > > > > >> is
> > > > > > > >> > > only one connection between controller and each broker,
> on
> > > the
> > > > > > > broker
> > > > > > > >> > side,
> > > > > > > >> > > there should be only one controller request in the
> > > controller
> > > > > > > request
> > > > > > > >> > queue
> > > > > > > >> > > at any given time. If that is the case, do we need a
> > > separate
> > > > > > > >> controller
> > > > > > > >> > > request queue capacity config? The default value 20
> means
> > > that
> > > > > we
> > > > > > > >> expect
> > > > > > > >> > > there are 20 controller switches to happen in a short
> > period
> > > > of
> > > > > > > time.
> > > > > > > >> I
> > > > > > > >> > am
> > > > > > > >> > > not sure whether someone should increase the controller
> > > > request
> > > > > > > queue
> > > > > > > >> > > capacity to handle such case, as it seems indicating
> > > something
> > > > > > very
> > > > > > > >> wrong
> > > > > > > >> > > has happened.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > >
> > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > lindong28@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Thanks for the update Lucas.
> > > > > > > >> > > >
> > > > > > > >> > > > I think the motivation section is intuitive. It will
> be
> > > good
> > > > > to
> > > > > > > >> learn
> > > > > > > >> > > more
> > > > > > > >> > > > about the comments from other reviewers.
> > > > > > > >> > > >
> > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Hi Dong,
> > > > > > > >> > > > >
> > > > > > > >> > > > > I've updated the motivation section of the KIP by
> > > > explaining
> > > > > > the
> > > > > > > >> > cases
> > > > > > > >> > > > that
> > > > > > > >> > > > > would have user impacts.
> > > > > > > >> > > > > Please take a look at let me know your comments.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Thanks,
> > > > > > > >> > > > > Lucas
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com
> > > > > > > >> >
> > > > > > > >> > > > wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Hi Dong,
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > The simulation of disk being slow is merely for me
> > to
> > > > > easily
> > > > > > > >> > > construct
> > > > > > > >> > > > a
> > > > > > > >> > > > > > testing scenario
> > > > > > > >> > > > > > with a backlog of produce requests. In production,
> > > other
> > > > > > than
> > > > > > > >> the
> > > > > > > >> > > disk
> > > > > > > >> > > > > > being slow, a backlog of
> > > > > > > >> > > > > > produce requests may also be caused by high
> produce
> > > QPS.
> > > > > > > >> > > > > > In that case, we may not want to kill the broker
> and
> > > > > that's
> > > > > > > when
> > > > > > > >> > this
> > > > > > > >> > > > KIP
> > > > > > > >> > > > > > can be useful, both for JBOD
> > > > > > > >> > > > > > and non-JBOD setup.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Going back to your previous question about each
> > > > > > ProduceRequest
> > > > > > > >> > > covering
> > > > > > > >> > > > > 20
> > > > > > > >> > > > > > partitions that are randomly
> > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is
> > > > enqueued
> > > > > > that
> > > > > > > >> > tries
> > > > > > > >> > > to
> > > > > > > >> > > > > > switch the current broker, say broker0, from
> leader
> > to
> > > > > > > follower
> > > > > > > >> > > > > > *for one of the partitions*, say *test-0*. For the
> > > sake
> > > > of
> > > > > > > >> > argument,
> > > > > > > >> > > > > > let's also assume the other brokers, say broker1,
> > have
> > > > > > > *stopped*
> > > > > > > >> > > > fetching
> > > > > > > >> > > > > > from
> > > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > > > >> > > > > > 1. If the enqueued produce requests have acks =
> -1
> > > > (ALL)
> > > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests ahead
> of
> > > > > > > >> LeaderAndISR
> > > > > > > >> > > will
> > > > > > > >> > > > be
> > > > > > > >> > > > > > put into the purgatory,
> > > > > > > >> > > > > >         and since they'll never be replicated to
> > other
> > > > > > brokers
> > > > > > > >> > > (because
> > > > > > > >> > > > > of
> > > > > > > >> > > > > > the assumption made above), they will
> > > > > > > >> > > > > >         be completed either when the LeaderAndISR
> > > > request
> > > > > is
> > > > > > > >> > > processed
> > > > > > > >> > > > or
> > > > > > > >> > > > > > when the timeout happens.
> > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> > > transition
> > > > > the
> > > > > > > >> > > partition
> > > > > > > >> > > > > > test-0 to become a follower,
> > > > > > > >> > > > > >         after the current broker sees the
> > replication
> > > of
> > > > > the
> > > > > > > >> > > remaining
> > > > > > > >> > > > 19
> > > > > > > >> > > > > > partitions, it can send a response indicating that
> > > > > > > >> > > > > >         it's no longer the leader for the
> "test-0".
> > > > > > > >> > > > > >   To see the latency difference between 1.1 and
> 1.2,
> > > > let's
> > > > > > say
> > > > > > > >> > there
> > > > > > > >> > > > are
> > > > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR,
> and
> > > > there
> > > > > > are
> > > > > > > 8
> > > > > > > >> io
> > > > > > > >> > > > > threads,
> > > > > > > >> > > > > >   so each io thread will process approximately
> 3000
> > > > > produce
> > > > > > > >> > requests.
> > > > > > > >> > > > Now
> > > > > > > >> > > > > > let's investigate the io thread that finally
> > processed
> > > > the
> > > > > > > >> > > > LeaderAndISR.
> > > > > > > >> > > > > >   For the 3000 produce requests, if we model the
> > time
> > > > when
> > > > > > > their
> > > > > > > >> > > > > remaining
> > > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and
> the
> > > > > > > LeaderAndISR
> > > > > > > >> > > > request
> > > > > > > >> > > > > is
> > > > > > > >> > > > > > processed at time t3000.
> > > > > > > >> > > > > >   Without this KIP, the 1st produce request would
> > have
> > > > > > waited
> > > > > > > an
> > > > > > > >> > > extra
> > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra
> > > time
> > > > of
> > > > > > > >> t3000 -
> > > > > > > >> > > t1,
> > > > > > > >> > > > > etc.
> > > > > > > >> > > > > >   Roughly speaking, the latency difference is
> bigger
> > > for
> > > > > the
> > > > > > > >> > earlier
> > > > > > > >> > > > > > produce requests than for the later ones. For the
> > same
> > > > > > reason,
> > > > > > > >> the
> > > > > > > >> > > more
> > > > > > > >> > > > > > ProduceRequests queued
> > > > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we
> get
> > > > > (capped
> > > > > > > by
> > > > > > > >> the
> > > > > > > >> > > > > > produce timeout).
> > > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or
> > > > acks=1
> > > > > > > >> > > > > >   There will be no latency differences in this
> case,
> > > but
> > > > > > > >> > > > > >   2.1 without this KIP, the records of partition
> > > test-0
> > > > in
> > > > > > the
> > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be
> > > > appended
> > > > > > to
> > > > > > > >> the
> > > > > > > >> > > local
> > > > > > > >> > > > > log,
> > > > > > > >> > > > > >         and eventually be truncated after
> processing
> > > the
> > > > > > > >> > > LeaderAndISR.
> > > > > > > >> > > > > > This is what's referred to as
> > > > > > > >> > > > > >         "some unofficial definition of data loss
> in
> > > > terms
> > > > > of
> > > > > > > >> > messages
> > > > > > > >> > > > > > beyond the high watermark".
> > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect
> > since
> > > if
> > > > > the
> > > > > > > >> > > > LeaderAndISR
> > > > > > > >> > > > > > is immediately processed, the response to
> producers
> > > will
> > > > > > have
> > > > > > > >> > > > > >         the NotLeaderForPartition error, causing
> > > > producers
> > > > > > to
> > > > > > > >> retry
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > This explanation above is the benefit for reducing
> > the
> > > > > > latency
> > > > > > > >> of a
> > > > > > > >> > > > > broker
> > > > > > > >> > > > > > becoming the follower,
> > > > > > > >> > > > > > closely related is reducing the latency of a
> broker
> > > > > becoming
> > > > > > > the
> > > > > > > >> > > > leader.
> > > > > > > >> > > > > > In this case, the benefit is even more obvious, if
> > > other
> > > > > > > brokers
> > > > > > > >> > have
> > > > > > > >> > > > > > resigned leadership, and the
> > > > > > > >> > > > > > current broker should take leadership. Any delay
> in
> > > > > > processing
> > > > > > > >> the
> > > > > > > >> > > > > > LeaderAndISR will be perceived
> > > > > > > >> > > > > > by clients as unavailability. In extreme cases,
> this
> > > can
> > > > > > cause
> > > > > > > >> > failed
> > > > > > > >> > > > > > produce requests if the retries are
> > > > > > > >> > > > > > exhausted.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Another two types of controller requests are
> > > > > UpdateMetadata
> > > > > > > and
> > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as
> follows:
> > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing
> > means
> > > > > > clients
> > > > > > > >> > > receiving
> > > > > > > >> > > > > > stale metadata, e.g. with the wrong leadership
> info
> > > > > > > >> > > > > > for certain partitions, and the effect is more
> > retries
> > > > or
> > > > > > even
> > > > > > > >> > fatal
> > > > > > > >> > > > > > failure if the retries are exhausted.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > For StopReplica requests, a long queuing time may
> > > > degrade
> > > > > > the
> > > > > > > >> > > > performance
> > > > > > > >> > > > > > of topic deletion.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Regarding your last question of the delay for
> > > > > > > >> > DescribeLogDirsRequest,
> > > > > > > >> > > > you
> > > > > > > >> > > > > > are right
> > > > > > > >> > > > > > that this KIP cannot help with the latency in
> > getting
> > > > the
> > > > > > log
> > > > > > > >> dirs
> > > > > > > >> > > > info,
> > > > > > > >> > > > > > and it's only relevant
> > > > > > > >> > > > > > when controller requests are involved.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Regards,
> > > > > > > >> > > > > > Lucas
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > > > > lindong28@gmail.com
> > > > > > > >
> > > > > > > >> > > wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >> Hey Jun,
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Thanks much for the comments. It is good point.
> So
> > > the
> > > > > > > feature
> > > > > > > >> may
> > > > > > > >> > > be
> > > > > > > >> > > > > >> useful for JBOD use-case. I have one question
> > below.
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Hey Lucas,
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Do you think this feature is also useful for
> > non-JBOD
> > > > > setup
> > > > > > > or
> > > > > > > >> it
> > > > > > > >> > is
> > > > > > > >> > > > > only
> > > > > > > >> > > > > >> useful for the JBOD setup? It may be useful to
> > > > understand
> > > > > > > this.
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> When the broker is setup using JBOD, in order to
> > move
> > > > > > leaders
> > > > > > > >> on
> > > > > > > >> > the
> > > > > > > >> > > > > >> failed
> > > > > > > >> > > > > >> disk to other disks, the system operator first
> > needs
> > > to
> > > > > get
> > > > > > > the
> > > > > > > >> > list
> > > > > > > >> > > > of
> > > > > > > >> > > > > >> partitions on the failed disk. This is currently
> > > > achieved
> > > > > > > using
> > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > > > > >> DescribeLogDirsRequest
> > > > > > > >> > to
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> broker. If we only prioritize the controller
> > > requests,
> > > > > then
> > > > > > > the
> > > > > > > >> > > > > >> DescribeLogDirsRequest
> > > > > > > >> > > > > >> may still take a long time to be processed by the
> > > > broker.
> > > > > > So
> > > > > > > >> the
> > > > > > > >> > > > overall
> > > > > > > >> > > > > >> time to move leaders away from the failed disk
> may
> > > > still
> > > > > be
> > > > > > > >> long
> > > > > > > >> > > even
> > > > > > > >> > > > > with
> > > > > > > >> > > > > >> this KIP. What do you think?
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Thanks,
> > > > > > > >> > > > > >> Dong
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > > > > >> lucasatucla@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > @Dong,
> > > > > > > >> > > > > >> > Since both of the two comments in your previous
> > > email
> > > > > are
> > > > > > > >> about
> > > > > > > >> > > the
> > > > > > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > > > > > >> > > > > >> > in light of Jun's last comment, do you agree
> that
> > > > this
> > > > > > KIP
> > > > > > > >> can
> > > > > > > >> > be
> > > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > Regards,
> > > > > > > >> > > > > >> > Lucas
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > > > > jun@confluent.io>
> > > > > > > >> > wrote:
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > If all disks on a broker are slow, one
> probably
> > > > > should
> > > > > > > just
> > > > > > > >> > kill
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > > broker. In that case, this KIP may not help.
> If
> > > > only
> > > > > > one
> > > > > > > of
> > > > > > > >> > the
> > > > > > > >> > > > > disks
> > > > > > > >> > > > > >> on
> > > > > > > >> > > > > >> > a
> > > > > > > >> > > > > >> > > broker is slow, one may want to fail that
> disk
> > > and
> > > > > move
> > > > > > > the
> > > > > > > >> > > > leaders
> > > > > > > >> > > > > on
> > > > > > > >> > > > > >> > that
> > > > > > > >> > > > > >> > > disk to other brokers. In that case, being
> able
> > > to
> > > > > > > process
> > > > > > > >> the
> > > > > > > >> > > > > >> > LeaderAndIsr
> > > > > > > >> > > > > >> > > requests faster will potentially help the
> > > producers
> > > > > > > recover
> > > > > > > >> > > > quicker.
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > Thanks,
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > Jun
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > > > > > >> lindong28@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up
> > questions
> > > > > below.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers
> 20
> > > > > > > partitions
> > > > > > > >> > that
> > > > > > > >> > > > are
> > > > > > > >> > > > > >> > > randomly
> > > > > > > >> > > > > >> > > > distributed across all partitions, then
> each
> > > > > > > >> ProduceRequest
> > > > > > > >> > > will
> > > > > > > >> > > > > >> likely
> > > > > > > >> > > > > >> > > > cover some partitions for which the broker
> is
> > > > still
> > > > > > > >> leader
> > > > > > > >> > > after
> > > > > > > >> > > > > it
> > > > > > > >> > > > > >> > > quickly
> > > > > > > >> > > > > >> > > > processes the
> > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then the broker will still be slow in processing these
> > > > > > > >> > > > > >> > > > ProduceRequests, and request latency will still be very high with this KIP.
> > > > > > > >> > > > > It
> > > > > > > >> > > > > >> > > seems
> > > > > > > >> > > > > >> > > > that most ProduceRequest will still timeout
> > > after
> > > > > 30
> > > > > > > >> > seconds.
> > > > > > > >> > > Is
> > > > > > > >> > > > > >> this
> > > > > > > >> > > > > >> > > > understanding correct?
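Dong's point about batched ProduceRequests can be quantified with a simple probability model. This is an illustrative back-of-envelope sketch, not anything from the KIP or the thread; all numbers (n partitions, r retained leaderships, k partitions per request) are made up:

```java
// If a broker leads n partitions, keeps leadership of only r of them after
// the LeaderAndIsrRequest, and each ProduceRequest touches k partitions
// chosen at random, the chance a given request still touches a retained
// partition is 1 - C(n-r, k) / C(n, k).
public class BatchOverlapModel {
    static double touchProbability(int n, int r, int k) {
        // missAll = probability that all k partitions avoid the r retained ones,
        // computed as the product of (n - r - i) / (n - i) for i = 0..k-1
        double missAll = 1.0;
        for (int i = 0; i < k; i++) {
            missAll *= (double) (n - r - i) / (n - i);
        }
        return 1.0 - missAll;
    }

    public static void main(String[] args) {
        // Illustrative: 10000 partitions, 1000 retained, 20-partition batches
        System.out.println(touchProbability(10000, 1000, 20));
    }
}
```

With these made-up numbers most batched requests do still touch a partition the broker leads, which supports Dong's point that the slow append path is not fully avoided.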
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will
> > still
> > > > > > timeout
> > > > > > > >> after
> > > > > > > >> > > 30
> > > > > > > >> > > > > >> > seconds,
> > > > > > > >> > > > > >> > > > then it is less clear how this KIP reduces
> > > > average
> > > > > > > >> produce
> > > > > > > >> > > > > latency.
> > > > > > > >> > > > > >> Can
> > > > > > > >> > > > > >> > > you
> > > > > > > >> > > > > >> > > > clarify what metrics can be improved by
> this
> > > KIP?
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Not sure why the system operator directly cares about the number of
> > > > > > > >> > > > > >> > > > truncated messages.
> > > > > > > >> > > > > >> > > > Do you mean this KIP can improve average
> > > > throughput
> > > > > > or
> > > > > > > >> > reduce
> > > > > > > >> > > > > >> message
> > > > > > > >> > > > > >> > > > duplication? It will be good to understand
> > > this.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Thanks,
> > > > > > > >> > > > > >> > > > Dong
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > > > > > >> > > lucasatucla@gmail.com
> > > > > > > >> > > > >
> > > > > > > >> > > > > >> > wrote:
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please
> > see
> > > > my
> > > > > > > reply
> > > > > > > >> > > below.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1
> partition.
> > > Now
> > > > > > let's
> > > > > > > >> > > consider
> > > > > > > >> > > > a
> > > > > > > >> > > > > >> more
> > > > > > > >> > > > > >> > > > common
> > > > > > > >> > > > > >> > > > > scenario
> > > > > > > >> > > > > >> > > > > where broker0 is the leader of many
> > > partitions.
> > > > > And
> > > > > > > >> let's
> > > > > > > >> > > say
> > > > > > > >> > > > > for
> > > > > > > >> > > > > >> > some
> > > > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > > > > > >> > > > > >> > > > > The number of leader partitions on
> broker0
> > is
> > > > so
> > > > > > > large,
> > > > > > > >> > say
> > > > > > > >> > > > 10K,
> > > > > > > >> > > > > >> that
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > cluster is skewed,
> > > > > > > >> > > > > >> > > > > and the operator would like to shift the
> > > > > leadership
> > > > > > > >> for a
> > > > > > > >> > > lot
> > > > > > > >> > > > of
> > > > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > > > > >> > > > > >> > > > > either manually or through some service
> > like
> > > > > cruise
> > > > > > > >> > control.
> > > > > > > >> > > > > >> > > > > With this KIP, not only will the
> leadership
> > > > > > > transitions
> > > > > > > >> > > finish
> > > > > > > >> > > > > >> more
> > > > > > > >> > > > > >> > > > > quickly, helping the cluster itself
> > becoming
> > > > more
> > > > > > > >> > balanced,
> > > > > > > >> > > > > >> > > > > but all existing producers corresponding
> to
> > > the
> > > > > 9K
> > > > > > > >> > > partitions
> > > > > > > >> > > > > will
> > > > > > > >> > > > > >> > get
> > > > > > > >> > > > > >> > > > the
> > > > > > > >> > > > > >> > > > > errors relatively quickly
> > > > > > > >> > > > > >> > > > > rather than relying on their timeout,
> > thanks
> > > to
> > > > > the
> > > > > > > >> > batched
> > > > > > > >> > > > > async
> > > > > > > >> > > > > >> ZK
> > > > > > > >> > > > > >> > > > > operations.
> > > > > > > >> > > > > >> > > > > To me it's a useful feature to have
> during
> > > such
> > > > > > > >> > troublesome
> > > > > > > >> > > > > times.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have
> > > shown
> > > > > > that
> > > > > > > >> with
> > > > > > > >> > > this
> > > > > > > >> > > > > KIP
> > > > > > > >> > > > > >> > many
> > > > > > > >> > > > > >> > > > > producers
> > > > > > > >> > > > > >> > > > > receive an explicit error
> > > > NotLeaderForPartition,
> > > > > > > based
> > > > > > > >> on
> > > > > > > >> > > > which
> > > > > > > >> > > > > >> they
> > > > > > > >> > > > > >> > > > retry
> > > > > > > >> > > > > >> > > > > immediately.
> > > > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick
> > > retry)
> > > > > for
> > > > > > > >> their
> > > > > > > >> > > > single
> > > > > > > >> > > > > >> > > message
> > > > > > > >> > > > > >> > > > is
> > > > > > > >> > > > > >> > > > > much smaller
> > > > > > > >> > > > > >> > > > > compared with the case of timing out
> > without
> > > > the
> > > > > > KIP
> > > > > > > >> (30
> > > > > > > >> > > > seconds
> > > > > > > >> > > > > >> for
> > > > > > > >> > > > > >> > > > timing
> > > > > > > >> > > > > >> > > > > out + quick retry).
> > > > > > > >> > > > > >> > > > > One might argue that reducing the timing
> > out
> > > on
> > > > > the
> > > > > > > >> > producer
> > > > > > > >> > > > > side
> > > > > > > >> > > > > >> can
> > > > > > > >> > > > > >> > > > > achieve the same result,
> > > > > > > >> > > > > >> > > > > yet reducing the timeout has its own
> > > > > drawbacks[1].
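For reference, the 30-second timeout and quick-retry behavior discussed above correspond to standard producer client configs. The values below are illustrative, not a recommendation made anywhere in this thread:

```java
import java.util.Properties;

// The 30 seconds discussed above is the producer's request.timeout.ms
// (default 30000 ms). Lowering it makes producers give up on a dead or
// slow leader sooner, but as noted it can trigger duplicate requests
// against an already-overloaded leader.
public class ProducerTimeoutConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("request.timeout.ms", "30000"); // per-request timeout
        props.put("retries", "5");                // retry on retriable errors, e.g. NotLeaderForPartition
        props.put("retry.backoff.ms", "100");     // wait between retries
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("request.timeout.ms")); // prints 30000
    }
}
```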
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the
> > > > number
> > > > > of
> > > > > > > >> > > truncated
> > > > > > > >> > > > > >> > messages
> > > > > > > >> > > > > >> > > on
> > > > > > > >> > > > > >> > > > > brokers,
> > > > > > > >> > > > > >> > > > > with the experiments done in the Google
> > Doc,
> > > it
> > > > > > > should
> > > > > > > >> be
> > > > > > > >> > > easy
> > > > > > > >> > > > > to
> > > > > > > >> > > > > >> see
> > > > > > > >> > > > > >> > > > that
> > > > > > > >> > > > > >> > > > > a lot fewer messages need
> > > > > > > >> > > > > >> > > > > to be truncated on broker0 since the
> > > up-to-date
> > > > > > > >> metadata
> > > > > > > >> > > > avoids
> > > > > > > >> > > > > >> > > appending
> > > > > > > >> > > > > >> > > > > of messages
> > > > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we
> talk
> > > to a
> > > > > > > system
> > > > > > > >> > > > operator
> > > > > > > >> > > > > >> and
> > > > > > > >> > > > > >> > ask
> > > > > > > >> > > > > >> > > > > whether
> > > > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet
> most
> > > > likely
> > > > > > the
> > > > > > > >> > answer
> > > > > > > >> > > > is
> > > > > > > >> > > > > >> yes.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > 3. To answer your question, I think it
> > might
> > > be
> > > > > > > >> helpful to
> > > > > > > >> > > > > >> construct
> > > > > > > >> > > > > >> > > some
> > > > > > > >> > > > > >> > > > > formulas.
> > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm going back
> to
> > > the
> > > > > > case
> > > > > > > >> where
> > > > > > > >> > > > there
> > > > > > > >> > > > > >> is
> > > > > > > >> > > > > >> > > only
> > > > > > > >> > > > > >> > > > > ONE partition involved.
> > > > > > > >> > > > > >> > > > > Following the experiments in the Google
> > Doc,
> > > > > let's
> > > > > > > say
> > > > > > > >> > > broker0
> > > > > > > >> > > > > >> > becomes
> > > > > > > >> > > > > >> > > > the
> > > > > > > >> > > > > >> > > > > follower at time t0,
> > > > > > > >> > > > > >> > > > > and after t0 there were still N produce
> > > > requests
> > > > > in
> > > > > > > its
> > > > > > > >> > > > request
> > > > > > > >> > > > > >> > queue.
> > > > > > > >> > > > > >> > > > > With the up-to-date metadata brought by
> > this
> > > > KIP,
> > > > > > > >> broker0
> > > > > > > >> > > can
> > > > > > > >> > > > > >> reply
> > > > > > > >> > > > > >> > > with
> > > > > > > >> > > > > >> > > > an
> > > > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > > > > > >> > > > > >> > > > > let's use M1 to denote the average
> > processing
> > > > > time
> > > > > > of
> > > > > > > >> > > replying
> > > > > > > >> > > > > >> with
> > > > > > > >> > > > > >> > > such
> > > > > > > >> > > > > >> > > > an
> > > > > > > >> > > > > >> > > > > error message.
> > > > > > > >> > > > > >> > > > > Without this KIP, the broker will need to
> > > > append
> > > > > > > >> messages
> > > > > > > >> > to
> > > > > > > >> > > > > >> > segments,
> > > > > > > >> > > > > >> > > > > which may trigger a flush to disk,
> > > > > > > >> > > > > >> > > > > let's use M2 to denote the average
> > processing
> > > > > time
> > > > > > > for
> > > > > > > >> > such
> > > > > > > >> > > > > logic.
> > > > > > > >> > > > > >> > > > > Then the average extra latency incurred
> > > without
> > > > > > this
> > > > > > > >> KIP
> > > > > > > >> > is
> > > > > > > >> > > N
> > > > > > > >> > > > *
> > > > > > > >> > > > > >> (M2 -
> > > > > > > >> > > > > >> > > > M1) /
> > > > > > > >> > > > > >> > > > > 2.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > In practice, M2 should always be larger
> > than
> > > > M1,
> > > > > > > which
> > > > > > > >> > means
> > > > > > > >> > > > as
> > > > > > > >> > > > > >> long
> > > > > > > >> > > > > >> > > as N
> > > > > > > >> > > > > >> > > > > is positive,
> > > > > > > >> > > > > >> > > > > we would see improvements on the average
> > > > latency.
> > > > > > > >> > > > > >> > > > > There does not need to be significant
> > backlog
> > > > of
> > > > > > > >> requests
> > > > > > > >> > in
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > > request
> > > > > > > >> > > > > >> > > > > queue,
> > > > > > > >> > > > > >> > > > > or severe degradation of disk performance
> > to
> > > > have
> > > > > > the
> > > > > > > >> > > > > improvement.
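The N * (M2 - M1) / 2 estimate above can be written down directly. A minimal sketch with made-up numbers for N, M1, and M2:

```java
// Back-of-envelope model from the discussion above: broker0 becomes a
// follower at t0 with N ProduceRequests still queued. With the KIP, each
// queued request gets a NotLeaderForPartition reply in M1 ms on average;
// without it, each request is appended (and possibly flushed) in M2 ms.
// Averaged over queue positions, the extra latency without the KIP is
// N * (M2 - M1) / 2.
public class QueueLatencyModel {
    static double extraLatencyMs(int queuedRequests, double m1Ms, double m2Ms) {
        return queuedRequests * (m2Ms - m1Ms) / 2.0;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 100 queued requests, 1 ms to reply
        // with an error vs 50 ms to append and flush.
        System.out.println(extraLatencyMs(100, 1.0, 50.0)); // prints 2450.0
    }
}
```

As the paragraph notes, the improvement is positive whenever N > 0 and M2 > M1, so no severe backlog is required.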
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > Regards,
> > > > > > > >> > > > > >> > > > > Lucas
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on
> > the
> > > > > > > producer
> > > > > > > >> > side
> > > > > > > >> > > > can
> > > > > > > >> > > > > >> > trigger
> > > > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > > > > > >> > > > > >> > > > > when the corresponding leader broker is
> > > > > overloaded,
> > > > > > > >> > > > exacerbating
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > situation.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin
> <
> > > > > > > >> > > lindong28@gmail.com
> > > > > > > >> > > > >
> > > > > > > >> > > > > >> > wrote:
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > Thanks much for the detailed
> > documentation
> > > of
> > > > > the
> > > > > > > >> > > > experiment.
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > Initially I also think having a
> separate
> > > > queue
> > > > > > for
> > > > > > > >> > > > controller
> > > > > > > >> > > > > >> > > requests
> > > > > > > >> > > > > >> > > > is
> > > > > > > >> > > > > >> > > > > > useful because, as you mentioned in the
> > > > summary
> > > > > > > >> section
> > > > > > > >> > of
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > Google
> > > > > > > >> > > > > >> > > > > doc,
> > > > > > > >> > > > > >> > > > > > controller requests are generally more
> > > > > important
> > > > > > > than
> > > > > > > >> > data
> > > > > > > >> > > > > >> requests
> > > > > > > >> > > > > >> > > and
> > > > > > > >> > > > > >> > > > > we
> > > > > > > >> > > > > >> > > > > > probably want controller requests to be
> > > > > processed
> > > > > > > >> > sooner.
> > > > > > > >> > > > But
> > > > > > > >> > > > > >> then
> > > > > > > >> > > > > >> > > Eno
> > > > > > > >> > > > > >> > > > > has
> > > > > > > >> > > > > >> > > > > > two very good questions which I am not
> > sure
> > > > the
> > > > > > > >> Google
> > > > > > > >> > doc
> > > > > > > >> > > > has
> > > > > > > >> > > > > >> > > answered
> > > > > > > >> > > > > >> > > > > > explicitly. Could you help with the
> > > following
> > > > > > > >> questions?
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > 1) It is not very clear what is the
> > actual
> > > > > > benefit
> > > > > > > of
> > > > > > > >> > > > KIP-291
> > > > > > > >> > > > > to
> > > > > > > >> > > > > >> > > users.
> > > > > > > >> > > > > >> > > > > The
> > > > > > > >> > > > > >> > > > > > experiment setup in the Google doc
> > > simulates
> > > > > the
> > > > > > > >> > scenario
> > > > > > > >> > > > that
> > > > > > > >> > > > > >> > broker
> > > > > > > >> > > > > >> > > > is
> > > > > > > >> > > > > >> > > > > > very slow handling ProduceRequest due
> to
> > > e.g.
> > > > > > slow
> > > > > > > >> disk.
> > > > > > > >> > > It
> > > > > > > >> > > > > >> > currently
> > > > > > > >> > > > > >> > > > > > assumes that there is only 1 partition.
> > But
> > > > in
> > > > > > the
> > > > > > > >> > common
> > > > > > > >> > > > > >> scenario,
> > > > > > > >> > > > > >> > > it
> > > > > > > >> > > > > >> > > > is
> > > > > > > >> > > > > >> > > > > > probably reasonable to assume that
> there
> > > are
> > > > > many
> > > > > > > >> other
> > > > > > > >> > > > > >> partitions
> > > > > > > >> > > > > >> > > that
> > > > > > > >> > > > > >> > > > > are
> > > > > > > >> > > > > >> > > > > > also actively produced to and
> > > ProduceRequest
> > > > to
> > > > > > > these
> > > > > > > >> > > > > partition
> > > > > > > >> > > > > >> > also
> > > > > > > >> > > > > >> > > > > takes
> > > > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even
> > if
> > > > > > broker0
> > > > > > > >> can
> > > > > > > >> > > > become
> > > > > > > >> > > > > >> > > follower
> > > > > > > >> > > > > >> > > > > for
> > > > > > > >> > > > > >> > > > > > the partition 0 soon, it probably still needs to slowly process the
> > > > > > > >> > > > > >> > > > > > ProduceRequests in the queue, because these ProduceRequests cover other
> > > > > > > >> > > > > >> > > > > > partitions.
> > > > > > > >> > > > > >> > > > > > Thus most ProduceRequest will still
> > timeout
> > > > > after
> > > > > > > 30
> > > > > > > >> > > seconds
> > > > > > > >> > > > > and
> > > > > > > >> > > > > >> > most
> > > > > > > >> > > > > >> > > > > > clients will still likely timeout after
> > 30
> > > > > > seconds.
> > > > > > > >> Then
> > > > > > > >> > > it
> > > > > > > >> > > > is
> > > > > > > >> > > > > >> not
> > > > > > > >> > > > > >> > > > > > obviously what is the benefit to client
> > > since
> > > > > > > client
> > > > > > > >> > will
> > > > > > > >> > > > > >> timeout
> > > > > > > >> > > > > >> > > after
> > > > > > > >> > > > > >> > > > > 30
> > > > > > > >> > > > > >> > > > > > seconds before possibly re-connecting
> to
> > > > > broker1,
> > > > > > > >> with
> > > > > > > >> > or
> > > > > > > >> > > > > >> without
> > > > > > > >> > > > > >> > > > > KIP-291.
> > > > > > > >> > > > > >> > > > > > Did I miss something here?
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the
> > specific
> > > > > > > benefits
> > > > > > > >> of
> > > > > > > >> > > this
> > > > > > > >> > > > > >> KIP to
> > > > > > > >> > > > > >> > > > user
> > > > > > > >> > > > > >> > > > > or
> > > > > > > >> > > > > >> > > > > > system administrator, e.g. whether this
> > KIP
> > > > > > > decreases
> > > > > > > >> > > > average
> > > > > > > >> > > > > >> > > latency,
> > > > > > > >> > > > > >> > > > > > 999th percentile latency, probability of exceptions exposed to the
> > > > > > > >> > > > > >> > > > > > client, etc. It is probably useful to clarify this.
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
> > > experience
> > > > > > only
> > > > > > > >> when
> > > > > > > >> > > > there
> > > > > > > >> > > > > is
> > > > > > > >> > > > > >> > > issue
> > > > > > > >> > > > > >> > > > > with
> > > > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in the
> > > > request
> > > > > > > queue
> > > > > > > >> > due
> > > > > > > >> > > to
> > > > > > > >> > > > > >> slow
> > > > > > > >> > > > > >> > > disk
> > > > > > > >> > > > > >> > > > as
> > > > > > > >> > > > > >> > > > > > described in the Google doc? Or is this
> > KIP
> > > > > also
> > > > > > > >> useful
> > > > > > > >> > > when
> > > > > > > >> > > > > >> there
> > > > > > > >> > > > > >> > is
> > > > > > > >> > > > > >> > > > no
> > > > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It might
> be
> > > > > helpful
> > > > > > > to
> > > > > > > >> > > clarify
> > > > > > > >> > > > > >> this
> > > > > > > >> > > > > >> > to
> > > > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > Thanks much,
> > > > > > > >> > > > > >> > > > > > Dong
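The separate-queue idea being debated can be sketched as follows. This is a hypothetical illustration of the prioritization scheme, not the actual KIP-291 implementation; class and method names are invented:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the two-queue idea: controller requests go to
// their own queue, and a request handler always drains the controller
// queue before taking a data request.
public class TwoQueueSketch {
    final BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(16);
    final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);

    void enqueue(String request, boolean fromController) {
        // offer() is non-blocking; a full queue drops the request (sketch only)
        (fromController ? controllerQueue : dataQueue).offer(request);
    }

    // Controller requests are served first; data requests only when the
    // controller queue is empty. Note a data request already being processed
    // is not preempted, which is Eno's point about why priority alone cannot
    // fully prevent data-plane requests running ahead of control-plane ones.
    String nextRequest() {
        String r = controllerQueue.poll();
        return (r != null) ? r : dataQueue.poll();
    }

    public static void main(String[] args) {
        TwoQueueSketch s = new TwoQueueSketch();
        s.enqueue("Produce-1", false);
        s.enqueue("LeaderAndIsr-1", true);
        System.out.println(s.nextRequest()); // LeaderAndIsr-1: controller wins
        System.out.println(s.nextRequest()); // Produce-1
    }
}
```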
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas
> > > Wang <
> > > > > > > >> > > > > >> lucasatucla@gmail.com
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > > > wrote:
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > > Hi Eno,
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
> > > > experiment
> > > > > > > >> results.
> > > > > > > >> > > > > >> > > > > > > Here is a link to the positive impact
> > > > > achieved
> > > > > > by
> > > > > > > >> > > > > implementing
> > > > > > > >> > > > > >> > the
> > > > > > > >> > > > > >> > > > > > proposed
> > > > > > > >> > > > > >> > > > > > > change:
> > > > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/
> > > > > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > > > > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > > > >> > > > > >> > > > > > > Please take a look when you have time
> > and
> > > > let
> > > > > > me
> > > > > > > >> know
> > > > > > > >> > > your
> > > > > > > >> > > > > >> > > feedback.
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > Regards,
> > > > > > > >> > > > > >> > > > > > > Lucas
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM,
> > Harsha <
> > > > > > > >> > > kafka@harsha.io>
> > > > > > > >> > > > > >> wrote:
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a
> > > look
> > > > > > might
> > > > > > > >> suit
> > > > > > > >> > > our
> > > > > > > >> > > > > >> > > > requirements
> > > > > > > >> > > > > >> > > > > > > > better.
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > Thanks,
> > > > > > > >> > > > > >> > > > > > > > Harsha
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM,
> > > Lucas
> > > > > > Wang <
> > > > > > > >> > > > > >> > > > lucasatucla@gmail.com
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > If I understand correctly, the
> > > > > replication
> > > > > > > >> quota
> > > > > > > >> > > > > mechanism
> > > > > > > >> > > > > >> > > > proposed
> > > > > > > >> > > > > >> > > > > > in
> > > > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
> > > scenario.
> > > > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Thanks,
> > > > > > > >> > > > > >> > > > > > > > > Lucas
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM,
> > > > Harsha <
> > > > > > > >> > > > > kafka@harsha.io
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > > > >> > > > > >> > > > > > > > > > One more question, any thoughts
> > on
> > > > > making
> > > > > > > >> this
> > > > > > > >> > > > > >> configurable
> > > > > > > >> > > > > >> > > > > > > > > > and also allowing subset of
> data
> > > > > requests
> > > > > > > to
> > > > > > > >> be
> > > > > > > >> > > > > >> > prioritized.
> > > > > > > >> > > > > >> > > > For
> > > > > > > >> > > > > >> > > > > > > > example
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when
> we
> > > > take
> > > > > > out
> > > > > > > a
> > > > > > > >> > > broker
> > > > > > > >> > > > > and
> > > > > > > >> > > > > >> > bring
> > > > > > > >> > > > > >> > > > new
> > > > > > > >> > > > > >> > > > > > one
> > > > > > > >> > > > > >> > > > > > > > it
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > will try to become follower and
> > > have
> > > > > lot
> > > > > > of
> > > > > > > >> > fetch
> > > > > > > >> > > > > >> requests
> > > > > > > >> > > > > >> > to
> > > > > > > >> > > > > >> > > > > other
> > > > > > > >> > > > > >> > > > > > > > > leaders
> > > > > > > >> > > > > >> > > > > > > > > > in clusters. This will
> negatively
> > > > > affect
> > > > > > > the
> > > > > > > >> > > > > >> > > application/client
> > > > > > > >> > > > > >> > > > > > > > > requests.
> > > > > > > >> > > > > >> > > > > > > > > > We are also exploring the
> similar
> > > > > > solution
> > > > > > > to
> > > > > > > >> > > > > >> de-prioritize
> > > > > > > >> > > > > >> > > if
> > > > > > > >> > > > > >> > > > a
> > > > > > > >> > > > > >> > > > > > new
> > > > > > > >> > > > > >> > > > > > > > > > replica comes in for fetch
> > > requests,
> > > > we
> > > > > > are
> > > > > > > >> ok
> > > > > > > >> > > with
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > replica
> > > > > > > >> > > > > >> > > > > to
> > > > > > > >> > > > > >> > > > > > be
> > > > > > > >> > > > > >> > > > > > > > > > taking time but the leaders
> > should
> > > > > > > prioritize
> > > > > > > >> > the
> > > > > > > >> > > > > client
> > > > > > > >> > > > > >> > > > > requests.
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > Thanks,
> > > > > > > >> > > > > >> > > > > > > > > > Harsha
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35
> > AM
> > > > > Lucas
> > > > > > > Wang
> > > > > > > >> > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed
> response.
> > > > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the
> > > feature
> > > > > > yet,
> > > > > > > >> so no
> > > > > > > >> > > > > >> > experimental
> > > > > > > >> > > > > >> > > > > > results
> > > > > > > >> > > > > >> > > > > > > > so
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > far.
> > > > > > > >> > > > > >> > > > > > > > > > > And I plan to test in out in
> > the
> > > > > > > following
> > > > > > > >> > days.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right
> that
> > > the
> > > > > > > >> priority
> > > > > > > >> > > queue
> > > > > > > >> > > > > >> does
> > > > > > > >> > > > > >> > not
> > > > > > > >> > > > > >> > > > > > > > completely
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > prevent
> > > > > > > >> > > > > >> > > > > > > > > > > data requests being processed
> > > ahead
> > > > > of
> > > > > > > >> > > controller
> > > > > > > >> > > > > >> > requests.
> > > > > > > >> > > > > >> > > > > > > > > > > That being said, I expect it
> to
> > > > > greatly
> > > > > > > >> > mitigate
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > effect
> > > > > > > >> > > > > >> > > > of
> stable metadata.
> In any case, I'll try it out and post the results when I have it.
>
> Regards,
> Lucas
>
> On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <eno.thereska@gmail.com> wrote:
>
> > Hi Lucas,
> >
> > Sorry for the delay, just had a look at this. A couple of questions:
> >
> > - did you notice any positive change after implementing this KIP? I'm
> > wondering if you have any experimental results that show the benefit of
> > the two queues.
> >
> > - priority is usually not sufficient in addressing the problem the KIP
> > identifies. Even with priority queues, you will sometimes (often?) have
> > the case that data plane requests will be ahead of the control plane
> > requests. This happens because the system might have already started
> > processing the data plane requests before the control plane ones
> > arrived. So it would be good to know what % of the problem this KIP
> > addresses.
> >
> > Thanks
> > Eno
> >
> > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Change looks good.
> > >
> > > Thanks
> > >
> > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > Thanks for the suggestion. I've updated the KIP. Please take another
> > > > look.
> > > >
> > > > Lucas
> > > >
> > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > Currently in KafkaConfig.scala:
> > > > >
> > > > > val QueuedMaxRequests = 500
> > > > >
> > > > > It would be good if you can include the default value for this new
> > > > > config in the KIP.
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >
> > > > > > Hi Ted, Dong
> > > > > >
> > > > > > I've updated the KIP by adding a new config, instead of reusing
> > > > > > the existing one.
> > > > > > Please take another look when you have time. Thanks a lot!
> > > > > >
> > > > > > Lucas
> > > > > >
> > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >
> > > > > > > bq. that's a waste of resource if control request rate is low
> > > > > > >
> > > > > > > I don't know if control request rate can get to 100,000, likely
> > > > > > > not. Then using the same bound as that for data requests seems
> > > > > > > high.
> > > > > > >
> > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Ted,
> > > > > > > >
> > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > Let's say today the setting of "queued.max.requests" in
> > > > > > > > cluster A is 1000, while the setting in cluster B is 100,000.
> > > > > > > > The 100 times difference might have indicated that machines
> > > > > > > > in cluster B have larger memory.
> > > > > > > >
> > > > > > > > By reusing the "queued.max.requests", the controlRequestQueue
> > > > > > > > in cluster B automatically gets a 100x capacity without
> > > > > > > > explicitly bothering the operators. I understand the counter
> > > > > > > > argument can be that maybe that's a waste of resource if
> > > > > > > > control request rate is low and operators may want to fine
> > > > > > > > tune the capacity of the controlRequestQueue.
> > > > > > > >
> > > > > > > > I'm ok with either approach, and can change it if you or
> > > > > > > > anyone else feels strong about adding the extra config.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Lucas:
> > > > > > > > > Under Rejected Alternatives, #2, can you elaborate a bit
> > > > > > > > > more on why the separate config has bigger impact?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hey Lucas,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP. Looks good overall. Some comments
> > > > > > > > > > below:
> > > > > > > > > >
> > > > > > > > > > - We usually specify the full mbean for the new metrics
> > > > > > > > > > in the KIP. Can you specify it in the Public Interface
> > > > > > > > > > section similar to KIP-237
> > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>?
> > > > > > > > > >
> > > > > > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > > > > > > where we keep the existing sensor name "BytesInPerSec"
> > > > > > > > > > and add a new sensor "ReplicationBytesInPerSec", rather
> > > > > > > > > > than replacing the sensor name "BytesInPerSec" with e.g.
> > > > > > > > > > "ClientBytesInPerSec".
> > > > > > > > > >
> > > > > > > > > > - It seems that the KIP changes the semantics of the
> > > > > > > > > > broker config "queued.max.requests" because the number of
> > > > > > > > > > total requests queued in the broker will be no longer
> > > > > > > > > > bounded by "queued.max.requests". This probably needs to
> > > > > > > > > > be specified in the Public Interfaces section for
> > > > > > > > > > discussion.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Dong
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > >
> > > > > > > > > > > I created KIP-291 to add a separate queue for
> > > > > > > > > > > controller requests:
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > > > > > > >
> > > > > > > > > > > Can you please take a look and let me know your
> > > > > > > > > > > feedback?
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Joel Koshy <jj...@gmail.com>.
Good example. I think this scenario can occur in the current code as well
but with even lower probability given that there are other non-controller
requests interleaved. It is still sketchy though and I think a safer
approach would be separate queues and pinning controller request handling
to one handler thread.

On Wed, Jul 18, 2018 at 11:12 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Becket,
>
> I think you are right that there may be out-of-order processing. However,
> it seems that out-of-order processing may also happen even if we use a
> separate queue.
>
> Here is the example:
>
> - Controller sends R1 and got disconnected before receiving response. Then
> it reconnects and sends R2. Both requests now stay in the controller
> request queue in the order they are sent.
> - thread1 takes R1 from the request queue and then thread2 takes R2 from
> the request queue almost at the same time.
> - So R1 and R2 are processed in parallel. There is a chance that R2's
> processing is completed before R1's.
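Dong's race can be sketched in a few lines. This is a toy simulation, not Kafka code: the 200 ms sleep stands in for R1's slower processing, and the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ParallelHandlerRace {
    // Two handler threads drain one controller request queue. Even though
    // R1 is dequeued first, R2 can finish first if R1's processing is slow.
    static List<String> completionOrder() throws Exception {
        BlockingQueue<String> controllerQueue =
                new LinkedBlockingQueue<>(List.of("R1", "R2"));
        List<String> completed = new CopyOnWriteArrayList<>();
        ExecutorService handlers = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 2; i++) {
            handlers.submit(() -> {
                String req = controllerQueue.take();
                if (req.equals("R1")) {
                    Thread.sleep(200); // R1's processing happens to be slow
                }
                completed.add(req);
                return null;
            });
        }
        handlers.shutdown();
        handlers.awaitTermination(5, TimeUnit.SECONDS);
        return completed;
    }

    public static void main(String[] args) throws Exception {
        // R2 typically completes before R1, i.e. out-of-order completion.
        System.out.println(completionOrder());
    }
}
```

So with more than one handler thread draining the queue, ordering of completions is not guaranteed by queue order alone, which is why the thread later discusses pinning controller requests to a single handler thread.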
>
> If out-of-order processing can happen for both approaches with very low
> probability, it may not be worthwhile to add the extra queue. What do you
> think?
>
> Thanks,
> Dong
>
>
> On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Hi Mayuresh/Joel,
> >
> > Using the request channel as a deque was brought up some time ago when
> > we were initially thinking of prioritizing the requests. The concern was
> > that the controller requests are supposed to be processed in order. If
> > we can ensure that there is at most one controller request in the
> > request channel, the order is not a concern. But in cases where more
> > than one controller request is inserted into the queue, the controller
> > request order may change and cause problems. For example, think about
> > the following sequence:
> > 1. Controller successfully sent a request R1 to the broker.
> > 2. Broker receives R1 and puts the request at the head of the request
> > queue.
> > 3. The controller-to-broker connection failed and the controller
> > reconnected to the broker.
> > 4. Controller sends a request R2 to the broker.
> > 5. Broker receives R2 and adds it to the head of the request queue.
> > Now on the broker side, R2 will be processed before R1 is processed,
> > which may cause problems.
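The head-insertion hazard Becket describes can be reproduced deterministically in a few lines. A plain `ArrayDeque` stands in for the broker's request queue; the request names follow the sequence above, and none of this is actual Kafka code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class HeadInsertReordering {
    // Simulates the deque proposal: every controller request is inserted
    // at the head (steps 2 and 5 above), and handler threads always take
    // from the head.
    static List<String> processingOrder(List<String> arrivals) {
        Deque<String> requestQueue = new ArrayDeque<>();
        for (String request : arrivals) {
            requestQueue.addFirst(request); // controller request goes to head
        }
        List<String> order = new ArrayList<>();
        while (!requestQueue.isEmpty()) {
            order.add(requestQueue.pollFirst()); // handler takes from head
        }
        return order;
    }

    public static void main(String[] args) {
        // R1 arrives first, then R2 after the controller reconnects:
        // the broker ends up processing R2 before R1.
        System.out.println(processingOrder(List.of("R1", "R2"))); // [R2, R1]
    }
}
```

This is exactly why the hazard only appears when two controller requests coexist in the queue: with at most one enqueued at a time (as channel muting normally guarantees), head insertion cannot reorder anything.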
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com> wrote:
> >
> > > @Mayuresh - I like your idea. It appears to be a simpler, less invasive
> > > alternative and it should work. Jun/Becket/others, do you see any
> > pitfalls
> > > with this approach?
> > >
> > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > >
> > > > @Mayuresh,
> > > > That's a very interesting idea that I hadn't thought of before.
> > > > It seems to solve our problem at hand pretty well, and also
> > > > avoids the need to have a new size metric and capacity config
> > > > for the controller request queue. In fact, if we were to adopt
> > > > this design, there is no public interface change, and we
> > > > probably don't need a KIP.
> > > > Also implementation-wise, it seems the Java class LinkedBlockingDeque
> > > > can readily satisfy the requirement by supporting a capacity bound,
> > > > and also allowing insertion at both ends.
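A minimal sketch of that implementation idea, with illustrative names rather than Kafka's actual `RequestChannel` API: one bounded `LinkedBlockingDeque` carries both priorities, controller requests entering at the head and data requests at the tail.

```java
import java.util.concurrent.LinkedBlockingDeque;

public class PrioritizedRequestChannel {
    // Illustrative sketch of the deque proposal; not real Kafka code.
    private final LinkedBlockingDeque<String> deque;

    public PrioritizedRequestChannel(int queuedMaxRequests) {
        // A single bounded deque replaces the plain BlockingQueue,
        // so queued.max.requests still bounds total queued requests.
        this.deque = new LinkedBlockingDeque<>(queuedMaxRequests);
    }

    public void send(String request, boolean fromController) throws InterruptedException {
        if (fromController) {
            deque.putFirst(request); // controller request jumps to the head
        } else {
            deque.putLast(request);  // produce/fetch requests keep FIFO order
        }
    }

    public String receive() throws InterruptedException {
        return deque.takeFirst();    // handler threads always take from the head
    }
}
```

With this, a controller request enqueued behind a backlog of data requests is still the next one dequeued, while the data requests keep their relative order; there is no second queue and no extra capacity config.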
> > > >
> > > > My only concern is that this design is tied to the coincidence that
> > > > we have two request priorities and there are two ends to a deque.
> > > > Hence by using the proposed design, it seems the network layer is
> > > > more tightly coupled with upper layer logic, e.g. if we were to add
> > > > an extra priority level in the future for some reason, we would
> > probably
> > > > need to go back to the design of separate queues, one for each
> priority
> > > > level.
> > > >
> > > > In summary, I'm ok with both designs and lean toward your suggested
> > > > approach.
> > > > Let's hear what others think.
> > > >
> > > > @Becket,
> > > > In light of Mayuresh's suggested new design, I'm answering your
> > question
> > > > only in the context
> > > > of the current KIP design: I think your suggestion makes sense, and
> I'm
> > > ok
> > > > with removing the capacity config and
> > > > just relying on the default value of 20 being sufficient.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > gharatmayuresh15@gmail.com
> > > > > wrote:
> > > >
> > > > > Hi Lucas,
> > > > >
> > > > > Seems like the main intent here is to prioritize the controller
> > request
> > > > > over any other requests.
> > > > > In that case, we can change the request queue to a deque, where
> > > > > you always insert the normal requests (produce, consume, etc.) at
> > > > > the end of the deque, but if it's a controller request, you insert
> > > > > it at the head of the deque. This ensures that the controller
> > > > > request will be given higher priority over other requests.
> > > > >
> > > > > Also, since we only read one request from the socket at a time and
> > > > > mute the channel, and only unmute it after handling the request,
> > > > > this would ensure that we don't handle controller requests out of
> > > > > order.
> > > > >
> > > > > With this approach we can avoid the second queue and the additional
> > > > config
> > > > > for the size of the queue.
> > > > >
> > > > > What do you think ?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > >
> > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hey Joel,
> > > > > >
> > > > > > Thanks for the detailed explanation. I agree the current design
> > > > > > makes sense. My confusion is about whether the new config for the
> > > > > > controller queue capacity is necessary. I cannot think of a case
> > > > > > in which users would change it.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> becket.qin@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Lucas,
> > > > > > >
> > > > > > > I guess my question can be rephrased to "do we expect users to
> > > > > > > ever change the controller request queue capacity"? If we agree
> > > > > > > that 20 is already a very generous default number and we do not
> > > > > > > expect users to change it, is it still necessary to expose this
> > > > > > > as a config?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > lucasatucla@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >> @Becket
> > > > > > >> 1. Thanks for the comment. You are right that normally there
> > > should
> > > > be
> > > > > > >> just
> > > > > > >> one controller request because of muting,
> > > > > > >> and I had NOT intended to say there would be many enqueued
> > > > controller
> > > > > > >> requests.
> > > > > > >> I went through the KIP again, and I'm not sure which part
> > conveys
> > > > that
> > > > > > >> info.
> > > > > > >> I'd be happy to revise if you point out the section.
> > > > > > >>
> > > > > > >> 2. Though it should not happen in normal conditions, the
> current
> > > > > design
> > > > > > >> does not preclude multiple controllers running
> > > > > > >> at the same time, hence if we don't have the controller queue
> > > > capacity
> > > > > > >> config and simply make its capacity to be 1,
> > > > > > >> network threads handling requests from different controllers
> > will
> > > be
> > > > > > >> blocked during those troublesome times,
> > > > > > >> which is probably not what we want. On the other hand, adding
> > the
> > > > > extra
> > > > > > >> config with a default value, say 20, guards us from issues in
> > > those
> > > > > > >> troublesome times, and IMO there isn't much downside of adding
> > the
> > > > > extra
> > > > > > >> config.
> > > > > > >>
> > > > > > >> @Mayuresh
> > > > > > >> Good catch, this sentence is an obsolete statement based on a
> > > > previous
> > > > > > >> design. I've revised the wording in the KIP.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Lucas
> > > > > > >>
> > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Hi Lucas,
> > > > > > >> >
> > > > > > >> > Thanks for the KIP.
> > > > > > >> > I am trying to understand why you think "The memory
> > consumption
> > > > can
> > > > > > rise
> > > > > > >> > given the total number of queued requests can go up to 2x"
> in
> > > the
> > > > > > impact
> > > > > > >> > section. Normally the requests from controller to a Broker
> are
> > > not
> > > > > > high
> > > > > > >> > volume, right ?
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> >
> > > > > > >> > Mayuresh
> > > > > > >> >
> > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > becket.qin@gmail.com>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane
> from
> > > the
> > > > > > data
> > > > > > >> > plane
> > > > > > >> > > makes a lot of sense.
> > > > > > >> > >
> > > > > > >> > > In the KIP you mentioned that the controller request queue
> > may
> > > > > have
> > > > > > >> many
> > > > > > >> > > requests in it. Will this be a common case? The controller
> > > > > requests
> > > > > > >> still
> > > > > > >> > > goes through the SocketServer. The SocketServer will mute
> > the
> > > > > > channel
> > > > > > >> > once
> > > > > > >> > > a request is read and put into the request channel. So
> > > assuming
> > > > > > there
> > > > > > >> is
> > > > > > >> > > only one connection between controller and each broker, on
> > the
> > > > > > broker
> > > > > > >> > side,
> > > > > > >> > > there should be only one controller request in the
> > controller
> > > > > > request
> > > > > > >> > queue
> > > > > > >> > > at any given time. If that is the case, do we need a
> > separate
> > > > > > >> controller
> > > > > > >> > > request queue capacity config? The default value 20 means
> > that
> > > > we
> > > > > > >> expect
> > > > > > >> > > there are 20 controller switches to happen in a short
> period
> > > of
> > > > > > time.
> > > > > > >> I
> > > > > > >> > am
> > > > > > >> > > not sure whether someone should increase the controller
> > > request
> > > > > > queue
> > > > > > >> > > capacity to handle such case, as it seems indicating
> > something
> > > > > very
> > > > > > >> wrong
> > > > > > >> > > has happened.
> > > > > > >> > >
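The muting behavior described above can be modeled in a few lines of Python (a sketch of the idea only, not Kafka's actual SocketServer code; the class and request names are made up for illustration):

```python
# Once a request is read from a connection, the channel is muted until
# the response is sent, so each connection contributes at most one
# request to the request queue at any time.
class Channel:
    def __init__(self, pending):
        self.muted = False
        self.pending = list(pending)     # requests waiting on the socket

    def poll(self):
        if self.muted or not self.pending:
            return None                  # muted channels are not read
        self.muted = True                # mute until the response is sent
        return self.pending.pop(0)

    def send_response(self):
        self.muted = False               # unmute, allowing the next read

ch = Channel(["LeaderAndIsr-1", "LeaderAndIsr-2"])
queue = [ch.poll()]                      # first request enters the queue
assert ch.poll() is None                 # muted: no second request queued
ch.send_response()
assert ch.poll() == "LeaderAndIsr-2"     # next request only after response
```

This is why, with a single controller-to-broker connection, at most one controller request sits in the queue at a time under normal conditions.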
> > > > > > >> > > Thanks,
> > > > > > >> > >
> > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > lindong28@gmail.com>
> > > > > > >> wrote:
> > > > > > >> > >
> > > > > > >> > > > Thanks for the update Lucas.
> > > > > > >> > > >
> > > > > > >> > > > I think the motivation section is intuitive. It will be
> > good
> > > > to
> > > > > > >> learn
> > > > > > >> > > more
> > > > > > >> > > > about the comments from other reviewers.
> > > > > > >> > > >
> > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com>
> > > > > > >> > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hi Dong,
> > > > > > >> > > > >
> > > > > > >> > > > > I've updated the motivation section of the KIP by
> > > explaining
> > > > > the
> > > > > > >> > cases
> > > > > > >> > > > that
> > > > > > >> > > > > would have user impacts.
> > > > > > >> > > > > Please take a look at let me know your comments.
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks,
> > > > > > >> > > > > Lucas
> > > > > > >> > > > >
> > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com
> > > > > > >> >
> > > > > > >> > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Hi Dong,
> > > > > > >> > > > > >
> > > > > > >> > > > > > The simulation of disk being slow is merely for me
> to
> > > > easily
> > > > > > >> > > construct
> > > > > > >> > > > a
> > > > > > >> > > > > > testing scenario
> > > > > > >> > > > > > with a backlog of produce requests. In production,
> > other
> > > > > than
> > > > > > >> the
> > > > > > >> > > disk
> > > > > > >> > > > > > being slow, a backlog of
> > > > > > >> > > > > > produce requests may also be caused by high produce
> > QPS.
> > > > > > >> > > > > > In that case, we may not want to kill the broker and
> > > > that's
> > > > > > when
> > > > > > >> > this
> > > > > > >> > > > KIP
> > > > > > >> > > > > > can be useful, both for JBOD
> > > > > > >> > > > > > and non-JBOD setup.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Going back to your previous question about each
> > > > > ProduceRequest
> > > > > > >> > > covering
> > > > > > >> > > > > 20
> > > > > > >> > > > > > partitions that are randomly
> > > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is
> > > enqueued
> > > > > that
> > > > > > >> > tries
> > > > > > >> > > to
> > > > > > >> > > > > > switch the current broker, say broker0, from leader
> to
> > > > > > follower
> > > > > > >> > > > > > *for one of the partitions*, say *test-0*. For the
> > sake
> > > of
> > > > > > >> > argument,
> > > > > > >> > > > > > let's also assume the other brokers, say broker1,
> have
> > > > > > *stopped*
> > > > > > >> > > > fetching
> > > > > > >> > > > > > from
> > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > > >> > > > > > 1. If the enqueued produce requests have acks =  -1
> > > (ALL)
> > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests ahead of
> > > > > > >> LeaderAndISR
> > > > > > >> > > will
> > > > > > >> > > > be
> > > > > > >> > > > > > put into the purgatory,
> > > > > > >> > > > > >         and since they'll never be replicated to
> other
> > > > > brokers
> > > > > > >> > > (because
> > > > > > >> > > > > of
> > > > > > >> > > > > > the assumption made above), they will
> > > > > > >> > > > > >         be completed either when the LeaderAndISR
> > > request
> > > > is
> > > > > > >> > > processed
> > > > > > >> > > > or
> > > > > > >> > > > > > when the timeout happens.
> > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> > transition
> > > > the
> > > > > > >> > > partition
> > > > > > >> > > > > > test-0 to become a follower,
> > > > > > >> > > > > >         after the current broker sees the
> replication
> > of
> > > > the
> > > > > > >> > > remaining
> > > > > > >> > > > 19
> > > > > > >> > > > > > partitions, it can send a response indicating that
> > > > > > >> > > > > >         it's no longer the leader for the "test-0".
> > > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2,
> > > let's
> > > > > say
> > > > > > >> > there
> > > > > > >> > > > are
> > > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR, and
> > > there
> > > > > are
> > > > > > 8
> > > > > > >> io
> > > > > > >> > > > > threads,
> > > > > > >> > > > > >   so each io thread will process approximately 3000
> > > > produce
> > > > > > >> > requests.
> > > > > > >> > > > Now
> > > > > > >> > > > > > let's investigate the io thread that finally
> processed
> > > the
> > > > > > >> > > > LeaderAndISR.
> > > > > > >> > > > > >   For the 3000 produce requests, if we model the
> time
> > > when
> > > > > > their
> > > > > > >> > > > > remaining
> > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and the
> > > > > > LeaderAndISR
> > > > > > >> > > > request
> > > > > > >> > > > > is
> > > > > > >> > > > > > processed at time t3000.
> > > > > > >> > > > > >   Without this KIP, the 1st produce request would
> have
> > > > > waited
> > > > > > an
> > > > > > >> > > extra
> > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra
> > time
> > > of
> > > > > > >> t3000 -
> > > > > > >> > > t1,
> > > > > > >> > > > > etc.
> > > > > > >> > > > > >   Roughly speaking, the latency difference is bigger
> > for
> > > > the
> > > > > > >> > earlier
> > > > > > >> > > > > > produce requests than for the later ones. For the
> same
> > > > > reason,
> > > > > > >> the
> > > > > > >> > > more
> > > > > > >> > > > > > ProduceRequests queued
> > > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we get
> > > > (capped
> > > > > > by
> > > > > > >> the
> > > > > > >> > > > > > produce timeout).
> > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or
> > > acks=1
> > > > > > >> > > > > >   There will be no latency differences in this case,
> > but
> > > > > > >> > > > > >   2.1 without this KIP, the records of partition
> > test-0
> > > in
> > > > > the
> > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be
> > > appended
> > > > > to
> > > > > > >> the
> > > > > > >> > > local
> > > > > > >> > > > > log,
> > > > > > >> > > > > >         and eventually be truncated after processing
> > the
> > > > > > >> > > LeaderAndISR.
> > > > > > >> > > > > > This is what's referred to as
> > > > > > >> > > > > >         "some unofficial definition of data loss in
> > > terms
> > > > of
> > > > > > >> > messages
> > > > > > >> > > > > > beyond the high watermark".
> > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect
> since
> > if
> > > > the
> > > > > > >> > > > LeaderAndISR
> > > > > > >> > > > > > is immediately processed, the response to producers
> > will
> > > > > have
> > > > > > >> > > > > >         the NotLeaderForPartition error, causing
> > > producers
> > > > > to
> > > > > > >> retry
> > > > > > >> > > > > >
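To make the purgatory-wait arithmetic in case 1 concrete, here is a small Python sketch of the model above; all numbers (3000 queued produce requests per io thread, 1 ms per request) are assumed for illustration, not measured:

```python
# Model of the extra purgatory wait described above. Assumed numbers:
# 3000 queued produce requests ahead of the LeaderAndISR on one io
# thread, each taking 1 ms, so request i's followers catch up at
# t_i = i ms and the LeaderAndISR is processed at t_3000 = 3000 ms.
N = 3000
t = [float(i) for i in range(N)]        # t_0 .. t_2999, in ms
t_leader_and_isr = float(N)             # t_3000

# Without the KIP, request i waits an extra (t_3000 - t_i) in the
# purgatory; earlier requests wait longer than later ones.
extra_waits = [t_leader_and_isr - ti for ti in t]
avg_extra = sum(extra_waits) / N        # average extra wait, in ms

assert extra_waits[0] == 3000.0         # earliest request: longest wait
assert extra_waits[-1] == 1.0           # latest request: shortest wait
```

With these assumed numbers the average extra wait is roughly 1.5 seconds, and in practice the extra wait is capped by the produce timeout.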
> > > > > > >> > > > > > This explanation above is the benefit for reducing
> the
> > > > > latency
> > > > > > >> of a
> > > > > > >> > > > > broker
> > > > > > >> > > > > > becoming the follower,
> > > > > > >> > > > > > closely related is reducing the latency of a broker
> > > > becoming
> > > > > > the
> > > > > > >> > > > leader.
> > > > > > >> > > > > > In this case, the benefit is even more obvious, if
> > other
> > > > > > brokers
> > > > > > >> > have
> > > > > > >> > > > > > resigned leadership, and the
> > > > > > >> > > > > > current broker should take leadership. Any delay in
> > > > > processing
> > > > > > >> the
> > > > > > >> > > > > > LeaderAndISR will be perceived
> > > > > > >> > > > > > by clients as unavailability. In extreme cases, this
> > can
> > > > > cause
> > > > > > >> > failed
> > > > > > >> > > > > > produce requests if the retries are
> > > > > > >> > > > > > exhausted.
> > > > > > >> > > > > >
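The retry-exhaustion failure mode mentioned above can be sketched as follows; the retry loop is a simplified stand-in for the producer's behavior, and all timing values are assumed for illustration:

```python
# Simplified model of a producer retrying against a delayed leadership
# change: if the new leader is not serving before the retries run out,
# the produce request fails permanently.
def produce_succeeds(leader_ready_at_ms, retries, retry_backoff_ms):
    for attempt in range(retries + 1):
        now = attempt * retry_backoff_ms      # time of this attempt
        if now >= leader_ready_at_ms:         # new leader is serving
            return True
    return False                              # retries exhausted

# Leadership change delayed 300 ms: survivable with 3 retries of 100 ms.
assert produce_succeeds(leader_ready_at_ms=300, retries=3, retry_backoff_ms=100)
# Delayed 500 ms: the same retry budget is exhausted and the produce fails.
assert not produce_succeeds(leader_ready_at_ms=500, retries=3, retry_backoff_ms=100)
```

The point is that any queuing delay ahead of the LeaderAndIsr request eats directly into the clients' retry budget.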
> > > > > > >> > > > > > Another two types of controller requests are
> > > > UpdateMetadata
> > > > > > and
> > > > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > > >> > > > > > For UpdateMetadata requests, delayed processing
> means
> > > > > clients
> > > > > > >> > > receiving
> > > > > > >> > > > > > stale metadata, e.g. with the wrong leadership info
> > > > > > >> > > > > > for certain partitions, and the effect is more
> retries
> > > or
> > > > > even
> > > > > > >> > fatal
> > > > > > >> > > > > > failure if the retries are exhausted.
> > > > > > >> > > > > >
> > > > > > >> > > > > > For StopReplica requests, a long queuing time may
> > > degrade
> > > > > the
> > > > > > >> > > > performance
> > > > > > >> > > > > > of topic deletion.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Regarding your last question of the delay for
> > > > > > >> > DescribeLogDirsRequest,
> > > > > > >> > > > you
> > > > > > >> > > > > > are right
> > > > > > >> > > > > > that this KIP cannot help with the latency in
> getting
> > > the
> > > > > log
> > > > > > >> dirs
> > > > > > >> > > > info,
> > > > > > >> > > > > > and it's only relevant
> > > > > > >> > > > > > when controller requests are involved.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Regards,
> > > > > > >> > > > > > Lucas
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > > > lindong28@gmail.com
> > > > > > >
> > > > > > >> > > wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > >> Hey Jun,
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Thanks much for the comments. It is good point. So
> > the
> > > > > > feature
> > > > > > >> may
> > > > > > >> > > be
> > > > > > >> > > > > >> useful for JBOD use-case. I have one question
> below.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Hey Lucas,
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Do you think this feature is also useful for
> non-JBOD
> > > > setup
> > > > > > or
> > > > > > >> it
> > > > > > >> > is
> > > > > > >> > > > > only
> > > > > > >> > > > > >> useful for the JBOD setup? It may be useful to
> > > understand
> > > > > > this.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> When the broker is setup using JBOD, in order to
> move
> > > > > leaders
> > > > > > >> on
> > > > > > >> > the
> > > > > > >> > > > > >> failed
> > > > > > >> > > > > >> disk to other disks, the system operator first
> needs
> > to
> > > > get
> > > > > > the
> > > > > > >> > list
> > > > > > >> > > > of
> > > > > > >> > > > > >> partitions on the failed disk. This is currently
> > > achieved
> > > > > > using
> > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > > > >> DescribeLogDirsRequest
> > > > > > >> > to
> > > > > > >> > > > the
> > > > > > >> > > > > >> broker. If we only prioritize the controller
> > requests,
> > > > then
> > > > > > the
> > > > > > >> > > > > >> DescribeLogDirsRequest
> > > > > > >> > > > > >> may still take a long time to be processed by the
> > > broker.
> > > > > So
> > > > > > >> the
> > > > > > >> > > > overall
> > > > > > >> > > > > >> time to move leaders away from the failed disk may
> > > still
> > > > be
> > > > > > >> long
> > > > > > >> > > even
> > > > > > >> > > > > with
> > > > > > >> > > > > >> this KIP. What do you think?
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Thanks,
> > > > > > >> > > > > >> Dong
> > > > > > >> > > > > >>
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > > > >> lucasatucla@gmail.com
> > > > > > >> > >
> > > > > > >> > > > > wrote:
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > @Dong,
> > > > > > >> > > > > >> > Since both of the two comments in your previous
> > email
> > > > are
> > > > > > >> about
> > > > > > >> > > the
> > > > > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > > > > >> > > > > >> > in light of Jun's last comment, do you agree that
> > > this
> > > > > KIP
> > > > > > >> can
> > > > > > >> > be
> > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > Regards,
> > > > > > >> > > > > >> > Lucas
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > > > jun@confluent.io>
> > > > > > >> > wrote:
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > If all disks on a broker are slow, one probably
> > > > should
> > > > > > just
> > > > > > >> > kill
> > > > > > >> > > > the
> > > > > > >> > > > > >> > > broker. In that case, this KIP may not help. If
> > > only
> > > > > one
> > > > > > of
> > > > > > >> > the
> > > > > > >> > > > > disks
> > > > > > >> > > > > >> on
> > > > > > >> > > > > >> > a
> > > > > > >> > > > > >> > > broker is slow, one may want to fail that disk
> > and
> > > > move
> > > > > > the
> > > > > > >> > > > leaders
> > > > > > >> > > > > on
> > > > > > >> > > > > >> > that
> > > > > > >> > > > > >> > > disk to other brokers. In that case, being able
> > to
> > > > > > process
> > > > > > >> the
> > > > > > >> > > > > >> > LeaderAndIsr
> > > > > > >> > > > > >> > > requests faster will potentially help the
> > producers
> > > > > > recover
> > > > > > >> > > > quicker.
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Thanks,
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Jun
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > > > > >> lindong28@gmail.com
> > > > > > >> > >
> > > > > > >> > > > > wrote:
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up
> questions
> > > > below.
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20
> > > > > > partitions
> > > > > > >> > that
> > > > > > >> > > > are
> > > > > > >> > > > > >> > > randomly
> > > > > > >> > > > > >> > > > distributed across all partitions, then each
> > > > > > >> ProduceRequest
> > > > > > >> > > will
> > > > > > >> > > > > >> likely
> > > > > > >> > > > > >> > > > cover some partitions for which the broker is
> > > still
> > > > > > >> leader
> > > > > > >> > > after
> > > > > > >> > > > > it
> > > > > > >> > > > > >> > > quickly
> > > > > > >> > > > > >> > > > processes the
> > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will still
> be
> > > slow
> > > > > in
> > > > > > >> > > > processing
> > > > > > >> > > > > >> these
> > > > > > >> > > > > >> > > > ProduceRequest and request latency will still be very
> > > high
> > > > > with
> > > > > > >> this
> > > > > > >> > > > KIP.
> > > > > > >> > > > > It
> > > > > > >> > > > > >> > > seems
> > > > > > >> > > > > >> > > > that most ProduceRequest will still timeout
> > after
> > > > 30
> > > > > > >> > seconds.
> > > > > > >> > > Is
> > > > > > >> > > > > >> this
> > > > > > >> > > > > >> > > > understanding correct?
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will
> still
> > > > > timeout
> > > > > > >> after
> > > > > > >> > > 30
> > > > > > >> > > > > >> > seconds,
> > > > > > >> > > > > >> > > > then it is less clear how this KIP reduces
> > > average
> > > > > > >> produce
> > > > > > >> > > > > latency.
> > > > > > >> > > > > >> Can
> > > > > > >> > > > > >> > > you
> > > > > > >> > > > > >> > > > clarify what metrics can be improved by this
> > KIP?
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Not sure why system operator directly cares
> > > number
> > > > of
> > > > > > >> > > truncated
> > > > > > >> > > > > >> > messages.
> > > > > > >> > > > > >> > > > Do you mean this KIP can improve average
> > > throughput
> > > > > or
> > > > > > >> > reduce
> > > > > > >> > > > > >> message
> > > > > > >> > > > > >> > > > duplication? It will be good to understand
> > this.
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Thanks,
> > > > > > >> > > > > >> > > > Dong
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > > > > >> > > lucasatucla@gmail.com
> > > > > > >> > > > >
> > > > > > >> > > > > >> > wrote:
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please
> see
> > > my
> > > > > > reply
> > > > > > >> > > below.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition.
> > Now
> > > > > let's
> > > > > > >> > > consider
> > > > > > >> > > > a
> > > > > > >> > > > > >> more
> > > > > > >> > > > > >> > > > common
> > > > > > >> > > > > >> > > > > scenario
> > > > > > >> > > > > >> > > > > where broker0 is the leader of many
> > partitions.
> > > > And
> > > > > > >> let's
> > > > > > >> > > say
> > > > > > >> > > > > for
> > > > > > >> > > > > >> > some
> > > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > > > > >> > > > > >> > > > > The number of leader partitions on broker0
> is
> > > so
> > > > > > large,
> > > > > > >> > say
> > > > > > >> > > > 10K,
> > > > > > >> > > > > >> that
> > > > > > >> > > > > >> > > the
> > > > > > >> > > > > >> > > > > cluster is skewed,
> > > > > > >> > > > > >> > > > > and the operator would like to shift the
> > > > leadership
> > > > > > >> for a
> > > > > > >> > > lot
> > > > > > >> > > > of
> > > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > > > >> > > > > >> > > > > either manually or through some service
> like
> > > > cruise
> > > > > > >> > control.
> > > > > > >> > > > > >> > > > > With this KIP, not only will the leadership
> > > > > > transitions
> > > > > > >> > > finish
> > > > > > >> > > > > >> more
> > > > > > >> > > > > >> > > > > quickly, helping the cluster itself
> becoming
> > > more
> > > > > > >> > balanced,
> > > > > > >> > > > > >> > > > > but all existing producers corresponding to
> > the
> > > > 9K
> > > > > > >> > > partitions
> > > > > > >> > > > > will
> > > > > > >> > > > > >> > get
> > > > > > >> > > > > >> > > > the
> > > > > > >> > > > > >> > > > > errors relatively quickly
> > > > > > >> > > > > >> > > > > rather than relying on their timeout,
> thanks
> > to
> > > > the
> > > > > > >> > batched
> > > > > > >> > > > > async
> > > > > > >> > > > > >> ZK
> > > > > > >> > > > > >> > > > > operations.
> > > > > > >> > > > > >> > > > > To me it's a useful feature to have during
> > such
> > > > > > >> > troublesome
> > > > > > >> > > > > times.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have
> > shown
> > > > > that
> > > > > > >> with
> > > > > > >> > > this
> > > > > > >> > > > > KIP
> > > > > > >> > > > > >> > many
> > > > > > >> > > > > >> > > > > producers
> > > > > > >> > > > > >> > > > > receive an explicit error
> > > NotLeaderForPartition,
> > > > > > based
> > > > > > >> on
> > > > > > >> > > > which
> > > > > > >> > > > > >> they
> > > > > > >> > > > > >> > > > retry
> > > > > > >> > > > > >> > > > > immediately.
> > > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick
> > retry)
> > > > for
> > > > > > >> their
> > > > > > >> > > > single
> > > > > > >> > > > > >> > > message
> > > > > > >> > > > > >> > > > is
> > > > > > >> > > > > >> > > > > much smaller
> > > > > > >> > > > > >> > > > > compared with the case of timing out
> without
> > > the
> > > > > KIP
> > > > > > >> (30
> > > > > > >> > > > seconds
> > > > > > >> > > > > >> for
> > > > > > >> > > > > >> > > > timing
> > > > > > >> > > > > >> > > > > out + quick retry).
> > > > > > >> > > > > >> > > > > One might argue that reducing the timing
> out
> > on
> > > > the
> > > > > > >> > producer
> > > > > > >> > > > > side
> > > > > > >> > > > > >> can
> > > > > > >> > > > > >> > > > > achieve the same result,
> > > > > > >> > > > > >> > > > > yet reducing the timeout has its own
> > > > drawbacks[1].
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the
> > > number
> > > > of
> > > > > > >> > > truncated
> > > > > > >> > > > > >> > messages
> > > > > > >> > > > > >> > > on
> > > > > > >> > > > > >> > > > > brokers,
> > > > > > >> > > > > >> > > > > with the experiments done in the Google
> Doc,
> > it
> > > > > > should
> > > > > > >> be
> > > > > > >> > > easy
> > > > > > >> > > > > to
> > > > > > >> > > > > >> see
> > > > > > >> > > > > >> > > > that
> > > > > > >> > > > > >> > > > > a lot fewer messages need
> > > > > > >> > > > > >> > > > > to be truncated on broker0 since the
> > up-to-date
> > > > > > >> metadata
> > > > > > >> > > > avoids
> > > > > > >> > > > > >> > > appending
> > > > > > >> > > > > >> > > > > of messages
> > > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk
> > to a
> > > > > > system
> > > > > > >> > > > operator
> > > > > > >> > > > > >> and
> > > > > > >> > > > > >> > ask
> > > > > > >> > > > > >> > > > > whether
> > > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most
> > > likely
> > > > > the
> > > > > > >> > answer
> > > > > > >> > > > is
> > > > > > >> > > > > >> yes.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > 3. To answer your question, I think it
> might
> > be
> > > > > > >> helpful to
> > > > > > >> > > > > >> construct
> > > > > > >> > > > > >> > > some
> > > > > > >> > > > > >> > > > > formulas.
> > > > > > >> > > > > >> > > > > To simplify the modeling, I'm going back to
> > the
> > > > > case
> > > > > > >> where
> > > > > > >> > > > there
> > > > > > >> > > > > >> is
> > > > > > >> > > > > >> > > only
> > > > > > >> > > > > >> > > > > ONE partition involved.
> > > > > > >> > > > > >> > > > > Following the experiments in the Google
> Doc,
> > > > let's
> > > > > > say
> > > > > > >> > > broker0
> > > > > > >> > > > > >> > becomes
> > > > > > >> > > > > >> > > > the
> > > > > > >> > > > > >> > > > > follower at time t0,
> > > > > > >> > > > > >> > > > > and after t0 there were still N produce
> > > requests
> > > > in
> > > > > > its
> > > > > > >> > > > request
> > > > > > >> > > > > >> > queue.
> > > > > > >> > > > > >> > > > > With the up-to-date metadata brought by
> this
> > > KIP,
> > > > > > >> broker0
> > > > > > >> > > can
> > > > > > >> > > > > >> reply
> > > > > > >> > > > > >> > > with
> > > > > > >> > > > > >> > > > an
> > > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > > > > >> > > > > >> > > > > let's use M1 to denote the average
> processing
> > > > time
> > > > > of
> > > > > > >> > > replying
> > > > > > >> > > > > >> with
> > > > > > >> > > > > >> > > such
> > > > > > >> > > > > >> > > > an
> > > > > > >> > > > > >> > > > > error message.
> > > > > > >> > > > > >> > > > > Without this KIP, the broker will need to
> > > append
> > > > > > >> messages
> > > > > > >> > to
> > > > > > >> > > > > >> > segments,
> > > > > > >> > > > > >> > > > > which may trigger a flush to disk,
> > > > > > >> > > > > >> > > > > let's use M2 to denote the average
> processing
> > > > time
> > > > > > for
> > > > > > >> > such
> > > > > > >> > > > > logic.
> > > > > > >> > > > > >> > > > > Then the average extra latency incurred
> > without
> > > > > this
> > > > > > >> KIP
> > > > > > >> > is
> > > > > > >> > > N
> > > > > > >> > > > *
> > > > > > >> > > > > >> (M2 -
> > > > > > >> > > > > >> > > > M1) /
> > > > > > >> > > > > >> > > > > 2.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > In practice, M2 should always be larger
> than
> > > M1,
> > > > > > which
> > > > > > >> > means
> > > > > > >> > > > as
> > > > > > >> > > > > >> long
> > > > > > >> > > > > >> > > as N
> > > > > > >> > > > > >> > > > > is positive,
> > > > > > >> > > > > >> > > > > we would see improvements on the average
> > > latency.
> > > > > > >> > > > > >> > > > > There does not need to be significant
> backlog
> > > of
> > > > > > >> requests
> > > > > > >> > in
> > > > > > >> > > > the
> > > > > > >> > > > > >> > > request
> > > > > > >> > > > > >> > > > > queue,
> > > > > > >> > > > > >> > > > > or severe degradation of disk performance
> to
> > > have
> > > > > the
> > > > > > >> > > > > improvement.
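The formula above can be checked with a short Python sketch; the M1 and M2 values here are assumed purely for illustration (error reply vs. append that may trigger a flush):

```python
# Average extra latency model from above: with N produce requests queued,
# request i (0-indexed) waits behind i requests that each take M2 instead
# of M1, so its extra latency is i * (M2 - M1). Averaging over all N gives
# (N - 1) / 2 * (M2 - M1), i.e. approximately N * (M2 - M1) / 2.
def avg_extra_latency_ms(n, m1_ms, m2_ms):
    extras = [i * (m2_ms - m1_ms) for i in range(n)]
    return sum(extras) / n

# Assumed illustrative values: M1 = 0.1 ms (error reply),
# M2 = 2.0 ms (append, possibly with a flush).
exact = avg_extra_latency_ms(1000, 0.1, 2.0)   # (N - 1) / 2 * (M2 - M1)
approx = 1000 * (2.0 - 0.1) / 2                # N * (M2 - M1) / 2
assert abs(exact - approx) <= (2.0 - 0.1)      # differ by half a step
```

As the text says, any positive N with M2 > M1 yields a positive average improvement; no severe backlog is required.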
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > Regards,
> > > > > > >> > > > > >> > > > > Lucas
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on
> the
> > > > > > producer
> > > > > > >> > side
> > > > > > >> > > > can
> > > > > > >> > > > > >> > trigger
> > > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > > > > >> > > > > >> > > > > when the corresponding leader broker is
> > > > overloaded,
> > > > > > >> > > > exacerbating
> > > > > > >> > > > > >> the
> > > > > > >> > > > > >> > > > > situation.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> > > > > > >> > > lindong28@gmail.com
> > > > > > >> > > > >
> > > > > > >> > > > > >> > wrote:
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > Thanks much for the detailed
> documentation
> > of
> > > > the
> > > > > > >> > > > experiment.
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > Initially I also think having a separate
> > > queue
> > > > > for
> > > > > > >> > > > controller
> > > > > > >> > > > > >> > > requests
> > > > > > >> > > > > >> > > > is
> > > > > > >> > > > > >> > > > > > useful because, as you mentioned in the
> > > summary
> > > > > > >> section
> > > > > > >> > of
> > > > > > >> > > > the
> > > > > > >> > > > > >> > Google
> > > > > > >> > > > > >> > > > > doc,
> > > > > > >> > > > > >> > > > > > controller requests are generally more
> > > > important
> > > > > > than
> > > > > > >> > data
> > > > > > >> > > > > >> requests
> > > > > > >> > > > > >> > > and
> > > > > > >> > > > > >> > > > > we
> > > > > > >> > > > > >> > > > > > probably want controller requests to be
> > > > processed
> > > > > > >> > sooner.
> > > > > > >> > > > But
> > > > > > >> > > > > >> then
> > > > > > >> > > > > >> > > Eno
> > > > > > >> > > > > >> > > > > has
> > > > > > >> > > > > >> > > > > > two very good questions which I am not
> sure
> > > the
> > > > > > >> Google
> > > > > > >> > doc
> > > > > > >> > > > has
> > > > > > >> > > > > >> > > answered
> > > > > > >> > > > > >> > > > > > explicitly. Could you help with the
> > following
> > > > > > >> questions?
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > 1) It is not very clear what is the
> actual
> > > > > benefit
> > > > > > of
> > > > > > >> > > > KIP-291
> > > > > > >> > > > > to
> > > > > > >> > > > > >> > > users.
> > > > > > >> > > > > >> > > > > The
> > > > > > >> > > > > >> > > > > > experiment setup in the Google doc
> > simulates
> > > > the
> > > > > > >> > scenario
> > > > > > >> > > > that
> > > > > > >> > > > > >> > broker
> > > > > > >> > > > > >> > > > is
> > > > > > >> > > > > >> > > > > > very slow handling ProduceRequest due to
> > e.g.
> > > > > slow
> > > > > > >> disk.
> > > > > > >> > > It
> > > > > > >> > > > > >> > currently
> > > > > > >> > > > > >> > > > > > assumes that there is only 1 partition.
> But
> > > in
> > > > > the
> > > > > > >> > common
> > > > > > >> > > > > >> scenario,
> > > > > > >> > > > > >> > > it
> > > > > > >> > > > > >> > > > is
> > > > > > >> > > > > >> > > > > > probably reasonable to assume that there
> > are
> > > > many
> > > > > > >> other
> > > > > > >> > > > > >> partitions
> > > > > > >> > > > > >> > > that
> > > > > > >> > > > > >> > > > > are
> > > > > > >> > > > > >> > > > > > also actively produced to and
> > ProduceRequest
> > > to
> > > > > > these
> > > > > > >> > > > > partition
> > > > > > >> > > > > >> > also
> > > > > > >> > > > > >> > > > > takes
> > > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even
> if
> > > > > broker0
> > > > > > >> can
> > > > > > >> > > > become
> > > > > > >> > > > > >> > > follower
> > > > > > >> > > > > >> > > > > for
> > > > > > >> > > > > >> > > > > > the partition 0 soon, it probably still
> > needs
> > > > to
> > > > > > >> process
> > > > > > >> > > the
> > > > > > >> > > > > >> > > > > ProduceRequest
> > > > > > >> > > > > >> > > > > > slowly in the queue because these
> > > > > ProduceRequests
> > > > > > >> > cover
> > > > > > >> > > > > other
> > > > > > >> > > > > >> > > > > partitions.
> > > > > > >> > > > > >> > > > > > Thus most ProduceRequest will still
> timeout
> > > > after
> > > > > > 30
> > > > > > >> > > seconds
> > > > > > >> > > > > and
> > > > > > >> > > > > >> > most
> > > > > > >> > > > > >> > > > > > clients will still likely timeout after
> 30
> > > > > seconds.
> > > > > > >> Then
> > > > > > >> > > it
> > > > > > >> > > > is
> > > > > > >> > > > > >> not
> > > > > > >> > > > > >> > > > > > obvious what the benefit to the client is,
> > since
> > > > > > client
> > > > > > >> > will
> > > > > > >> > > > > >> timeout
> > > > > > >> > > > > >> > > after
> > > > > > >> > > > > >> > > > > 30
> > > > > > >> > > > > >> > > > > > seconds before possibly re-connecting to
> > > > broker1,
> > > > > > >> with
> > > > > > >> > or
> > > > > > >> > > > > >> without
> > > > > > >> > > > > >> > > > > KIP-291.
> > > > > > >> > > > > >> > > > > > Did I miss something here?
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the
> specific
> > > > > > benefits
> > > > > > >> of
> > > > > > >> > > this
> > > > > > >> > > > > >> KIP to
> > > > > > >> > > > > >> > > > user
> > > > > > >> > > > > >> > > > > or
> > > > > > >> > > > > >> > > > > > system administrator, e.g. whether this
> KIP
> > > > > > decreases
> > > > > > >> > > > average
> > > > > > >> > > > > >> > > latency,
> > > > > > >> > > > > >> > > > > > 999th percentile latency, probability of
> > > exception
> > > > > > >> exposed
> > > > > > >> > to
> > > > > > >> > > > > >> client
> > > > > > >> > > > > >> > > etc.
> > > > > > >> > > > > >> > > > It
> > > > > > >> > > > > >> > > > > > is probably useful to clarify this.
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
> > experience
> > > > > only
> > > > > > >> when
> > > > > > >> > > > there
> > > > > > >> > > > > is
> > > > > > >> > > > > >> > > issue
> > > > > > >> > > > > >> > > > > with
> > > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in the
> > > request
> > > > > > queue
> > > > > > >> > due
> > > > > > >> > > to
> > > > > > >> > > > > >> slow
> > > > > > >> > > > > >> > > disk
> > > > > > >> > > > > >> > > > as
> > > > > > >> > > > > >> > > > > > described in the Google doc? Or is this
> KIP
> > > > also
> > > > > > >> useful
> > > > > > >> > > when
> > > > > > >> > > > > >> there
> > > > > > >> > > > > >> > is
> > > > > > >> > > > > >> > > > no
> > > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It might be
> > > > helpful
> > > > > > to
> > > > > > >> > > clarify
> > > > > > >> > > > > >> this
> > > > > > >> > > > > >> > to
> > > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > Thanks much,
> > > > > > >> > > > > >> > > > > > Dong
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > >
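Dong's point 1 above is essentially queueing arithmetic. A tiny model (illustrative names; the ~2s-per-request and 30s-timeout figures are the thread's examples, not measurements) shows why a deep backlog keeps timing requests out even after the broker becomes a follower:

```java
// Back-of-envelope model: if each queued ProduceRequest takes
// perRequestSeconds to handle, any request at queue position
// >= ceil(timeout / perRequest) has already waited past the client
// timeout by the time a handler reaches it, with or without
// prioritization of control requests. Illustrative only.
public class QueueDrainModel {
    /** First 0-based queue position whose wait time reaches the client timeout. */
    static int firstTimedOutPosition(double perRequestSeconds, double timeoutSeconds) {
        return (int) Math.ceil(timeoutSeconds / perRequestSeconds);
    }

    public static void main(String[] args) {
        // With the thread's example of 2s per request and a 30s timeout,
        // everything from position 15 onward in the queue times out.
        System.out.println(firstTimedOutPosition(2.0, 30.0));
    }
}
```

So unless the backlog is shallower than that position, clients time out and reconnect regardless of how quickly the LeaderAndIsr request itself is handled.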
> > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas
> > Wang <
> > > > > > >> > > > > >> lucasatucla@gmail.com
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > > > wrote:
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > > Hi Eno,
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
> > > experiment
> > > > > > >> results.
> > > > > > >> > > > > >> > > > > > > Here is a link to the positive impact
> > > > achieved
> > > > > by
> > > > > > >> > > > > implementing
> > > > > > >> > > > > >> > the
> > > > > > >> > > > > >> > > > > > proposed
> > > > > > >> > > > > >> > > > > > > change:
> > > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > > >> > > > > >> > > > > > > Please take a look when you have time
> and
> > > let
> > > > > me
> > > > > > >> know
> > > > > > >> > > your
> > > > > > >> > > > > >> > > feedback.
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > Regards,
> > > > > > >> > > > > >> > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM,
> Harsha <
> > > > > > >> > > kafka@harsha.io>
> > > > > > >> > > > > >> wrote:
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a
> > look
> > > > > might
> > > > > > >> suit
> > > > > > >> > > our
> > > > > > >> > > > > >> > > > requirements
> > > > > > >> > > > > >> > > > > > > > better.
> > > > > > >> > > > > >> > > > > > > >
> > > > > > >> > > > > >> > > > > > > > Thanks,
> > > > > > >> > > > > >> > > > > > > > Harsha
> > > > > > >> > > > > >> > > > > > > >
> > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM,
> > Lucas
> > > > > Wang <
> > > > > > >> > > > > >> > > > lucasatucla@gmail.com
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > If I understand correctly, the
> > > > replication
> > > > > > >> quota
> > > > > > >> > > > > mechanism
> > > > > > >> > > > > >> > > > proposed
> > > > > > >> > > > > >> > > > > > in
> > > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
> > scenario.
> > > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > Thanks,
> > > > > > >> > > > > >> > > > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM,
> > > Harsha <
> > > > > > >> > > > > kafka@harsha.io
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > > > wrote:
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > > >> > > > > >> > > > > > > > > > One more question, any thoughts
> on
> > > > making
> > > > > > >> this
> > > > > > >> > > > > >> configurable
> > > > > > >> > > > > >> > > > > > > > > > and also allowing subset of data
> > > > requests
> > > > > > to
> > > > > > >> be
> > > > > > >> > > > > >> > prioritized.
> > > > > > >> > > > > >> > > > For
> > > > > > >> > > > > >> > > > > > > > example
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when we
> > > take
> > > > > out
> > > > > > a
> > > > > > >> > > broker
> > > > > > >> > > > > and
> > > > > > >> > > > > >> > bring
> > > > > > >> > > > > >> > > > new
> > > > > > >> > > > > >> > > > > > one
> > > > > > >> > > > > >> > > > > > > > it
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > will try to become follower and
> > have
> > > > lot
> > > > > of
> > > > > > >> > fetch
> > > > > > >> > > > > >> requests
> > > > > > >> > > > > >> > to
> > > > > > >> > > > > >> > > > > other
> > > > > > >> > > > > >> > > > > > > > > leaders
> > > > > > >> > > > > >> > > > > > > > > > in clusters. This will negatively affect the application/client requests.
> > > > > > >> > > > > >> > > > > > > > > > We are also exploring the similar
> > > > > solution
> > > > > > to
> > > > > > >> > > > > >> de-prioritize
> > > > > > >> > > > > >> > > if
> > > > > > >> > > > > >> > > > a
> > > > > > >> > > > > >> > > > > > new
> > > > > > >> > > > > >> > > > > > > > > > replica comes in for fetch
> > requests,
> > > we
> > > > > are
> > > > > > >> ok
> > > > > > >> > > with
> > > > > > >> > > > > the
> > > > > > >> > > > > >> > > replica
> > > > > > >> > > > > >> > > > > to
> > > > > > >> > > > > >> > > > > > be
> > > > > > >> > > > > >> > > > > > > > > > taking time but the leaders
> should
> > > > > > prioritize
> > > > > > >> > the
> > > > > > >> > > > > client
> > > > > > >> > > > > >> > > > > requests.
> > > > > > >> > > > > >> > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > Thanks,
> > > > > > >> > > > > >> > > > > > > > > > Harsha
> > > > > > >> > > > > >> > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35
> AM
> > > > Lucas
> > > > > > Wang
> > > > > > >> > > wrote:
> > > > > > >> > > > > >> > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> > > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the
> > feature
> > > > > yet,
> > > > > > >> so no
> > > > > > >> > > > > >> > experimental
> > > > > > >> > > > > >> > > > > > results
> > > > > > >> > > > > >> > > > > > > > so
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > far.
> > > > > > >> > > > > >> > > > > > > > > > > And I plan to test in out in
> the
> > > > > > following
> > > > > > >> > days.
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right that
> > the
> > > > > > >> priority
> > > > > > >> > > queue
> > > > > > >> > > > > >> does
> > > > > > >> > > > > >> > not
> > > > > > >> > > > > >> > > > > > > > completely
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > prevent
> > > > > > >> > > > > >> > > > > > > > > > > data requests being processed
> > ahead
> > > > of
> > > > > > >> > > controller
> > > > > > >> > > > > >> > requests.
> > > > > > >> > > > > >> > > > > > > > > > > That being said, I expect it to
> > > > greatly
> > > > > > >> > mitigate
> > > > > > >> > > > the
> > > > > > >> > > > > >> > effect
> > > > > > >> > > > > >> > > > of
> > > > > > >> > > > > >> > > > > > > stale
> > > > > > >> > > > > >> > > > > > > > > > > metadata.
> > > > > > >> > > > > >> > > > > > > > > > > In any case, I'll try it out
> and
> > > post
> > > > > the
> > > > > > >> > > results
> > > > > > >> > > > > >> when I
> > > > > > >> > > > > >> > > have
> > > > > > >> > > > > >> > > > > it.
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > Regards,
> > > > > > >> > > > > >> > > > > > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44
> AM,
> > > Eno
> > > > > > >> Thereska
> > > > > > >> > <
> > > > > > >> > > > > >> > > > > > > > eno.thereska@gmail.com
> > > > > > >> > > > > >> > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just
> had a
> > > > look
> > > > > at
> > > > > > >> > this.
> > > > > > >> > > A
> > > > > > >> > > > > >> couple
> > > > > > >> > > > > >> > of
> > > > > > >> > > > > >> > > > > > > > questions:
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > - did you notice any positive
> > > > change
> > > > > > >> after
> > > > > > >> > > > > >> implementing
> > > > > > >> > > > > >> > > > this
> > > > > > >> > > > > >> > > > > > KIP?
> > > > > > >> > > > > >> > > > > > > > > I'm
> > > > > > >> > > > > >> > > > > > > > > > > > wondering if you have any
> > > > > experimental
> > > > > > >> > results
> > > > > > >> > > > > that
> > > > > > >> > > > > >> > show
> > > > > > >> > > > > >> > > > the
> > > > > > >> > > > > >> > > > > > > > benefit
> > > > > > >> > > > > >> > > > > > > > > of
> > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > >> > > > > >> > > > > > > > > > > > two queues.
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not
> > > > sufficient
> > > > > in
> > > > > > >> > > > addressing
> > > > > > >> > > > > >> the
> > > > > > >> > > > > >> > > > > problem
> > > > > > >> > > > > >> > > > > > > the
> > > > > > >> > > > > >> > > > > > > > > KIP
> > > > > > >> > > > > >> > > > > > > > > > > > identifies. Even with
> priority
> > > > > queues,
> > > > > > >> you
> > > > > > >> > > will
> > > > > > >> > > > > >> > sometimes
> > > > > > >> > > > > >> > > > > > > (often?)
> > > > > > >> > > > > >> > > > > > > > > have
> > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > >> > > > > >> > > > > > > > > > > > case that data plane requests
> > > will
> > > > be
> > > > > > >> ahead
> > > > > > >> > of
> > > > > > >> > > > the
> > > > > > >> > > > > >> > > control
> > > > > > >> > > > > >> > > > > > plane
> > > > > > >> > > > > >> > > > > > > > > > > requests.
> > > > > > >> > > > > >> > > > > > > > > > > > This happens because the
> system
> > > > might
> > > > > > >> have
> > > > > > >> > > > already
> > > > > > >> > > > > >> > > started
> > > > > > >> > > > > >> > > > > > > > > processing
> > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > >> > > > > >> > > > > > > > > > > > data plane requests before
> the
> > > > > control
> > > > > > >> plane
> > > > > > >> > > > ones
> > > > > > >> > > > > >> > > arrived.
> > > > > > >> > > > > >> > > > So
> > > > > > >> > > > > >> > > > > > it
> > > > > > >> > > > > >> > > > > > > > > would
> > > > > > >> > > > > >> > > > > > > > > > > be
> > > > > > >> > > > > >> > > > > > > > > > > > good to know what % of the
> > > problem
> > > > > this
> > > > > > >> KIP
> > > > > > >> > > > > >> addresses.
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > Thanks
> > > > > > >> > > > > >> > > > > > > > > > > > Eno
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
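For readers following the design debate, here is a minimal sketch of the two-queue idea (hypothetical names, not the KIP's actual implementation); the comment in `nextRequest` marks exactly where Eno's caveat bites:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the proposal under discussion: control requests and data
// requests go to separate bounded queues, and handler threads always
// drain the control queue first. All names here are illustrative.
public class TwoQueueScheduler {
    private final BlockingQueue<String> controlQueue;
    private final BlockingQueue<String> dataQueue;

    public TwoQueueScheduler(int controlCapacity, int dataCapacity) {
        controlQueue = new ArrayBlockingQueue<>(controlCapacity);
        dataQueue = new ArrayBlockingQueue<>(dataCapacity);
    }

    /** Returns false when the bounded control queue is full. */
    public boolean submitControl(String request) {
        return controlQueue.offer(request);
    }

    /** Returns false when the bounded data queue is full. */
    public boolean submitData(String request) {
        return dataQueue.offer(request);
    }

    /**
     * Next request for a handler thread, preferring control requests.
     * Eno's caveat applies at this hand-off: a data request already
     * taken by a handler is not preempted by a control request that
     * arrives afterwards; only requests still queued are reordered.
     */
    public String nextRequest() {
        String request = controlQueue.poll();
        if (request != null) {
            return request;
        }
        return dataQueue.poll(); // null when both queues are empty
    }
}
```

With this ordering a LeaderAndIsr-style request queued behind a pile of produce requests is still picked up first, which is the mitigation the KIP is after, even though in-flight data requests keep running.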
> > > > > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44
> > PM,
> > > > Ted
> > > > > > Yu <
> > > > > > >> > > > > >> > > > > yuzhihong@gmail.com
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > Change looks good.
> > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > Thanks
> > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at
> 8:42
> > > AM,
> > > > > > Lucas
> > > > > > >> > Wang
> > > > > > >> > > <
> > > > > > >> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > Thanks for the
> suggestion.
> > > I've
> > > > > > >> updated
> > > > > > >> > > the
> > > > > > >> > > > > KIP.
> > > > > > >> > > > > >> > > Please
> > > > > > >> > > > > >> > > > > > take
> > > > > > >> > > > > >> > > > > > > > > > another
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > look.
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
> > 6:34
> > > > PM,
> > > > > > Ted
> > > > > > >> Yu
> > > > > > >> > <
> > > > > > >> > > > > >> > > > > > > yuzhihong@gmail.com
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > Currently in
> > > > KafkaConfig.scala
> > > > > :
> > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests =
> > 500
> > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > It would be good if you
> > can
> > > > > > include
> > > > > > >> > the
> > > > > > >> > > > > >> default
> > > > > > >> > > > > >> > > value
> > > > > > >> > > > > >> > > > > for
> > > > > > >> > > > > >> > > > > > > > this
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > new
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > config
> > > > > > >> > > > > >> > > > > > > > > > > > > > > in the KIP.
> > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > Thanks
> > > > > > >> > > > > >> > > > > > > > > > > > > > >
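Ted's point about stating the default ties into Dong's observation elsewhere in the thread that a second queue changes the effective broker-wide bound. A sketch (the 500 default is `QueuedMaxRequests` from KafkaConfig.scala as quoted above; the separate control-queue default of 20 is an arbitrary illustration, not a value from the KIP):

```java
// Sketch of the config question being discussed: a separate bound for
// the control queue versus reusing "queued.max.requests". Names and the
// control-queue default are illustrative, not Kafka's actual config.
public class QueueBounds {
    static final int QUEUED_MAX_REQUESTS_DEFAULT = 500; // existing data-plane bound

    final int dataCapacity;
    final int controlCapacity;

    QueueBounds(int dataCapacity, int controlCapacity) {
        this.dataCapacity = dataCapacity;
        this.controlCapacity = controlCapacity;
    }

    // With two queues, the broker-wide number of queued requests is
    // bounded by the sum of the two capacities, no longer by
    // "queued.max.requests" alone.
    int maxTotalQueuedRequests() {
        return dataCapacity + controlCapacity;
    }
}
```

This is why the thread treats "reuse the existing config" versus "add a new config with its own default" as a real trade-off: the former silently scales the control queue with operator tuning of the data queue, the latter keeps the two bounds independent.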
> > > > > > >> > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
> > > 4:28
> > > > > PM,
> > > > > > >> Lucas
> > > > > > >> > > > Wang
> > > > > > >> > > > > <
> > > > > > >> > > > > >> > > > > > > > > > lucasatucla@gmail.com
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > > > > >> > > > > >> > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > I've updated the KIP
> by
> > > > > adding
> > > > > > a
> > > > > > >> new
> > > > > > >> > > > > config,
> > > > > > >> > > > > >> > > > instead
> > > > > > >> > > > > >> > > > > of
> > > > > > >> > > > > >> > > > > > > > > reusing
> > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > existing one.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > Please take another
> > look
> > > > when
> > > > > > you
> > > > > > >> > have
> > > > > > >> > > > > time.
> > > > > > >> > > > > >> > > > Thanks a
> > > > > > >> > > > > >> > > > > > > lot!
> > > > > > >> > > > > >> > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018
> at
> > > > 2:33
> > > > > > PM,
> > > > > > >> Ted
> > > > > > >> > > Yu
> > > > > > >> > > > <
> > > > > > >> > > > > >> > > > > > > > yuzhihong@gmail.com
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste
> of
> > > > > > resource
> > > > > > >> if
> > > > > > >> > > > > control
> > > > > > >> > > > > >> > > request
> > > > > > >> > > > > >> > > > > > rate
> > > > > > >> > > > > >> > > > > > > is
> > > > > > >> > > > > >> > > > > > > > > low
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > I don't know if
> > control
> > > > > > request
> > > > > > >> > rate
> > > > > > >> > > > can
> > > > > > >> > > > > >> get
> > > > > > >> > > > > >> > to
> > > > > > >> > > > > >> > > > > > > 100,000,
> > > > > > >> > > > > >> > > > > > > > > > > likely
> > > > > > >> > > > > >> > > > > > > > > > > > > not.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > Then
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > using the same
> bound
> > as
> > > > > that
> > > > > > >> for
> > > > > > >> > > data
> > > > > > >> > > > > >> > requests
> > > > > > >> > > > > >> > > > > seems
> > > > > > >> > > > > >> > > > > > > > high.
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13,
> 2018
> > at
> > > > > 10:13
> > > > > > >> PM,
> > > > > > >> > > > Lucas
> > > > > > >> > > > > >> Wang
> > > > > > >> > > > > >> > <
> > > > > > >> > > > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Thanks for
> taking a
> > > > look
> > > > > at
> > > > > > >> this
> > > > > > >> > > > KIP.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Let's say today
> the
> > > > > setting
> > > > > > >> of
> > > > > > >> > > > > >> > > > > > "queued.max.requests"
> > > > > > >> > > > > >> > > > > > > in
> > > > > > >> > > > > >> > > > > > > > > > > > cluster A
> > > > > > >> > > > > >> > > > > > > > > > > > > > is
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > 1000,
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > while the setting
> > in
> > > > > > cluster
> > > > > > >> B
> > > > > > >> > is
> > > > > > >> > > > > >> 100,000.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > The 100 times
> > > > difference
> > > > > > >> might
> > > > > > >> > > have
> > > > > > >> > > > > >> > indicated
> > > > > > >> > > > > >> > > > > that
> > > > > > >> > > > > >> > > > > > > > > machines
> > > > > > >> > > > > >> > > > > > > > > > > in
> > > > > > >> > > > > >> > > > > > > > > > > > > > > cluster
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > B
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > have larger
> memory.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > By reusing the
> > > > > > >> > > > "queued.max.requests",
> > > > > > >> > > > > >> the
> > > > > > >> > > > > >> > > > > > > > > > > controlRequestQueue
> > > > > > >> > > > > >> > > > > > > > > > > > in
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > cluster
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > B
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > automatically
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > gets a 100x
> > capacity
> > > > > > without
> > > > > > >> > > > > explicitly
> > > > > > >> > > > > >> > > > bothering
> > > > > > >> > > > > >> > > > > > the
> > > > > > >> > > > > >> > > > > > > > > > > > operators.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > I understand the
> > > > counter
> > > > > > >> > argument
> > > > > > >> > > > can
> > > > > > >> > > > > be
> > > > > > >> > > > > >> > that
> > > > > > >> > > > > >> > > > > maybe
> > > > > > >> > > > > >> > > > > > > > > that's
> > > > > > >> > > > > >> > > > > > > > > > a
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > waste
> > > > > > >> > > > > >> > > > > > > > > > > > > > of
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > resource if
> control
> > > > > request
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > rate is low and
> > > > operators
> > > > > > may
> > > > > > >> > want
> > > > > > >> > > > to
> > > > > > >> > > > > >> fine
> > > > > > >> > > > > >> > > tune
> > > > > > >> > > > > >> > > > > the
> > > > > > >> > > > > >> > > > > > > > > > capacity
> > > > > > >> > > > > >> > > > > > > > > > > of
> > > > > > >> > > > > >> > > > > > > > > > > > > the
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > controlRequestQueue.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > I'm ok with
> either
> > > > > > approach,
> > > > > > >> and
> > > > > > >> > > can
> > > > > > >> > > > > >> change
> > > > > > >> > > > > >> > > it
> > > > > > >> > > > > >> > > > if
> > > > > > >> > > > > >> > > > > > you
> > > > > > >> > > > > >> > > > > > > > or
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > anyone
> > > > > > >> > > > > >> > > > > > > > > > > > > > else
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > feels
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > strong about
> adding
> > > the
> > > > > > extra
> > > > > > >> > > > config.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13,
> > 2018
> > > at
> > > > > > 3:11
> > > > > > >> PM,
> > > > > > >> > > Ted
> > > > > > >> > > > > Yu
> > > > > > >> > > > > >> <
> > > > > > >> > > > > >> > > > > > > > > > yuzhihong@gmail.com
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > Under Rejected
> > > > > > >> Alternatives,
> > > > > > >> > #2,
> > > > > > >> > > > can
> > > > > > >> > > > > >> you
> > > > > > >> > > > > >> > > > > > elaborate
> > > > > > >> > > > > >> > > > > > > a
> > > > > > >> > > > > >> > > > > > > > > bit
> > > > > > >> > > > > >> > > > > > > > > > > more
> > > > > > >> > > > > >> > > > > > > > > > > > > on
> > > > > > >> > > > > >> > > > > > > > > > > > > > > why
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > the
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > separate config
> > has
> > > > > > bigger
> > > > > > >> > > impact
> > > > > > >> > > > ?
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > Thanks
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13,
> > > 2018
> > > > at
> > > > > > >> 2:00
> > > > > > >> > PM,
> > > > > > >> > > > > Dong
> > > > > > >> > > > > >> > Lin <
> > > > > > >> > > > > >> > > > > > > > > > > > lindong28@gmail.com
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for
> the
> > > KIP.
> > > > > > Looks
> > > > > > >> > good
> > > > > > >> > > > > >> overall.
> > > > > > >> > > > > >> > > > Some
> > > > > > >> > > > > >> > > > > > > > > comments
> > > > > > >> > > > > >> > > > > > > > > > > > below:
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > - We usually
> > > > specify
> > > > > > the
> > > > > > >> > full
> > > > > > >> > > > > mbean
> > > > > > >> > > > > >> for
> > > > > > >> > > > > >> > > the
> > > > > > >> > > > > >> > > > > new
> > > > > > >> > > > > >> > > > > > > > > metrics
> > > > > > >> > > > > >> > > > > > > > > > > in
> > > > > > >> > > > > >> > > > > > > > > > > > > the
> > > > > > >> > > > > >> > > > > > > > > > > > > > > KIP.
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > Can
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > you
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > specify it in
> > the
> > > > > > Public
> > > > > > >> > > > Interface
> > > > > > >> > > > > >> > > section
> > > > > > >> > > > > >> > > > > > > similar
> > > > > > >> > > > > >> > > > > > > > > to
> > > > > > >> > > > > >> > > > > > > > > > > > KIP-237
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > <
> > > > > > >> https://cwiki.apache.org/
> > > > > > >> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > > >> > 237%3A+More+Controller+Health+
> > > > > > >> > > > > >> Metrics>
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > ?
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we
> > could
> > > > > follow
> > > > > > >> the
> > > > > > >> > > same
> > > > > > >> > > > > >> > pattern
> > > > > > >> > > > > >> > > as
> > > > > > >> > > > > >> > > > > > > KIP-153
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > <
> > > > > > >> https://cwiki.apache.org/
> > > > > > >> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > > >> > > > > >> > > > > > > > > > > > > metric>,
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > where we keep
> > the
> > > > > > >> existing
> > > > > > >> > > > sensor
> > > > > > >> > > > > >> name
> > > > > > >> > > > > >> > > > > > > > > "BytesInPerSec"
> > > > > > >> > > > > >> > > > > > > > > > > and
> > > > > > >> > > > > >> > > > > > > > > > > > > add
> > > > > > >> > > > > >> > > > > > > > > > > > > > a
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > new
> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > sensor
> > "ReplicationBytesInPerSec", rather than replacing the sensor name
> > "BytesInPerSec" with e.g. "ClientBytesInPerSec".
> >
> > - It seems that the KIP changes the semantics of the broker config
> > "queued.max.requests" because the number of total requests queued in the
> > broker will be no longer bounded by "queued.max.requests". This probably
> > needs to be specified in the Public Interfaces section for discussion.
> >
> > Thanks,
> > Dong
> >
> > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >
> > > Hi Kafka experts,
> > >
> > > I created KIP-291 to add a separate queue for controller requests:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > >
> > > Can you please take a look and let me know your feedback?
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Becket,

I think you are right that there may be out-of-order processing. However,
it seems that out-of-order processing may also happen even if we use a
separate queue.

Here is the example:

- Controller sends R1 and got disconnected before receiving response. Then
it reconnects and sends R2. Both requests now stay in the controller
request queue in the order they are sent.
- thread1 takes R1 from the request queue and then thread2 takes R2 from
the request queue almost at the same time.
- So R1 and R2 are processed in parallel. There is a chance that R2's
processing is completed before R1's.

If out-of-order processing can happen for both approaches with very low
probability, it may not be worthwhile to add the extra queue. What do you
think?

Thanks,
Dong
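
The race above can be sketched in a few lines of Java. This is purely
illustrative (the class, request names, and queue are made up, not the
broker's actual internals): two handler threads drain one shared request
queue, and a latch forces the interleaving where R2's processing completes
before R1's, even though R1 was enqueued first.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class OutOfOrderDemo {
    public static List<String> run() throws InterruptedException {
        BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>();
        requestQueue.put("R1"); // enqueued first
        requestQueue.put("R2"); // enqueued second

        List<String> completionOrder = new CopyOnWriteArrayList<>();
        CountDownLatch r2Done = new CountDownLatch(1);

        // Both handler threads run the same body: take one request and
        // "process" it. R1's processing is artificially slow (it waits on the
        // latch), so R2 completes first regardless of thread scheduling.
        Runnable handler = () -> {
            try {
                String req = requestQueue.take();
                if (req.equals("R1")) {
                    r2Done.await(); // simulate R1 being slow to process
                }
                completionOrder.add(req);
                if (req.equals("R2")) {
                    r2Done.countDown(); // R2 finished; let R1 proceed
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        ExecutorService handlers = Executors.newFixedThreadPool(2);
        handlers.submit(handler);
        handlers.submit(handler);
        handlers.shutdown();
        handlers.awaitTermination(5, TimeUnit.SECONDS);
        return completionOrder;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // [R2, R1]
    }
}
```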


On Wed, Jul 18, 2018 at 6:17 PM, Becket Qin <be...@gmail.com> wrote:

> Hi Mayuresh/Joel,
>
> Using the request channel as a deque was brought up some time ago when we
> were initially thinking of prioritizing the requests. The concern was that
> the controller requests are supposed to be processed in order. If we can
> ensure that there is at most one controller request in the request channel,
> the order is not a concern. But in cases where more than one controller
> request is inserted into the queue, the controller request order may change
> and cause problems. For example, think about the following sequence:
> 1. Controller successfully sent a request R1 to broker
> 2. Broker receives R1 and put the request to the head of the request queue.
> 3. Controller to broker connection failed and the controller reconnected to
> the broker.
> 4. Controller sends a request R2 to the broker
> 5. Broker receives R2 and add it to the head of the request queue.
> Now on the broker side, R2 will be processed before R1 is processed, which
> may cause problems.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com> wrote:
>
> > @Mayuresh - I like your idea. It appears to be a simpler, less invasive
> > alternative and it should work. Jun/Becket/others, do you see any
> pitfalls
> > with this approach?
> >
> > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> >
> > > @Mayuresh,
> > > That's a very interesting idea that I haven't thought before.
> > > It seems to solve our problem at hand pretty well, and also
> > > avoids the need to have a new size metric and capacity config
> > > for the controller request queue. In fact, if we were to adopt
> > > this design, there is no public interface change, and we
> > > probably don't need a KIP.
> > > Also, implementation wise, it seems
> > > the java class LinkedBlockingDeque can readily satisfy the requirement
> > > by supporting a capacity, and also allowing inserting at both ends.
> > >
> > > My only concern is that this design is tied to the coincidence that
> > > we have two request priorities and there are two ends to a deque.
> > > Hence by using the proposed design, it seems the network layer is
> > > more tightly coupled with upper layer logic, e.g. if we were to add
> > > an extra priority level in the future for some reason, we would
> probably
> > > need to go back to the design of separate queues, one for each priority
> > > level.
> > >
> > > In summary, I'm ok with both designs and lean toward your suggested
> > > approach.
> > > Let's hear what others think.
> > >
> > > @Becket,
> > > In light of Mayuresh's suggested new design, I'm answering your
> question
> > > only in the context
> > > of the current KIP design: I think your suggestion makes sense, and I'm
> > ok
> > > with removing the capacity config and
> > > just relying on the default value of 20 being sufficient enough.
> > >
> > > Thanks,
> > > Lucas
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
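
The "separate queues, one for each priority level" generalization Lucas
mentions could be sketched roughly as follows. All names here are
hypothetical (this is not the broker's real RequestChannel); the point is
that receive() always drains the highest-priority non-empty queue first, and
adding a priority level is just one more queue. Single-threaded for clarity;
a real broker would need bounded, blocking queues and synchronization.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PriorityRequestChannel {
    private final Queue<String>[] queues; // one queue per priority level

    @SuppressWarnings("unchecked")
    public PriorityRequestChannel(int priorityLevels) {
        queues = new Queue[priorityLevels];
        for (int i = 0; i < priorityLevels; i++) {
            queues[i] = new ArrayDeque<>();
        }
    }

    // priority 0 is the highest (e.g. controller requests)
    public void send(int priority, String request) {
        queues[priority].add(request);
    }

    // Scan from the highest priority down and take the first request found.
    public String receive() {
        for (Queue<String> q : queues) {
            if (!q.isEmpty()) {
                return q.poll();
            }
        }
        return null; // no request pending
    }

    public static void main(String[] args) {
        PriorityRequestChannel channel = new PriorityRequestChannel(3);
        channel.send(2, "produce-1");
        channel.send(1, "admin-1");
        channel.send(0, "leaderAndIsr-1");
        System.out.println(channel.receive()); // leaderAndIsr-1
        System.out.println(channel.receive()); // admin-1
        System.out.println(channel.receive()); // produce-1
    }
}
```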
> > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > gharatmayuresh15@gmail.com
> > > > wrote:
> > >
> > > > Hi Lucas,
> > > >
> > > > Seems like the main intent here is to prioritize the controller
> request
> > > > over any other requests.
> > > > In that case, we can change the request queue to a deque, where you
> > > > always insert the normal requests (produce, consume, etc.) at the end
> > > > of the deque, but if it is a controller request, you insert it at the
> > > > head of the deque. This ensures that the controller request will be
> > > > given higher priority over other requests.
> > > >
> > > > Also since we only read one request from the socket and mute it and
> > only
> > > > unmute it after handling the request, this would ensure that we don't
> > > > handle controller requests out of order.
> > > >
> > > > With this approach we can avoid the second queue and the additional
> > > config
> > > > for the size of the queue.
> > > >
> > > > What do you think ?
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
> > > >
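
A minimal sketch of Mayuresh's deque idea (class and method names are made up
for illustration, not the broker's real classes): java.util.concurrent's
LinkedBlockingDeque supplies both the capacity bound and insertion at either
end, so a controller request jumps to the head while data-plane requests
append to the tail, and handler threads always take from the head.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

public class DequeRequestChannel {
    private final BlockingDeque<String> deque;

    public DequeRequestChannel(int capacity) {
        // One bounded deque replaces the single request queue.
        deque = new LinkedBlockingDeque<>(capacity);
    }

    public void sendDataRequest(String request) throws InterruptedException {
        deque.putLast(request);  // normal produce/fetch requests go to the tail
    }

    public void sendControllerRequest(String request) throws InterruptedException {
        deque.putFirst(request); // controller requests jump to the head
    }

    public String receive() throws InterruptedException {
        return deque.takeFirst(); // handler threads always take the head
    }

    public static void main(String[] args) throws Exception {
        DequeRequestChannel channel = new DequeRequestChannel(500);
        channel.sendDataRequest("produce-1");
        channel.sendDataRequest("fetch-1");
        channel.sendControllerRequest("leaderAndIsr-1");
        System.out.println(channel.receive()); // leaderAndIsr-1
        System.out.println(channel.receive()); // produce-1
        System.out.println(channel.receive()); // fetch-1
    }
}
```

Combined with muting the controller connection while a request is in flight,
at most one controller request sits at the head at a time, which is what
preserves controller request ordering in this design.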
> > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > > > Hey Joel,
> > > > >
> > > > > Thanks for the detailed explanation. I agree the current design makes
> > > sense.
> > > > > My confusion is about whether the new config for the controller
> queue
> > > > > capacity is necessary. I cannot think of a case in which users
> would
> > > > change
> > > > > it.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <be...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Lucas,
> > > > > >
> > > > > > I guess my question can be rephrased to "do we expect user to
> ever
> > > > change
> > > > > > the controller request queue capacity"? If we agree that 20 is
> > > already
> > > > a
> > > > > > very generous default number and we do not expect user to change
> > it,
> > > is
> > > > > it
> > > > > > still necessary to expose this as a config?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> lucasatucla@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > >> @Becket
> > > > > >> 1. Thanks for the comment. You are right that normally there
> > should
> > > be
> > > > > >> just
> > > > > >> one controller request because of muting,
> > > > > >> and I had NOT intended to say there would be many enqueued
> > > controller
> > > > > >> requests.
> > > > > >> I went through the KIP again, and I'm not sure which part
> conveys
> > > that
> > > > > >> info.
> > > > > >> I'd be happy to revise if you point it out the section.
> > > > > >>
> > > > > >> 2. Though it should not happen in normal conditions, the current
> > > > design
> > > > > >> does not preclude multiple controllers running
> > > > > >> at the same time, hence if we don't have the controller queue
> > > capacity
> > > > > >> config and simply make its capacity to be 1,
> > > > > >> network threads handling requests from different controllers
> will
> > be
> > > > > >> blocked during those troublesome times,
> > > > > >> which is probably not what we want. On the other hand, adding
> the
> > > > extra
> > > > > >> config with a default value, say 20, guards us from issues in
> > those
> > > > > >> troublesome times, and IMO there isn't much downside of adding
> the
> > > > extra
> > > > > >> config.
> > > > > >>
> > > > > >> @Mayuresh
> > > > > >> Good catch, this sentence is an obsolete statement based on a
> > > previous
> > > > > >> design. I've revised the wording in the KIP.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Lucas
> > > > > >>
> > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > >>
> > > > > >> > Hi Lucas,
> > > > > >> >
> > > > > >> > Thanks for the KIP.
> > > > > >> > I am trying to understand why you think "The memory
> consumption
> > > can
> > > > > rise
> > > > > >> > given the total number of queued requests can go up to 2x" in
> > the
> > > > > impact
> > > > > >> > section. Normally the requests from controller to a Broker are
> > not
> > > > > high
> > > > > >> > volume, right ?
> > > > > >> >
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > Mayuresh
> > > > > >> >
> > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > becket.qin@gmail.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane from
> > the
> > > > > data
> > > > > >> > plane
> > > > > >> > > makes a lot of sense.
> > > > > >> > >
> > > > > >> > > In the KIP you mentioned that the controller request queue
> may
> > > > have
> > > > > >> many
> > > > > >> > > requests in it. Will this be a common case? The controller
> > > > requests
> > > > > >> still
> > > > > >> > > go through the SocketServer. The SocketServer will mute
> the
> > > > > channel
> > > > > >> > once
> > > > > >> > > a request is read and put into the request channel. So
> > assuming
> > > > > there
> > > > > >> is
> > > > > >> > > only one connection between controller and each broker, on
> the
> > > > > broker
> > > > > >> > side,
> > > > > >> > > there should be only one controller request in the
> controller
> > > > > request
> > > > > >> > queue
> > > > > >> > > at any given time. If that is the case, do we need a
> separate
> > > > > >> controller
> > > > > >> > > request queue capacity config? The default value 20 means
> that
> > > we
> > > > > >> expect
> > > > > >> > > there are 20 controller switches to happen in a short period
> > of
> > > > > time.
> > > > > >> I
> > > > > >> > am
> > > > > >> > > not sure whether someone should increase the controller
> > request
> > > > > queue
> > > > > >> > > capacity to handle such a case, as it seems to indicate that
> > > > > >> > > something very wrong has happened.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Jiangjie (Becket) Qin
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > lindong28@gmail.com>
> > > > > >> wrote:
> > > > > >> > >
> > > > > >> > > > Thanks for the update Lucas.
> > > > > >> > > >
> > > > > >> > > > I think the motivation section is intuitive. It will be
> good
> > > to
> > > > > >> learn
> > > > > >> > > more
> > > > > >> > > > about the comments from other reviewers.
> > > > > >> > > >
> > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi Dong,
> > > > > >> > > > >
> > > > > >> > > > > I've updated the motivation section of the KIP by
> > explaining
> > > > the
> > > > > >> > cases
> > > > > >> > > > that
> > > > > >> > > > > would have user impacts.
> > > > > >> > > > > Please take a look at let me know your comments.
> > > > > >> > > > >
> > > > > >> > > > > Thanks,
> > > > > >> > > > > Lucas
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com
> > > > > >> >
> > > > > >> > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi Dong,
> > > > > >> > > > > >
> > > > > >> > > > > > The simulation of disk being slow is merely for me to
> > > easily
> > > > > >> > > construct
> > > > > >> > > > a
> > > > > >> > > > > > testing scenario
> > > > > >> > > > > > with a backlog of produce requests. In production,
> other
> > > > than
> > > > > >> the
> > > > > >> > > disk
> > > > > >> > > > > > being slow, a backlog of
> > > > > >> > > > > > produce requests may also be caused by high produce
> QPS.
> > > > > >> > > > > > In that case, we may not want to kill the broker and
> > > that's
> > > > > when
> > > > > >> > this
> > > > > >> > > > KIP
> > > > > >> > > > > > can be useful, both for JBOD
> > > > > >> > > > > > and non-JBOD setup.
> > > > > >> > > > > >
> > > > > >> > > > > > Going back to your previous question about each
> > > > ProduceRequest
> > > > > >> > > covering
> > > > > >> > > > > 20
> > > > > >> > > > > > partitions that are randomly
> > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is
> > enqueued
> > > > that
> > > > > >> > tries
> > > > > >> > > to
> > > > > >> > > > > > switch the current broker, say broker0, from leader to
> > > > > follower
> > > > > >> > > > > > *for one of the partitions*, say *test-0*. For the
> sake
> > of
> > > > > >> > argument,
> > > > > >> > > > > > let's also assume the other brokers, say broker1, have
> > > > > *stopped*
> > > > > >> > > > fetching
> > > > > >> > > > > > from
> > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > >> > > > > > 1. If the enqueued produce requests have acks = -1
> > (ALL)
> > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests ahead of
> > > > > >> LeaderAndISR
> > > > > >> > > will
> > > > > >> > > > be
> > > > > >> > > > > > put into the purgatory,
> > > > > >> > > > > >         and since they'll never be replicated to other
> > > > brokers
> > > > > >> > > (because
> > > > > >> > > > > of
> > > > > >> > > > > > the assumption made above), they will
> > > > > >> > > > > >         be completed either when the LeaderAndISR
> > request
> > > is
> > > > > >> > > processed
> > > > > >> > > > or
> > > > > >> > > > > > when the timeout happens.
> > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> transition
> > > the
> > > > > >> > > partition
> > > > > >> > > > > > test-0 to become a follower,
> > > > > >> > > > > >         after the current broker sees the replication
> of
> > > the
> > > > > >> > > remaining
> > > > > >> > > > 19
> > > > > >> > > > > > partitions, it can send a response indicating that
> > > > > >> > > > > >         it's no longer the leader for the "test-0".
> > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2,
> > let's
> > > > say
> > > > > >> > there
> > > > > >> > > > are
> > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR, and
> > there
> > > > are
> > > > > 8
> > > > > >> io
> > > > > >> > > > > threads,
> > > > > >> > > > > >   so each io thread will process approximately 3000
> > > produce
> > > > > >> > requests.
> > > > > >> > > > Now
> > > > > >> > > > > > let's investigate the io thread that finally processed
> > the
> > > > > >> > > > LeaderAndISR.
> > > > > >> > > > > >   For the 3000 produce requests, if we model the time
> > when
> > > > > their
> > > > > >> > > > > remaining
> > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and the
> > > > > LeaderAndISR
> > > > > >> > > > request
> > > > > >> > > > > is
> > > > > >> > > > > > processed at time t3000.
> > > > > >> > > > > >   Without this KIP, the 1st produce request would have
> > > > waited
> > > > > an
> > > > > >> > > extra
> > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra
> time
> > of
> > > > > >> t3000 -
> > > > > >> > > t1,
> > > > > >> > > > > etc.
> > > > > >> > > > > >   Roughly speaking, the latency difference is bigger
> for
> > > the
> > > > > >> > earlier
> > > > > >> > > > > > produce requests than for the later ones. For the same
> > > > reason,
> > > > > >> the
> > > > > >> > > more
> > > > > >> > > > > > ProduceRequests queued
> > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we get
> > > (capped
> > > > > by
> > > > > >> the
> > > > > >> > > > > > produce timeout).
> > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or
> > acks=1
> > > > > >> > > > > >   There will be no latency differences in this case,
> but
> > > > > >> > > > > >   2.1 without this KIP, the records of partition
> test-0
> > in
> > > > the
> > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be
> > appended
> > > > to
> > > > > >> the
> > > > > >> > > local
> > > > > >> > > > > log,
> > > > > >> > > > > >         and eventually be truncated after processing
> the
> > > > > >> > > LeaderAndISR.
> > > > > >> > > > > > This is what's referred to as
> > > > > >> > > > > >         "some unofficial definition of data loss in
> > terms
> > > of
> > > > > >> > messages
> > > > > >> > > > > > beyond the high watermark".
> > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect since
> if
> > > the
> > > > > >> > > > LeaderAndISR
> > > > > >> > > > > > is immediately processed, the response to producers
> will
> > > > have
> > > > > >> > > > > >         the NotLeaderForPartition error, causing
> > producers
> > > > to
> > > > > >> retry
> > > > > >> > > > > >
> > > > > >> > > > > > This explanation above is the benefit for reducing the
> > > > latency
> > > > > >> of a
> > > > > >> > > > > broker
> > > > > >> > > > > > becoming the follower,
> > > > > >> > > > > > closely related is reducing the latency of a broker
> > > becoming
> > > > > the
> > > > > >> > > > leader.
> > > > > >> > > > > > In this case, the benefit is even more obvious, if
> other
> > > > > brokers
> > > > > >> > have
> > > > > >> > > > > > resigned leadership, and the
> > > > > >> > > > > > current broker should take leadership. Any delay in
> > > > processing
> > > > > >> the
> > > > > >> > > > > > LeaderAndISR will be perceived
> > > > > >> > > > > > by clients as unavailability. In extreme cases, this
> can
> > > > cause
> > > > > >> > failed
> > > > > >> > > > > > produce requests if the retries are
> > > > > >> > > > > > exhausted.
> > > > > >> > > > > >
> > > > > >> > > > > > Another two types of controller requests are
> > > UpdateMetadata
> > > > > and
> > > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > >> > > > > > For UpdateMetadata requests, delayed processing means
> > > > clients
> > > > > >> > > receiving
> > > > > >> > > > > > stale metadata, e.g. with the wrong leadership info
> > > > > >> > > > > > for certain partitions, and the effect is more retries
> > or
> > > > even
> > > > > >> > fatal
> > > > > >> > > > > > failure if the retries are exhausted.
> > > > > >> > > > > >
> > > > > >> > > > > > For StopReplica requests, a long queuing time may
> > degrade
> > > > the
> > > > > >> > > > performance
> > > > > >> > > > > > of topic deletion.
> > > > > >> > > > > >
> > > > > >> > > > > > Regarding your last question of the delay for
> > > > > >> > DescribeLogDirsRequest,
> > > > > >> > > > you
> > > > > >> > > > > > are right
> > > > > >> > > > > > that this KIP cannot help with the latency in getting
> > the
> > > > log
> > > > > >> dirs
> > > > > >> > > > info,
> > > > > >> > > > > > and it's only relevant
> > > > > >> > > > > > when controller requests are involved.
> > > > > >> > > > > >
> > > > > >> > > > > > Regards,
> > > > > >> > > > > > Lucas
> > > > > >> > > > > >
> > > > > >> > > > > >
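
The arithmetic in Lucas's example above (24K queued ProduceRequests, 8 io
threads, an extra purgatory wait of t3000 - ti per request) can be made
concrete with a small sketch, under the added assumption, not stated in the
thread, that the completion times t0..t2999 are evenly spaced d milliseconds
apart: request i then waits an extra (3000 - i) * d, so the average extra
wait is d * (3000 + 1) / 2.

```java
public class PurgatoryWaitEstimate {
    // Average extra purgatory wait per request, assuming the per-request
    // completion times are evenly spaced intervalMs apart and the
    // LeaderAndISR is processed after all queuedPerThread requests.
    static double averageExtraWaitMs(int queuedPerThread, double intervalMs) {
        double total = 0;
        for (int i = 0; i < queuedPerThread; i++) {
            total += (queuedPerThread - i) * intervalMs; // t_last - t_i
        }
        return total / queuedPerThread;
    }

    public static void main(String[] args) {
        int requests = 24_000;
        int ioThreads = 8;
        int perThread = requests / ioThreads; // 3000 requests per io thread
        double d = 1.0;                       // assumed 1 ms between completions
        // average extra wait = d * (perThread + 1) / 2 = 1500.5 ms here
        System.out.println(averageExtraWaitMs(perThread, d));
    }
}
```

As the email notes, the earlier a request sits in the queue, the larger its
extra wait, up to the produce timeout cap.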
> > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > > lindong28@gmail.com
> > > > > >
> > > > > >> > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > >> Hey Jun,
> > > > > >> > > > > >>
> > > > > >> > > > > >> Thanks much for the comments. It is good point. So
> the
> > > > > feature
> > > > > >> may
> > > > > >> > > be
> > > > > >> > > > > >> useful for JBOD use-case. I have one question below.
> > > > > >> > > > > >>
> > > > > >> > > > > >> Hey Lucas,
> > > > > >> > > > > >>
> > > > > >> > > > > >> Do you think this feature is also useful for non-JBOD
> > > setup
> > > > > or
> > > > > >> it
> > > > > >> > is
> > > > > >> > > > > only
> > > > > >> > > > > >> useful for the JBOD setup? It may be useful to
> > understand
> > > > > this.
> > > > > >> > > > > >>
> > > > > >> > > > > >> When the broker is setup using JBOD, in order to move
> > > > leaders
> > > > > >> on
> > > > > >> > the
> > > > > >> > > > > >> failed
> > > > > >> > > > > >> disk to other disks, the system operator first needs
> to
> > > get
> > > > > the
> > > > > >> > list
> > > > > >> > > > of
> > > > > >> > > > > >> partitions on the failed disk. This is currently
> > achieved
> > > > > using
> > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > > >> DescribeLogDirsRequest
> > > > > >> > to
> > > > > >> > > > the
> > > > > >> > > > > >> broker. If we only prioritize the controller
> requests,
> > > then
> > > > > the
> > > > > >> > > > > >> DescribeLogDirsRequest
> > > > > >> > > > > >> may still take a long time to be processed by the
> > broker.
> > > > So
> > > > > >> the
> > > > > >> > > > overall
> > > > > >> > > > > >> time to move leaders away from the failed disk may
> > still
> > > be
> > > > > >> long
> > > > > >> > > even
> > > > > >> > > > > with
> > > > > >> > > > > >> this KIP. What do you think?
> > > > > >> > > > > >>
> > > > > >> > > > > >> Thanks,
> > > > > >> > > > > >> Dong
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > > >> lucasatucla@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > >>
> > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > @Dong,
> > > > > >> > > > > >> > Since both of the two comments in your previous
> email
> > > are
> > > > > >> about
> > > > > >> > > the
> > > > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > > > >> > > > > >> > in light of Jun's last comment, do you agree that
> > this
> > > > KIP
> > > > > >> can
> > > > > >> > be
> > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > > >> > > > > >> > Please let me know, thanks!
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > Regards,
> > > > > >> > > > > >> > Lucas
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > > jun@confluent.io>
> > > > > >> > wrote:
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > If all disks on a broker are slow, one probably
> > > should
> > > > > just
> > > > > >> > kill
> > > > > >> > > > the
> > > > > >> > > > > >> > > broker. In that case, this KIP may not help. If
> > only
> > > > one
> > > > > of
> > > > > >> > the
> > > > > >> > > > > disks
> > > > > >> > > > > >> on
> > > > > >> > > > > >> > a
> > > > > >> > > > > >> > > broker is slow, one may want to fail that disk
> and
> > > move
> > > > > the
> > > > > >> > > > leaders
> > > > > >> > > > > on
> > > > > >> > > > > >> > that
> > > > > >> > > > > >> > > disk to other brokers. In that case, being able
> to
> > > > > process
> > > > > >> the
> > > > > >> > > > > >> > LeaderAndIsr
> > > > > >> > > > > >> > > requests faster will potentially help the
> producers
> > > > > recover
> > > > > >> > > > quicker.
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Thanks,
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Jun
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > > > >> lindong28@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > > Hey Lucas,
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > Thanks for the reply. Some follow up questions
> > > below.
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > > > > >> > > > > >> > > > randomly distributed across all partitions, then each ProduceRequest
> > > > > >> > > > > >> > > > will likely cover some partitions for which the broker is still the
> > > > > >> > > > > >> > > > leader after it quickly processes the LeaderAndIsrRequest. Then the
> > > > > >> > > > > >> > > > broker will still be slow in processing these ProduceRequests, and
> > > > > >> > > > > >> > > > request latency will still be very high with this KIP. It seems that
> > > > > >> > > > > >> > > > most ProduceRequests will still time out after 30 seconds. Is this
> > > > > >> > > > > >> > > > understanding correct?
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out after 30
> > > > > >> > > > > >> > > > seconds, then it is less clear how this KIP reduces average produce
> > > > > >> > > > > >> > > > latency. Can you clarify what metrics can be improved by this KIP?
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > Not sure why a system operator directly cares about the number of
> > > > > >> > > > > >> > > > truncated messages. Do you mean this KIP can improve average throughput
> > > > > >> > > > > >> > > > or reduce message duplication? It would be good to understand this.
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > Thanks,
> > > > > >> > > > > >> > > > Dong
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > > > >> > > lucasatucla@gmail.com
> > > > > >> > > > >
> > > > > >> > > > > >> > wrote:
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > > Hi Dong,
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see
> > my
> > > > > reply
> > > > > >> > > below.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition.
> Now
> > > > let's
> > > > > >> > > consider
> > > > > >> > > > a
> > > > > >> > > > > >> more
> > > > > >> > > > > >> > > > common
> > > > > >> > > > > >> > > > > scenario
> > > > > >> > > > > >> > > > > where broker0 is the leader of many
> partitions.
> > > And
> > > > > >> let's
> > > > > >> > > say
> > > > > >> > > > > for
> > > > > >> > > > > >> > some
> > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > > > >> > > > > >> > > > > The number of leader partitions on broker0 is
> > so
> > > > > large,
> > > > > >> > say
> > > > > >> > > > 10K,
> > > > > >> > > > > >> that
> > > > > >> > > > > >> > > the
> > > > > >> > > > > >> > > > > cluster is skewed,
> > > > > >> > > > > >> > > > > and the operator would like to shift the
> > > leadership
> > > > > >> for a
> > > > > >> > > lot
> > > > > >> > > > of
> > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > > >> > > > > >> > > > > either manually or through some service like
> > > cruise
> > > > > >> > control.
> > > > > >> > > > > >> > > > > With this KIP, not only will the leadership
> > > > > transitions
> > > > > >> > > finish
> > > > > >> > > > > >> more
> > > > > >> > > > > >> > > > > quickly, helping the cluster itself becoming
> > more
> > > > > >> > balanced,
> > > > > >> > > > > >> > > > > but all existing producers corresponding to
> the
> > > 9K
> > > > > >> > > partitions
> > > > > >> > > > > will
> > > > > >> > > > > >> > get
> > > > > >> > > > > >> > > > the
> > > > > >> > > > > >> > > > > errors relatively quickly
> > > > > >> > > > > >> > > > > rather than relying on their timeout, thanks
> to
> > > the
> > > > > >> > batched
> > > > > >> > > > > async
> > > > > >> > > > > >> ZK
> > > > > >> > > > > >> > > > > operations.
> > > > > >> > > > > >> > > > > To me it's a useful feature to have during
> such
> > > > > >> > troublesome
> > > > > >> > > > > times.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have
> shown
> > > > that
> > > > > >> with
> > > > > >> > > this
> > > > > >> > > > > KIP
> > > > > >> > > > > >> > many
> > > > > >> > > > > >> > > > > producers
> > > > > >> > > > > >> > > > > receive an explicit error
> > NotLeaderForPartition,
> > > > > based
> > > > > >> on
> > > > > >> > > > which
> > > > > >> > > > > >> they
> > > > > >> > > > > >> > > > retry
> > > > > >> > > > > >> > > > > immediately.
> > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick
> retry)
> > > for
> > > > > >> their
> > > > > >> > > > single
> > > > > >> > > > > >> > > message
> > > > > >> > > > > >> > > > is
> > > > > >> > > > > >> > > > > much smaller
> > > > > >> > > > > >> > > > > compared with the case of timing out without
> > the
> > > > KIP
> > > > > >> (30
> > > > > >> > > > seconds
> > > > > >> > > > > >> for
> > > > > >> > > > > >> > > > timing
> > > > > >> > > > > >> > > > > out + quick retry).
> > > > > >> > > > > >> > > > > One might argue that reducing the timing out
> on
> > > the
> > > > > >> > producer
> > > > > >> > > > > side
> > > > > >> > > > > >> can
> > > > > >> > > > > >> > > > > achieve the same result,
> > > > > >> > > > > >> > > > > yet reducing the timeout has its own
> > > drawbacks[1].
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the
> > number
> > > of
> > > > > >> > > truncated
> > > > > >> > > > > >> > messages
> > > > > >> > > > > >> > > on
> > > > > >> > > > > >> > > > > brokers,
> > > > > >> > > > > >> > > > > with the experiments done in the Google Doc,
> it
> > > > > should
> > > > > >> be
> > > > > >> > > easy
> > > > > >> > > > > to
> > > > > >> > > > > >> see
> > > > > >> > > > > >> > > > that
> > > > > >> > > > > >> > > > > a lot fewer messages need
> > > > > >> > > > > >> > > > > to be truncated on broker0 since the
> up-to-date
> > > > > >> metadata
> > > > > >> > > > avoids
> > > > > >> > > > > >> > > appending
> > > > > >> > > > > >> > > > > of messages
> > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk
> to a
> > > > > system
> > > > > >> > > > operator
> > > > > >> > > > > >> and
> > > > > >> > > > > >> > ask
> > > > > >> > > > > >> > > > > whether
> > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most
> > likely
> > > > the
> > > > > >> > answer
> > > > > >> > > > is
> > > > > >> > > > > >> yes.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > 3. To answer your question, I think it might
> be
> > > > > >> helpful to
> > > > > >> > > > > >> construct
> > > > > >> > > > > >> > > some
> > > > > >> > > > > >> > > > > formulas.
> > > > > >> > > > > >> > > > > To simplify the modeling, I'm going back to
> the
> > > > case
> > > > > >> where
> > > > > >> > > > there
> > > > > >> > > > > >> is
> > > > > >> > > > > >> > > only
> > > > > >> > > > > >> > > > > ONE partition involved.
> > > > > >> > > > > >> > > > > Following the experiments in the Google Doc,
> > > let's
> > > > > say
> > > > > >> > > broker0
> > > > > >> > > > > >> > becomes
> > > > > >> > > > > >> > > > the
> > > > > >> > > > > >> > > > > follower at time t0,
> > > > > >> > > > > >> > > > > and after t0 there were still N produce
> > requests
> > > in
> > > > > its
> > > > > >> > > > request
> > > > > >> > > > > >> > queue.
> > > > > >> > > > > >> > > > > With the up-to-date metadata brought by this
> > KIP,
> > > > > >> broker0
> > > > > >> > > can
> > > > > >> > > > > >> reply
> > > > > >> > > > > >> > > with
> > > > > >> > > > a
> > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > > > >> > > > > >> > > > > let's use M1 to denote the average processing
> > > time
> > > > of
> > > > > >> > > replying
> > > > > >> > > > > >> with
> > > > > >> > > > > >> > > such
> > > > > >> > > > > >> > > > an
> > > > > >> > > > > >> > > > > error message.
> > > > > >> > > > > >> > > > > Without this KIP, the broker will need to
> > append
> > > > > >> messages
> > > > > >> > to
> > > > > >> > > > > >> > segments,
> > > > > >> > > > > >> > > > > which may trigger a flush to disk,
> > > > > >> > > > > >> > > > > let's use M2 to denote the average processing
> > > time
> > > > > for
> > > > > >> > such
> > > > > >> > > > > logic.
> > > > > >> > > > > >> > > > > Then the average extra latency incurred
> without
> > > > this
> > > > > >> KIP
> > > > > >> > is
> > > > > >> > > N
> > > > > >> > > > *
> > > > > >> > > > > >> (M2 -
> > > > > >> > > > > >> > > > M1) /
> > > > > >> > > > > >> > > > > 2.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > In practice, M2 should always be larger than
> > M1,
> > > > > which
> > > > > >> > means
> > > > > >> > > > as
> > > > > >> > > > > >> long
> > > > > >> > > > > >> > > as N
> > > > > >> > > > > >> > > > > is positive,
> > > > > >> > > > > >> > > > > we would see improvements on the average
> > latency.
> > > > > >> > > > > >> > > > > There does not need to be a significant backlog
> > of
> > > > > >> requests
> > > > > >> > in
> > > > > >> > > > the
> > > > > >> > > > > >> > > request
> > > > > >> > > > > >> > > > > queue,
> > > > > >> > > > > >> > > > > or severe degradation of disk performance to
> > have
> > > > the
> > > > > >> > > > > improvement.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > Regards,
> > > > > >> > > > > >> > > > > Lucas
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the
> > > > > producer
> > > > > >> > side
> > > > > >> > > > can
> > > > > >> > > > > >> > trigger
> > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > > > >> > > > > >> > > > > when the corresponding leader broker is
> > > overloaded,
> > > > > >> > > > exacerbating
> > > > > >> > > > > >> the
> > > > > >> > > > > >> > > > > situation.
> > > > > >> > > > > >> > > > >
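
[Editorial note: Lucas's queueing model above can be checked with a tiny
simulation. This is an illustrative sketch only — M1, M2, and N follow the
email's symbols, and the numbers plugged in below are hypothetical, not
measurements from the KIP's Google Doc.]

```python
# Toy model of the extra latency described above: when broker0 becomes a
# follower, N ProduceRequests are already waiting in its request queue.
# With the KIP, each reply is a quick NotLeaderForPartition error (M1);
# without it, each request is appended to the log, possibly flushing (M2).

def avg_extra_latency(n, m1, m2):
    """Average extra wait across the n queued requests, in seconds."""
    # The i-th request waits behind i earlier requests, each of which is
    # slower by (m2 - m1); averaging i = 0..n-1 gives ~ n * (m2 - m1) / 2.
    extra_waits = [i * (m2 - m1) for i in range(n)]
    return sum(extra_waits) / n

# Hypothetical numbers: 100 queued requests, 1 ms error reply vs 20 ms append.
print(avg_extra_latency(100, 0.001, 0.020))  # close to 100 * 0.019 / 2
```

As the closed form N * (M2 - M1) / 2 suggests, any positive backlog N yields
an improvement whenever M2 > M1, without requiring a severe disk slowdown.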
> > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> > > > > >> > > lindong28@gmail.com
> > > > > >> > > > >
> > > > > >> > > > > >> > wrote:
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > Thanks much for the detailed documentation
> of
> > > the
> > > > > >> > > > experiment.
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > Initially I also think having a separate
> > queue
> > > > for
> > > > > >> > > > controller
> > > > > >> > > > > >> > > requests
> > > > > >> > > > > >> > > > is
> > > > > >> > > > > >> > > > > > useful because, as you mentioned in the
> > summary
> > > > > >> section
> > > > > >> > of
> > > > > >> > > > the
> > > > > >> > > > > >> > Google
> > > > > >> > > > > >> > > > > doc,
> > > > > >> > > > > >> > > > > > controller requests are generally more
> > > important
> > > > > than
> > > > > >> > data
> > > > > >> > > > > >> requests
> > > > > >> > > > > >> > > and
> > > > > >> > > > > >> > > > > we
> > > > > >> > > > > >> > > > > > probably want controller requests to be
> > > processed
> > > > > >> > sooner.
> > > > > >> > > > But
> > > > > >> > > > > >> then
> > > > > >> > > > > >> > > Eno
> > > > > >> > > > > >> > > > > has
> > > > > >> > > > > >> > > > > > two very good questions which I am not sure
> > the
> > > > > >> Google
> > > > > >> > doc
> > > > > >> > > > has
> > > > > >> > > > > >> > > answered
> > > > > >> > > > > >> > > > > > explicitly. Could you help with the
> following
> > > > > >> questions?
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > 1) It is not very clear what is the actual
> > > > benefit
> > > > > of
> > > > > >> > > > KIP-291
> > > > > >> > > > > to
> > > > > >> > > > > >> > > users.
> > > > > >> > > > > >> > > > > The
> > > > > >> > > > > >> > > > > > experiment setup in the Google doc
> simulates
> > > the
> > > > > >> > scenario
> > > > > >> > > > that
> > > > > >> > > > > >> > the broker
> > > > > >> > > > > >> > > > is
> > > > > >> > > > > >> > > > > > very slow handling ProduceRequest due to
> e.g.
> > > > slow
> > > > > >> disk.
> > > > > >> > > It
> > > > > >> > > > > >> > currently
> > > > > >> > > > > >> > > > > > assumes that there is only 1 partition. But
> > in
> > > > the
> > > > > >> > common
> > > > > >> > > > > >> scenario,
> > > > > >> > > > > >> > > it
> > > > > >> > > > > >> > > > is
> > > > > >> > > > > >> > > > > > probably reasonable to assume that there
> are
> > > many
> > > > > >> other
> > > > > >> > > > > >> partitions
> > > > > >> > > > > >> > > that
> > > > > >> > > > > >> > > > > are
> > > > > >> > > > > >> > > > > > also actively produced to and
> ProduceRequests
> to
> > > > > these
> > > > > >> > > > > partitions
> > > > > >> > > > > >> > also
> > > > > >> > > > > >> > > > > take
> > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if
> > > > broker0
> > > > > >> can
> > > > > >> > > > become
> > > > > >> > > > > >> > > follower
> > > > > >> > > > > >> > > > > for
> > > > > >> > > > > >> > > > > > the partition 0 soon, it probably still
> needs
> > > to
> > > > > >> process
> > > > > >> > > the
> > > > > >> > > > > >> > > > > ProduceRequests
> > > > > >> > > > > >> > > > > > slowly in the queue because these
> > > > ProduceRequests
> > > > > >> > cover
> > > > > >> > > > > other
> > > > > >> > > > > >> > > > > partitions.
> > > > > >> > > > > >> > > > > > Thus most ProduceRequests will still time out
> > > after
> > > > > 30
> > > > > >> > > seconds
> > > > > >> > > > > and
> > > > > >> > > > > >> > most
> > > > > >> > > > > >> > > > > > clients will still likely time out after 30
> > > > seconds.
> > > > > >> Then
> > > > > >> > > it
> > > > > >> > > > is
> > > > > >> > > > > >> not
> > > > > >> > > > > > obvious what the benefit is to the client
> since
> > > > > client
> > > > > >> > will
> > > > > >> > > > > >> time out
> > > > > >> > > > > >> > > after
> > > > > >> > > > > >> > > > > 30
> > > > > >> > > > > >> > > > > > seconds before possibly re-connecting to
> > > broker1,
> > > > > >> with
> > > > > >> > or
> > > > > >> > > > > >> without
> > > > > >> > > > > >> > > > > KIP-291.
> > > > > >> > > > > >> > > > > > Did I miss something here?
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > 2) I guess Eno is asking for the specific
> > > > > benefits
> > > > > >> of
> > > > > >> > > this
> > > > > >> > > > > >> KIP to
> > > > > >> > > > > >> > > > user
> > > > > >> > > > > >> > > > > or
> > > > > >> > > > > >> > > > > > system administrator, e.g. whether this KIP
> > > > > decreases
> > > > > >> > > > average
> > > > > >> > > > > >> > > latency,
> > > > > >> > > > > >> > > > > > 999th percentile latency, probability of
> > exceptions
> > > > > >> exposed
> > > > > >> > to
> > > > > >> > > > > >> client
> > > > > >> > > > > >> > > etc.
> > > > > >> > > > > >> > > > It
> > > > > >> > > > > >> > > > > > is probably useful to clarify this.
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
> experience
> > > > only
> > > > > >> when
> > > > > >> > > > there
> > > > > >> > > > > is
> > > > > >> > > > > >> > > issue
> > > > > >> > > > > >> > > > > with
> > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in the
> > request
> > > > > queue
> > > > > >> > due
> > > > > >> > > to
> > > > > >> > > > > >> slow
> > > > > >> > > > > >> > > disk
> > > > > >> > > > > >> > > > as
> > > > > >> > > > > >> > > > > > described in the Google doc? Or is this KIP
> > > also
> > > > > >> useful
> > > > > >> > > when
> > > > > >> > > > > >> there
> > > > > >> > > > > >> > is
> > > > > >> > > > > >> > > > no
> > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It might be
> > > helpful
> > > > > to
> > > > > >> > > clarify
> > > > > >> > > > > >> this
> > > > > >> > > > > >> > to
> > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > Thanks much,
> > > > > >> > > > > >> > > > > > Dong
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas
> Wang <
> > > > > >> > > > > >> lucasatucla@gmail.com
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > > > wrote:
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > > Hi Eno,
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
> > experiment
> > > > > >> results.
> > > > > >> > > > > >> > > > > > > Here is a link to the positive impact
> > > achieved
> > > > by
> > > > > >> > > > > implementing
> > > > > >> > > > > >> > the
> > > > > >> > > > > >> > > > > > proposed
> > > > > >> > > > > >> > > > > > > change:
> > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/
> > > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > >> > > > > >> > > > > > > Please take a look when you have time and
> > let
> > > > me
> > > > > >> know
> > > > > >> > > your
> > > > > >> > > > > >> > > feedback.
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > Regards,
> > > > > >> > > > > >> > > > > > > Lucas
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> > > > > >> > > kafka@harsha.io>
> > > > > >> > > > > >> wrote:
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a
> look;
> > > > it might
> > > > > >> suit
> > > > > >> > > our
> > > > > >> > > > > >> > > > requirements
> > > > > >> > > > > >> > > > > > > > better.
> > > > > >> > > > > >> > > > > > > >
> > > > > >> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > >> > > > > > > > Harsha
> > > > > >> > > > > >> > > > > > > >
> > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM,
> Lucas
> > > > Wang <
> > > > > >> > > > > >> > > > lucasatucla@gmail.com
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > If I understand correctly, the
> > > replication
> > > > > >> quota
> > > > > >> > > > > mechanism
> > > > > >> > > > > >> > > > proposed
> > > > > >> > > > > >> > > > > > in
> > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
> scenario.
> > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > >> > > > > > > > > Lucas
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM,
> > Harsha <
> > > > > >> > > > > kafka@harsha.io
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > > > wrote:
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > >> > > > > >> > > > > > > > > > One more question, any thoughts on
> > > making
> > > > > >> this
> > > > > >> > > > > >> configurable
> > > > > >> > > > > > > > > > and also allowing a subset of data
> > > requests
> > > > > to
> > > > > >> be
> > > > > >> > > > > >> > prioritized.
> > > > > >> > > > > >> > > > For
> > > > > >> > > > > >> > > > > > > > example
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > , we notice in our cluster when we
> > take
> > > > out
> > > > > a
> > > > > >> > > broker
> > > > > >> > > > > and
> > > > > >> > > > > >> > bring
> > > > > >> > > > > >> > > > new
> > > > > >> > > > > >> > > > > > one
> > > > > >> > > > > >> > > > > > > > it
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > will try to become follower and
> have
> > > a lot
> > > > of
> > > > > >> > fetch
> > > > > >> > > > > >> requests
> > > > > >> > > > > >> > to
> > > > > >> > > > > >> > > > > other
> > > > > >> > > > > >> > > > > > > > > leaders
> > > > > >> > > > > >> > > > > > > > > > in clusters. This will negatively
> > > affect
> > > > > the
> > > > > >> > > > > >> > > application/client
> > > > > >> > > > > >> > > > > > > > > requests.
> > > > > >> > > > > > > > > > We are also exploring a similar
> > > > solution
> > > > > to
> > > > > >> > > > > >> de-prioritize
> > > > > >> > > > > >> > > if
> > > > > >> > > > > >> > > > a
> > > > > >> > > > > >> > > > > > new
> > > > > >> > > > > >> > > > > > > > > > replica comes in for fetch
> requests,
> > we
> > > > are
> > > > > >> ok
> > > > > >> > > with
> > > > > >> > > > > the
> > > > > >> > > > > >> > > replica
> > > > > >> > > > > >> > > > > to
> > > > > >> > > > > >> > > > > > be
> > > > > >> > > > > >> > > > > > > > > > taking time but the leaders should
> > > > > prioritize
> > > > > >> > the
> > > > > >> > > > > client
> > > > > >> > > > > >> > > > > requests.
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > Thanks,
> > > > > >> > > > > >> > > > > > > > > > Harsha
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM
> > > Lucas
> > > > > Wang
> > > > > >> > > wrote:
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the
> feature
> > > > yet,
> > > > > >> so no
> > > > > >> > > > > >> > experimental
> > > > > >> > > > > >> > > > > > results
> > > > > >> > > > > >> > > > > > > > so
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > far.
> > > > > >> > > > > >> > > > > > > > > > > And I plan to test in out in the
> > > > > following
> > > > > >> > days.
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right that
> the
> > > > > >> priority
> > > > > >> > > queue
> > > > > >> > > > > >> does
> > > > > >> > > > > >> > not
> > > > > >> > > > > >> > > > > > > > completely
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > prevent
> > > > > >> > > > > >> > > > > > > > > > > data requests from being processed
> ahead
> > > of
> > > > > >> > > controller
> > > > > >> > > > > >> > requests.
> > > > > >> > > > > >> > > > > > > > > > > That being said, I expect it to
> > > greatly
> > > > > >> > mitigate
> > > > > >> > > > the
> > > > > >> > > > > >> > effect
> > > > > >> > > > > >> > > > of
> > > > > >> > > > > >> > > > > > > stale
> > > > > >> > > > > >> > > > > > > > > > > metadata.
> > > > > >> > > > > >> > > > > > > > > > > In any case, I'll try it out and
> > post
> > > > the
> > > > > >> > > results
> > > > > >> > > > > >> when I
> > > > > >> > > > > >> > > have
> > > > > >> > > > > >> > > > > it.
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > Regards,
> > > > > >> > > > > >> > > > > > > > > > > Lucas
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM,
> > Eno
> > > > > >> Thereska
> > > > > >> > <
> > > > > >> > > > > >> > > > > > > > eno.thereska@gmail.com
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a
> > > look
> > > > at
> > > > > >> > this.
> > > > > >> > > A
> > > > > >> > > > > >> couple
> > > > > >> > > > > >> > of
> > > > > >> > > > > >> > > > > > > > questions:
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > - did you notice any positive
> > > change
> > > > > >> after
> > > > > >> > > > > >> implementing
> > > > > >> > > > > >> > > > this
> > > > > >> > > > > >> > > > > > KIP?
> > > > > >> > > > > >> > > > > > > > > I'm
> > > > > >> > > > > >> > > > > > > > > > > > wondering if you have any
> > > > experimental
> > > > > >> > results
> > > > > >> > > > > that
> > > > > >> > > > > >> > show
> > > > > >> > > > > >> > > > the
> > > > > >> > > > > >> > > > > > > > benefit
> > > > > >> > > > > >> > > > > > > > > of
> > > > > >> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > two queues.
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not
> > > sufficient
> > > > in
> > > > > >> > > > addressing
> > > > > >> > > > > >> the
> > > > > >> > > > > >> > > > > problem
> > > > > >> > > > > >> > > > > > > the
> > > > > >> > > > > >> > > > > > > > > KIP
> > > > > >> > > > > >> > > > > > > > > > > > identifies. Even with priority
> > > > queues,
> > > > > >> you
> > > > > >> > > will
> > > > > >> > > > > >> > sometimes
> > > > > >> > > > > >> > > > > > > (often?)
> > > > > >> > > > > >> > > > > > > > > have
> > > > > >> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > case that data plane requests
> > will
> > > be
> > > > > >> ahead
> > > > > >> > of
> > > > > >> > > > the
> > > > > >> > > > > >> > > control
> > > > > >> > > > > >> > > > > > plane
> > > > > >> > > > > >> > > > > > > > > > > requests.
> > > > > >> > > > > >> > > > > > > > > > > > This happens because the system
> > > might
> > > > > >> have
> > > > > >> > > > already
> > > > > >> > > > > >> > > started
> > > > > >> > > > > >> > > > > > > > > processing
> > > > > >> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > data plane requests before the
> > > > control
> > > > > >> plane
> > > > > >> > > > ones
> > > > > >> > > > > >> > > arrived.
> > > > > >> > > > > >> > > > So
> > > > > >> > > > > >> > > > > > it
> > > > > >> > > > > >> > > > > > > > > would
> > > > > >> > > > > >> > > > > > > > > > > be
> > > > > >> > > > > >> > > > > > > > > > > > good to know what % of the
> > problem
> > > > this
> > > > > >> KIP
> > > > > >> > > > > >> addresses.
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > Thanks
> > > > > >> > > > > >> > > > > > > > > > > > Eno
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
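
[Editorial note: Eno's caveat about priority ordering can be seen in a
minimal two-queue dispatcher. This is a generic sketch of the idea under
discussion, not Kafka's actual RequestChannel implementation.]

```python
import queue

# Two queues as the KIP proposes: control requests are drained before data ones.
control_q: queue.Queue = queue.Queue()
data_q: queue.Queue = queue.Queue()

def next_request():
    """Pick the next request to process, preferring the control plane."""
    try:
        return control_q.get_nowait()
    except queue.Empty:
        # Nothing on the control plane; fall back to data requests.
        # Eno's caveat: a data request that is already being processed when a
        # control request arrives still finishes first -- prioritization only
        # reorders requests that are still waiting in the queues.
        return data_q.get_nowait()

data_q.put("ProduceRequest")
control_q.put("LeaderAndIsrRequest")
print(next_request())  # the control request jumps ahead of the produce
```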
> > > > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44
> PM,
> > > Ted
> > > > > Yu <
> > > > > >> > > > > >> > > > > yuzhihong@gmail.com
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > Change looks good.
> > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > Thanks
> > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42
> > AM,
> > > > > Lucas
> > > > > >> > Wang
> > > > > >> > > <
> > > > > >> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion.
> > I've
> > > > > >> updated
> > > > > >> > > the
> > > > > >> > > > > KIP.
> > > > > >> > > > > >> > > Please
> > > > > >> > > > > >> > > > > > take
> > > > > >> > > > > >> > > > > > > > > > another
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > look.
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > Lucas
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
> 6:34
> > > PM,
> > > > > Ted
> > > > > >> Yu
> > > > > >> > <
> > > > > >> > > > > >> > > > > > > yuzhihong@gmail.com
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > Currently in
> > > KafkaConfig.scala
> > > > :
> > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests =
> 500
> > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > It would be good if you
> can
> > > > > include
> > > > > >> > the
> > > > > >> > > > > >> default
> > > > > >> > > > > >> > > value
> > > > > >> > > > > >> > > > > for
> > > > > >> > > > > >> > > > > > > > this
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > new
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > config
> > > > > >> > > > > >> > > > > > > > > > > > > > > in the KIP.
> > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > Thanks
> > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
> > 4:28
> > > > PM,
> > > > > >> Lucas
> > > > > >> > > > Wang
> > > > > >> > > > > <
> > > > > >> > > > > >> > > > > > > > > > lucasatucla@gmail.com
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > > > >> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > I've updated the KIP by
> > > > adding
> > > > > a
> > > > > >> new
> > > > > >> > > > > config,
> > > > > >> > > > > >> > > > instead
> > > > > >> > > > > >> > > > > of
> > > > > >> > > > > >> > > > > > > > > reusing
> > > > > >> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > > > > > existing one.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > Please take another
> look
> > > when
> > > > > you
> > > > > >> > have
> > > > > >> > > > > time.
> > > > > >> > > > > >> > > > Thanks a
> > > > > >> > > > > >> > > > > > > lot!
> > > > > >> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > Lucas
> > > > > >> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
> > > 2:33
> > > > > PM,
> > > > > >> Ted
> > > > > >> > > Yu
> > > > > >> > > > <
> > > > > >> > > > > >> > > > > > > > yuzhihong@gmail.com
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of
> > > > > resource
> > > > > >> if
> > > > > >> > > > > control
> > > > > >> > > > > >> > > request
> > > > > >> > > > > >> > > > > > rate
> > > > > >> > > > > >> > > > > > > is
> > > > > >> > > > > >> > > > > > > > > low
> > > > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > I don't know if
> control
> > > > > request
> > > > > >> > rate
> > > > > >> > > > can
> > > > > >> > > > > >> get
> > > > > >> > > > > >> > to
> > > > > >> > > > > >> > > > > > > 100,000,
> > > > > >> > > > > >> > > > > > > > > > > likely
> > > > > >> > > > > >> > > > > > > > > > > > > not.
> > > > > >> > > > > >> > > > > > > > > > > > > > > Then
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > using the same bound
> as
> > > > that
> > > > > >> for
> > > > > >> > > data
> > > > > >> > > > > >> > requests
> > > > > >> > > > > >> > > > > seems
> > > > > >> > > > > >> > > > > > > > high.
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018
> at
> > > > 10:13
> > > > > >> PM,
> > > > > >> > > > Lucas
> > > > > >> > > > > >> Wang
> > > > > >> > > > > >> > <
> > > > > >> > > > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a
> > > look
> > > > at
> > > > > >> this
> > > > > >> > > > KIP.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Let's say today the
> > > > setting
> > > > > >> of
> > > > > >> > > > > >> > > > > > "queued.max.requests"
> > > > > >> > > > > >> > > > > > > in
> > > > > >> > > > > >> > > > > > > > > > > > cluster A
> > > > > >> > > > > >> > > > > > > > > > > > > > is
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > 1000,
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > while the setting
> in
> > > > > cluster
> > > > > >> B
> > > > > >> > is
> > > > > >> > > > > >> 100,000.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > The 100 times
> > > difference
> > > > > >> might
> > > > > >> > > have
> > > > > >> > > > > >> > indicated
> > > > > >> > > > > >> > > > > that
> > > > > >> > > > > >> > > > > > > > > machines
> > > > > >> > > > > >> > > > > > > > > > > in
> > > > > >> > > > > >> > > > > > > > > > > > > > > cluster
> > > > > >> > > > > >> > > > > > > > > > > > > > > > B
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > have larger memory.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > By reusing the
> > > > > >> > > > "queued.max.requests",
> > > > > >> > > > > >> the
> > > > > >> > > > > >> > > > > > > > > > > controlRequestQueue
> > > > > >> > > > > >> > > > > > > > > > > > in
> > > > > >> > > > > >> > > > > > > > > > > > > > > > cluster
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > B
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > automatically
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > gets a 100x
> capacity
> > > > > without
> > > > > >> > > > > explicitly
> > > > > >> > > > > >> > > > bothering
> > > > > >> > > > > >> > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > operators.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > I understand the
> > > counter
> > > > > >> > argument
> > > > > >> > > > can
> > > > > >> > > > > be
> > > > > >> > > > > >> > that
> > > > > >> > > > > >> > > > > maybe
> > > > > >> > > > > >> > > > > > > > > that's
> > > > > >> > > > > >> > > > > > > > > > a
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > waste
> > > > > >> > > > > >> > > > > > > > > > > > > > of
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > resource if control
> > > > request
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > rate is low and
> > > operators
> > > > > may
> > > > > >> > want
> > > > > >> > > > to
> > > > > >> > > > > >> fine
> > > > > >> > > > > >> > > tune
> > > > > >> > > > > >> > > > > the
> > > > > >> > > > > >> > > > > > > > > > capacity
> > > > > >> > > > > >> > > > > > > > > > > of
> > > > > >> > > > > >> > > > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> controlRequestQueue.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > I'm ok with either
> > > > > approach,
> > > > > >> and
> > > > > >> > > can
> > > > > >> > > > > >> change
> > > > > >> > > > > >> > > it
> > > > > >> > > > > >> > > > if
> > > > > >> > > > > >> > > > > > you
> > > > > >> > > > > >> > > > > > > > or
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > anyone
> > > > > >> > > > > >> > > > > > > > > > > > > > else
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > feels
> > > > > >> > > > > > > > > > > > > > > > > > strongly about adding
> > the
> > > > > extra
> > > > > >> > > > config.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > Lucas
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
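
[Editorial note: the trade-off Lucas describes — reusing queued.max.requests
versus adding a dedicated bound — can be sketched as below. Only
queued.max.requests (default 500) is a real Kafka config here; the
control-queue key name is hypothetical, since the thread only says a new
config was added without naming it.]

```python
# Sketch of the two capacity options discussed in this thread.
configs = {"queued.max.requests": 500}  # Kafka's documented default

# Option A (reuse): the control queue inherits the data-plane bound, so a
# cluster tuned to 100,000 silently gets a 100x larger control queue too.
control_capacity_reuse = configs["queued.max.requests"]

# Option B (dedicated config, as the updated KIP chose): operators tune the
# control queue independently. The key name below is made up for this sketch.
control_capacity_dedicated = configs.get("queued.max.control.requests", 20)

print(control_capacity_reuse, control_capacity_dedicated)
```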
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13,
> 2018
> > at
> > > > > 3:11
> > > > > >> PM,
> > > > > >> > > Ted
> > > > > >> > > > > Yu
> > > > > >> > > > > >> <
> > > > > >> > > > > >> > > > > > > > > > yuzhihong@gmail.com
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > Under Rejected
> > > > > >> Alternatives,
> > > > > >> > #2,
> > > > > >> > > > can
> > > > > >> > > > > >> you
> > > > > >> > > > > >> > > > > > elaborate
> > > > > >> > > > > >> > > > > > > a
> > > > > >> > > > > >> > > > > > > > > bit
> > > > > >> > > > > >> > > > > > > > > > > more
> > > > > >> > > > > >> > > > > > > > > > > > > on
> > > > > >> > > > > >> > > > > > > > > > > > > > > why
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > separate config
> has
> > > > > bigger
> > > > > >> > > impact
> > > > > >> > > > ?
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > Thanks
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13,
> > 2018
> > > at
> > > > > >> 2:00
> > > > > >> > PM,
> > > > > >> > > > > Dong
> > > > > >> > > > > >> > Lin <
> > > > > >> > > > > >> > > > > > > > > > > > lindong28@gmail.com
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the
> > KIP.
> > > > > Looks
> > > > > >> > good
> > > > > >> > > > > >> overall.
> > > > > >> > > > > >> > > > Some
> > > > > >> > > > > >> > > > > > > > > comments
> > > > > >> > > > > >> > > > > > > > > > > > below:
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > - We usually
> > > specify
> > > > > the
> > > > > >> > full
> > > > > >> > > > > mbean
> > > > > >> > > > > >> for
> > > > > >> > > > > >> > > the
> > > > > >> > > > > >> > > > > new
> > > > > >> > > > > >> > > > > > > > > metrics
> > > > > >> > > > > >> > > > > > > > > > > in
> > > > > >> > > > > >> > > > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > > > > KIP.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > Can
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > you
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > specify it in
> the
> > > > > Public
> > > > > >> > > > Interface
> > > > > >> > > > > >> > > section
> > > > > >> > > > > >> > > > > > > similar
> > > > > >> > > > > >> > > > > > > > > to
> > > > > >> > > > > >> > > > > > > > > > > > KIP-237
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > <
> > > > > >> https://cwiki.apache.org/
> > > > > >> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > 237%3A+More+Controller+Health+
> > > > > >> > > > > >> Metrics>
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > ?
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we
> could
> > > > follow
> > > > > >> the
> > > > > >> > > same
> > > > > >> > > > > >> > pattern
> > > > > >> > > > > >> > > as
> > > > > >> > > > > >> > > > > > > KIP-153
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > <
> > > > > >> https://cwiki.apache.org/
> > > > > >> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > >> > > > > >> > > > > > > > > > > > > metric>,
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > where we keep
> the
> > > > > >> existing
> > > > > >> > > > sensor
> > > > > >> > > > > >> name
> > > > > >> > > > > >> > > > > > > > > "BytesInPerSec"
> > > > > >> > > > > >> > > > > > > > > > > and
> > > > > >> > > > > >> > > > > > > > > > > > > add
> > > > > >> > > > > >> > > > > > > > > > > > > > a
> > > > > >> > > > > >> > > > > > > > > > > > > > > > new
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > sensor
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> "ReplicationBytesInPerSec",
> > > > > >> > > > rather
> > > > > >> > > > > >> than
> > > > > >> > > > > >> > > > > > replacing
> > > > > >> > > > > >> > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > sensor
> > > > > >> > > > > >> > > > > > > > > > > > > > > name "
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > BytesInPerSec"
> > with
> > > > > e.g.
> > > > > >> > > > > >> > > > > "ClientBytesInPerSec".
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > - It seems that
> > the
> > > > KIP
> > > > > >> > > changes
> > > > > >> > > > > the
> > > > > >> > > > > >> > > > semantics
> > > > > >> > > > > >> > > > > > of
> > > > > >> > > > > >> > > > > > > > the
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > broker
> > > > > >> > > > > >> > > > > > > > > > > > > > > config
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > "queued.max.requests"
> > > > > >> > because
> > > > > >> > > > the
> > > > > >> > > > > >> > number
> > > > > >> > > > > >> > > of
> > > > > >> > > > > >> > > > > > total
> > > > > >> > > > > >> > > > > > > > > > > requests
> > > > > >> > > > > >> > > > > > > > > > > > > > queued
> > > > > >> > > > > >> > > > > > > > > > > > > > > > in
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > broker will be
> no
> > > > > longer
> > > > > >> > > bounded
> > > > > >> > > > > by
> > > > > >> > > > > >> > > > > > > > > > > "queued.max.requests".
> > > > > >> > > > > >> > > > > > > > > > > > > This
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > probably
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > needs to be
> > > specified
> > > > > in
> > > > > >> the
> > > > > >> > > > > Public
> > > > > >> > > > > >> > > > > Interfaces
> > > > > >> > > > > >> > > > > > > > > section
> > > > > >> > > > > >> > > > > > > > > > > for
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > discussion.
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Dong
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13,
> > > 2018
> > > > at
> > > > > >> > 12:45
> > > > > >> > > > PM,
> > > > > >> > > > > >> Lucas
> > > > > >> > > > > >> > > > Wang
> > > > > >> > > > > >> > > > > <
> > > > > >> > > > > >> > > > > > > > > > > > > > > > lucasatucla@gmail.com
> >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka
> > experts,
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > I created
> > KIP-291
> > > > to
> > > > > >> add a
> > > > > >> > > > > >> separate
> > > > > >> > > > > >> > > queue
> > > > > >> > > > > >> > > > > for
> > > > > >> > > > > >> > > > > > > > > > > controller
> > > > > >> > > > > >> > > > > > > > > > > > > > > > requests:
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> https://cwiki.apache.org/
> > > > > >> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > 291%
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > 3A+Have+separate+queues+for+
> > > > > >> > > > > >> > > > > > > > > > control+requests+and+data+
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > requests
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > Can you
> please
> > > > take a
> > > > > >> look
> > > > > >> > > and
> > > > > >> > > > > >> let me
> > > > > >> > > > > >> > > > know
> > > > > >> > > > > >> > > > > > your
> > > > > >> > > > > >> > > > > > > > > > > feedback?
> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -Regards,
> > > > Mayuresh R. Gharat
> > > > (862) 250-7125
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Becket,

Sorry, I misunderstood your example. I thought you meant that requests from
different controllers are re-ordered.

I think you have provided a very good example and it should be safer to
still use two queues. Let me clarify the example a bit more below.

- If the controller has received response for R1 before it is disconnected
from the broker, it will send a new request R2 after it is re-connected to
the broker. There is no issue in this case because R1 will be processed
before R2.

- If the controller has not received response for R1 before it is
disconnected, it will re-send R1 followed by R2 after it is re-connected to
the broker. With high probability the order of processing should be R1, R1
and R2. This is because we have multiple request handler threads and the
first two R1 will typically both be processed before R2. With low
probability the order of processing will be R1, R2, R1, which can
potentially be a problem.

Thanks,
Dong
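The re-queuing hazard discussed in this exchange (a re-sent R1 and a new R2 both inserted at the head of a deque after a reconnect) can be sketched concretely. This is an illustrative toy model, not Kafka code; the class and request names are made up:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ControlRequestReorder {
    // Simulate the sequence: R1 is inserted at the head of the deque, the
    // controller connection drops before R1 is handled, and the re-connected
    // controller's R2 is also inserted at the head, landing ahead of R1.
    static String processingOrder() {
        Deque<String> requestQueue = new ArrayDeque<>();
        requestQueue.offerLast("produce-1"); // data request queues at the tail
        requestQueue.offerFirst("R1");       // controller request jumps to the head
        // Connection fails before R1 is handled; controller reconnects and sends R2.
        requestQueue.offerFirst("R2");       // R2 now sits ahead of R1
        StringBuilder order = new StringBuilder();
        while (!requestQueue.isEmpty()) {
            if (order.length() > 0) order.append(",");
            order.append(requestQueue.pollFirst());
        }
        return order.toString();             // "R2,R1,produce-1"
    }

    public static void main(String[] args) {
        System.out.println(processingOrder());
    }
}
```

The controller-epoch check mentioned above would make the broker discard the stale request, which is why the severity of this re-ordering is debatable.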

On Wed, Jul 18, 2018 at 6:24 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Becket,
>
> It seems that the requests from the old controller will be discarded due
> to the old controller epoch. It is not clear whether this is a problem.
>
> And if this out-of-order processing of controller requests is a problem,
> it seems like an existing problem which also applies to the multi-queue
> based design. So it is probably not a concern specific to the use of deque.
> Does that sound reasonable?
>
> Thanks,
> Dong
>
>
> On Wed, 18 Jul 2018 at 6:17 PM Becket Qin <be...@gmail.com> wrote:
>
>> Hi Mayuresh/Joel,
>>
>> Using the request channel as a deque was brought up some time ago when we
>> were initially thinking about prioritizing requests. The concern was that
>> the controller requests are supposed to be processed in order. If we can
>> ensure that there is only one controller request in the request channel,
>> the order is not a concern. But in cases where more than one controller
>> request is inserted into the queue, the controller request order may
>> change and cause problems. For example, think about the following sequence:
>> 1. Controller successfully sent a request R1 to broker
>> 2. Broker receives R1 and put the request to the head of the request
>> queue.
>> 3. Controller to broker connection failed and the controller reconnected
>> to
>> the broker.
>> 4. Controller sends a request R2 to the broker
>> 5. Broker receives R2 and add it to the head of the request queue.
>> Now on the broker side, R2 will be processed before R1, which may cause
>> problems.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>>
>>
>> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com> wrote:
>>
>> > @Mayuresh - I like your idea. It appears to be a simpler, less invasive
>> > alternative and it should work. Jun/Becket/others, do you see any
>> > pitfalls with this approach?
>> >
>> > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com> wrote:
>> >
>> > > @Mayuresh,
>> > > That's a very interesting idea that I hadn't thought of before.
>> > > It seems to solve our problem at hand pretty well, and also
>> > > avoids the need to have a new size metric and capacity config
>> > > for the controller request queue. In fact, if we were to adopt
>> > > this design, there is no public interface change, and we
>> > > probably don't need a KIP.
>> > > Also implementation wise, it seems the Java class
>> > > LinkedBlockingDeque can readily satisfy the requirement by
>> > > supporting a capacity and allowing insertion at both ends.
>> > >
>> > > My only concern is that this design is tied to the coincidence that
>> > > we have two request priorities and there are two ends to a deque.
>> > > Hence by using the proposed design, it seems the network layer is
>> > > more tightly coupled with upper layer logic, e.g. if we were to add
>> > > an extra priority level in the future for some reason, we would
>> > > probably need to go back to the design of separate queues, one for
>> > > each priority level.
>> > >
>> > > In summary, I'm ok with both designs and lean toward your suggested
>> > > approach. Let's hear what others think.
>> > >
>> > > @Becket,
>> > > In light of Mayuresh's suggested new design, I'm answering your
>> > > question only in the context of the current KIP design: I think your
>> > > suggestion makes sense, and I'm ok with removing the capacity config
>> > > and just relying on the default value of 20 being sufficient enough.
>> > >
>> > > Thanks,
>> > > Lucas
>> > >
>> > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
>> > >
>> > > > Hi Lucas,
>> > > >
>> > > > Seems like the main intent here is to prioritize the controller
>> > > > request over any other requests.
>> > > > In that case, we can change the request queue to a deque, where you
>> > > > always insert the normal requests (produce, consume, etc.) at the
>> > > > end of the deque, but if it's a controller request, you insert it at
>> > > > the head of the deque. This ensures that the controller request will
>> > > > be given higher priority over other requests.
>> > > >
>> > > > Also since we only read one request from the socket and mute it and
>> > > > only unmute it after handling the request, this would ensure that we
>> > > > don't handle controller requests out of order.
>> > > >
>> > > > With this approach we can avoid the second queue and the additional
>> > > > config for the size of the queue.
>> > > >
>> > > > What do you think ?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Mayuresh
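Mayuresh's proposal can be sketched with a bounded `LinkedBlockingDeque`: data requests append at the tail, controller requests jump to the head. The class and method names below are hypothetical illustrations, not the actual Kafka `RequestChannel` API:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Minimal sketch of a single bounded deque serving both priorities.
public class RequestChannelSketch {
    private final LinkedBlockingDeque<String> queue;

    public RequestChannelSketch(int capacity) {
        // One bounded structure, analogous to queued.max.requests.
        this.queue = new LinkedBlockingDeque<>(capacity);
    }

    // Data-plane requests go to the tail, keeping FIFO order among themselves.
    public boolean sendDataRequest(String request) {
        return queue.offerLast(request);
    }

    // Controller requests jump to the head so a handler picks them up next.
    public boolean sendControlRequest(String request) {
        return queue.offerFirst(request);
    }

    // Request handler threads always take from the head.
    public String receive() {
        return queue.pollFirst();
    }
}
```

Note that this couples the number of priority levels to the two ends of the deque, which is exactly the limitation Lucas raises in his reply.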
>> > > >
>> > > >
>> > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com> wrote:
>> > > >
>> > > > > Hey Joel,
>> > > > >
>> > > > > Thanks for the detailed explanation. I agree the current design
>> > > > > makes sense. My confusion is about whether the new config for the
>> > > > > controller queue capacity is necessary. I cannot think of a case
>> > > > > in which users would change it.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:
>> > > > >
>> > > > > > Hi Lucas,
>> > > > > >
>> > > > > > I guess my question can be rephrased to "do we expect users to
>> > > > > > ever change the controller request queue capacity"? If we agree
>> > > > > > that 20 is already a very generous default number and we do not
>> > > > > > expect users to change it, is it still necessary to expose this
>> > > > > > as a config?
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Jiangjie (Becket) Qin
>> > > > > >
>> > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
>> > > > > >
>> > > > > >> @Becket
>> > > > > >> 1. Thanks for the comment. You are right that normally there
>> > > > > >> should be just one controller request because of muting, and I
>> > > > > >> had NOT intended to say there would be many enqueued controller
>> > > > > >> requests. I went through the KIP again, and I'm not sure which
>> > > > > >> part conveys that info. I'd be happy to revise if you point out
>> > > > > >> the section.
>> > > > > >>
>> > > > > >> 2. Though it should not happen in normal conditions, the current
>> > > > > >> design does not preclude multiple controllers running at the
>> > > > > >> same time. Hence if we don't have the controller queue capacity
>> > > > > >> config and simply set its capacity to 1, network threads
>> > > > > >> handling requests from different controllers will be blocked
>> > > > > >> during those troublesome times, which is probably not what we
>> > > > > >> want. On the other hand, adding the extra config with a default
>> > > > > >> value, say 20, guards us from issues in those troublesome
>> > > > > >> times, and IMO there isn't much downside of adding the extra
>> > > > > >> config.
>> > > > > >>
>> > > > > >> @Mayuresh
>> > > > > >> Good catch, this sentence is an obsolete statement based on a
>> > > > > >> previous design. I've revised the wording in the KIP.
>> > > > > >>
>> > > > > >> Thanks,
>> > > > > >> Lucas
>> > > > > >>
>> > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
>> > > > > >>
>> > > > > >> > Hi Lucas,
>> > > > > >> >
>> > > > > >> > Thanks for the KIP.
>> > > > > >> > I am trying to understand why you think "The memory
>> > > > > >> > consumption can rise given the total number of queued
>> > > > > >> > requests can go up to 2x" in the impact section. Normally
>> > > > > >> > the requests from controller to a broker are not high
>> > > > > >> > volume, right?
>> > > > > >> >
>> > > > > >> > Thanks,
>> > > > > >> >
>> > > > > >> > Mayuresh
>> > > > > >> >
>> > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com> wrote:
>> > > > > >> >
>> > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane
>> > > > > >> > > from the data plane makes a lot of sense.
>> > > > > >> > >
>> > > > > >> > > In the KIP you mentioned that the controller request queue
>> > > > > >> > > may have many requests in it. Will this be a common case?
>> > > > > >> > > The controller requests still go through the SocketServer.
>> > > > > >> > > The SocketServer will mute the channel once a request is
>> > > > > >> > > read and put into the request channel. So assuming there
>> > > > > >> > > is only one connection between controller and each broker,
>> > > > > >> > > on the broker side, there should be only one controller
>> > > > > >> > > request in the controller request queue at any given time.
>> > > > > >> > > If that is the case, do we need a separate controller
>> > > > > >> > > request queue capacity config? The default value 20 means
>> > > > > >> > > that we expect 20 controller switches to happen in a short
>> > > > > >> > > period of time. I am not sure whether someone should
>> > > > > >> > > increase the controller request queue capacity to handle
>> > > > > >> > > such a case, as it seems to indicate something very wrong
>> > > > > >> > > has happened.
>> > > > > >> > >
>> > > > > >> > > Thanks,
>> > > > > >> > >
>> > > > > >> > > Jiangjie (Becket) Qin
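The mute/unmute behavior Becket describes, which bounds queued requests to one per connection, can be sketched roughly as follows. All names here are hypothetical; the real SocketServer implements this at the selector/channel level:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of one broker-side connection: muted after a request is read,
// unmuted only after the response is sent, so at most one request from this
// connection is in the request queue at a time.
public class MutingSketch {
    private boolean muted = false;
    private final Queue<String> requestQueue = new ArrayDeque<>();

    // Called by the network thread when a full request has been read.
    public boolean tryReceive(String request) {
        if (muted) return false;      // muted: stop reading from this socket
        requestQueue.offer(request);
        muted = true;                 // mute until the response is sent
        return true;
    }

    // Called after the handler finishes and the response is written back.
    public void sendResponse() {
        muted = false;                // unmute: resume reading
    }

    public String poll() {
        return requestQueue.poll();
    }
}
```

This is why, with a single controller connection, the controller request queue normally holds at most one request, which is the basis for Becket's question about the capacity config.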
>> > > > > >> > >
>> > > > > >> > >
>> > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com> wrote:
>> > > > > >> > >
>> > > > > >> > > > Thanks for the update Lucas.
>> > > > > >> > > >
>> > > > > >> > > > I think the motivation section is intuitive. It will be
>> > > > > >> > > > good to learn more about the comments from other reviewers.
>> > > > > >> > > >
>> > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
>> > > > > >> > > >
>> > > > > >> > > > > Hi Dong,
>> > > > > >> > > > >
>> > > > > >> > > > > I've updated the motivation section of the KIP by
>> > > > > >> > > > > explaining the cases that would have user impacts.
>> > > > > >> > > > > Please take a look and let me know your comments.
>> > > > > >> > > > >
>> > > > > >> > > > > Thanks,
>> > > > > >> > > > > Lucas
>> > > > > >> > > > >
>> > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
>> > > > > >> > > > >
>> > > > > >> > > > > > Hi Dong,
>> > > > > >> > > > > >
>> > > > > >> > > > > > The simulation of the disk being slow is merely for me to
>> > > > > >> > > > > > easily construct a testing scenario with a backlog of
>> > > > > >> > > > > > produce requests. In production, other than the disk being
>> > > > > >> > > > > > slow, a backlog of produce requests may also be caused by
>> > > > > >> > > > > > high produce QPS. In that case, we may not want to kill the
>> > > > > >> > > > > > broker, and that's when this KIP can be useful, both for
>> > > > > >> > > > > > JBOD and non-JBOD setups.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Going back to your previous question about each
>> > > > > >> > > > > > ProduceRequest covering 20 partitions that are randomly
>> > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is enqueued
>> > > > > >> > > > > > that tries to switch the current broker, say broker0, from
>> > > > > >> > > > > > leader to follower *for one of the partitions*, say
>> > > > > >> > > > > > *test-0*. For the sake of argument, let's also assume the
>> > > > > >> > > > > > other brokers, say broker1, have *stopped* fetching from
>> > > > > >> > > > > > the current broker, i.e. broker0.
>> > > > > >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
>> > > > > >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
>> > > > > >> > > > > > LeaderAndISR will be put into the purgatory, and since
>> > > > > >> > > > > > they'll never be replicated to other brokers (because of
>> > > > > >> > > > > > the assumption made above), they will be completed either
>> > > > > >> > > > > > when the LeaderAndISR request is processed or when the
>> > > > > >> > > > > > timeout happens.
>> > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately transition
>> > > > > >> > > > > > the partition test-0 to become a follower; after the
>> > > > > >> > > > > > current broker sees the replication of the remaining 19
>> > > > > >> > > > > > partitions, it can send a response indicating that it's no
>> > > > > >> > > > > > longer the leader for "test-0".
>> > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2, let's
>> > > > > >> > > > > > say there are 24K produce requests ahead of the
>> > > > > >> > > > > > LeaderAndISR, and there are 8 io threads, so each io
>> > > > > >> > > > > > thread will process approximately 3000 produce requests.
>> > > > > >> > > > > > Now let's investigate the io thread that finally processed
>> > > > > >> > > > > > the LeaderAndISR. For its 3000 produce requests, model the
>> > > > > >> > > > > > times when their remaining 19 partitions catch up as t0,
>> > > > > >> > > > > > t1, ... t2999, and say the LeaderAndISR request is
>> > > > > >> > > > > > processed at time t3000. Without this KIP, the 1st produce
>> > > > > >> > > > > > request would have waited an extra t3000 - t0 time in the
>> > > > > >> > > > > > purgatory, the 2nd an extra time of t3000 - t1, etc.
>> > > > > >> > > > > > Roughly speaking, the latency difference is bigger for the
>> > > > > >> > > > > > earlier produce requests than for the later ones. For the
>> > > > > >> > > > > > same reason, the more ProduceRequests queued before the
>> > > > > >> > > > > > LeaderAndISR, the bigger the benefit we get (capped by the
>> > > > > >> > > > > > produce timeout).
>> > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
>> > > > > >> > > > > >   There will be no latency differences in this case, but
>> > > > > >> > > > > >   2.1 Without this KIP, the records of partition test-0 in
>> > > > > >> > > > > > the ProduceRequests ahead of the LeaderAndISR will be
>> > > > > >> > > > > > appended to the local log, and eventually be truncated
>> > > > > >> > > > > > after processing the LeaderAndISR. This is what's referred
>> > > > > >> > > > > > to as "some unofficial definition of data loss in terms of
>> > > > > >> > > > > > messages beyond the high watermark".
>> > > > > >> > > > > >   2.2 With this KIP, we can mitigate the effect, since if
>> > > > > >> > > > > > the LeaderAndISR is immediately processed, the response to
>> > > > > >> > > > > > producers will have the NotLeaderForPartition error,
>> > > > > >> > > > > > causing producers to retry.
>> > > > > >> > > > > >
>> > > > > >> > > > > > The explanation above is the benefit for reducing the
>> > > > > >> > > > > > latency of a broker becoming a follower; closely related
>> > > > > >> > > > > > is reducing the latency of a broker becoming the leader.
>> > > > > >> > > > > > In this case, the benefit is even more obvious: if other
>> > > > > >> > > > > > brokers have resigned leadership, the current broker
>> > > > > >> > > > > > should take leadership. Any delay in processing the
>> > > > > >> > > > > > LeaderAndISR will be perceived by clients as
>> > > > > >> > > > > > unavailability. In extreme cases, this can cause failed
>> > > > > >> > > > > > produce requests if the retries are exhausted.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Another two types of controller requests are
>> > > > > >> > > > > > UpdateMetadata and StopReplica, which I'll briefly discuss
>> > > > > >> > > > > > as follows:
>> > > > > >> > > > > > For UpdateMetadata requests, delayed processing means
>> > > > > >> > > > > > clients receiving stale metadata, e.g. with the wrong
>> > > > > >> > > > > > leadership info for certain partitions, and the effect is
>> > > > > >> > > > > > more retries or even fatal failure if the retries are
>> > > > > >> > > > > > exhausted.
>> > > > > >> > > > > >
>> > > > > >> > > > > > For StopReplica requests, a long queuing time may degrade
>> > > > > >> > > > > > the performance of topic deletion.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Regarding your last question about the delay for
>> > > > > >> > > > > > DescribeLogDirsRequest, you are right that this KIP cannot
>> > > > > >> > > > > > help with the latency in getting the log dirs info, and
>> > > > > >> > > > > > it's only relevant when controller requests are involved.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Regards,
>> > > > > >> > > > > > Lucas
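Lucas's arithmetic above can be illustrated with a toy model, assuming uniformly spaced catch-up times t_i = i milliseconds (an assumption added here purely for illustration; the original message makes no claim about the spacing):

```java
public class PurgatoryWaitSketch {
    // Model: per io thread, request i completes at t_i = i ms and the
    // LeaderAndIsr is processed at t_n; request i waits an extra (t_n - t_i)
    // in the purgatory without the KIP.
    static long averageExtraWaitMs(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) n - i;   // extra purgatory wait of request i
        }
        return total / n;            // ~ n/2 for a uniform backlog
    }

    public static void main(String[] args) {
        // prints 1500 ms for the 3000-requests-per-thread example
        System.out.println(averageExtraWaitMs(3000) + " ms");
    }
}
```

This matches the qualitative point in the message: the earlier a produce request sits in the backlog, the larger its extra wait, so the average grows with the backlog size until capped by the produce timeout.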
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:
>> > > > > >> > > > > >
>> > > > > >> > > > > >> Hey Jun,
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> Thanks much for the comments. It is a good point. So the
>> > > > > >> > > > > >> feature may be useful for the JBOD use-case. I have one
>> > > > > >> > > > > >> question below.
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> Hey Lucas,
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> Do you think this feature is also useful for a non-JBOD
>> > > > > >> > > > > >> setup or is it only useful for the JBOD setup? It may be
>> > > > > >> > > > > >> useful to understand this.
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> When the broker is set up using JBOD, in order to move
>> > > > > >> > > > > >> leaders on the failed disk to other disks, the system
>> > > > > >> > > > > >> operator first needs to get the list of partitions on
>> > > > > >> > > > > >> the failed disk. This is currently achieved using
>> > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
>> > > > > >> > > > > >> DescribeLogDirsRequest to the broker. If we only
>> > > > > >> > > > > >> prioritize the controller requests, then the
>> > > > > >> > > > > >> DescribeLogDirsRequest may still take a long time to be
>> > > > > >> > > > > >> processed by the broker. So the overall time to move
>> > > > > >> > > > > >> leaders away from the failed disk may still be long even
>> > > > > >> > > > > >> with this KIP. What do you think?
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> Thanks,
>> > > > > >> > > > > >> Dong
>> > > > > >> > > > > >>
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
>> > > > > >> > > > > >>
>> > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > @Dong,
>> > > > > >> > > > > >> > Since both of the two comments in your previous email
>> > > > > >> > > > > >> > are about the benefits of this KIP and whether it's
>> > > > > >> > > > > >> > useful, in light of Jun's last comment, do you agree
>> > > > > >> > > > > >> > that this KIP can be beneficial in the case mentioned
>> > > > > >> > > > > >> > by Jun? Please let me know, thanks!
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > Regards,
>> > > > > >> > > > > >> > Lucas
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
>> > > > jun@confluent.io>
>> > > > > >> > wrote:
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > > Hi, Lucas, Dong,
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > If all disks on a broker are slow, one probably
>> > > should
>> > > > > just
>> > > > > >> > kill
>> > > > > >> > > > the
>> > > > > >> > > > > >> > > broker. In that case, this KIP may not help. If
>> > only
>> > > > one
>> > > > > of
>> > > > > >> > the
>> > > > > >> > > > > disks
>> > > > > >> > > > > >> on
>> > > > > >> > > > > >> > a
>> > > > > >> > > > > >> > > broker is slow, one may want to fail that disk
>> and
>> > > move
>> > > > > the
>> > > > > >> > > > leaders
>> > > > > >> > > > > on
>> > > > > >> > > > > >> > that
>> > > > > >> > > > > >> > > disk to other brokers. In that case, being able
>> to
>> > > > > process
>> > > > > >> the
>> > > > > >> > > > > >> > LeaderAndIsr
>> > > > > >> > > > > >> > > requests faster will potentially help the
>> producers
>> > > > > recover
>> > > > > >> > > > quicker.
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > Thanks,
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > Jun
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
>> > > > > >> lindong28@gmail.com
>> > > > > >> > >
>> > > > > >> > > > > wrote:
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > > Hey Lucas,
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Thanks for the reply. Some follow up questions
>> > > below.
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20
>> > > > > partitions
>> > > > > >> > that
>> > > > > >> > > > are
>> > > > > >> > > > > >> > > randomly
>> > > > > >> > > > > >> > > > distributed across all partitions, then each
>> > > > > >> ProduceRequest
>> > > > > >> > > will
>> > > > > >> > > > > >> likely
>> > > > > >> > > > > >> > > > cover some partitions for which the broker is
>> > still
>> > > > > >> leader
>> > > > > >> > > after
>> > > > > >> > > > > it
>> > > > > >> > > > > >> > > quickly
>> > > > > >> > > > > >> > > > processes the
>> > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will still be
>> > slow
>> > > > in
>> > > > > >> > > > processing
>> > > > > >> > > > > >> these
>> > > > > >> > > > > >> > > > ProduceRequests and request latency will still be very
>> > high
>> > > > with
>> > > > > >> this
>> > > > > >> > > > KIP.
>> > > > > >> > > > > It
>> > > > > >> > > > > >> > > seems
>> > > > > >> > > > > >> > > > that most ProduceRequest will still timeout
>> after
>> > > 30
>> > > > > >> > seconds.
>> > > > > >> > > Is
>> > > > > >> > > > > >> this
>> > > > > >> > > > > >> > > > understanding correct?
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will still
>> > > > timeout
>> > > > > >> after
>> > > > > >> > > 30
>> > > > > >> > > > > >> > seconds,
>> > > > > >> > > > > >> > > > then it is less clear how this KIP reduces
>> > average
>> > > > > >> produce
>> > > > > >> > > > > latency.
>> > > > > >> > > > > >> Can
>> > > > > >> > > > > >> > > you
>> > > > > >> > > > > >> > > > clarify what metrics can be improved by this
>> KIP?
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Not sure why system operator directly cares
>> > number
>> > > of
>> > > > > >> > > truncated
>> > > > > >> > > > > >> > messages.
>> > > > > >> > > > > >> > > > Do you mean this KIP can improve average
>> > throughput
>> > > > or
>> > > > > >> > reduce
>> > > > > >> > > > > >> message
>> > > > > >> > > > > >> > > > duplication? It will be good to understand
>> this.
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > Thanks,
>> > > > > >> > > > > >> > > > Dong
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > >
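[Editor's note: Dong's first point, a ProduceRequest covering 20 randomly distributed partitions, can be quantified with a quick back-of-the-envelope script. The 10K/9K split is taken from Lucas's scenario earlier in the thread; treating the 20 picks as independent is my approximation.]

```python
# Fraction of partitions broker0 still leads after 9K of 10K leaders move away.
retained_fraction = 1_000 / 10_000
partitions_per_request = 20

# Probability that a ProduceRequest touching 20 random partitions includes at
# least one partition broker0 still leads (approximating picks as independent).
p_touch_retained = 1 - (1 - retained_fraction) ** partitions_per_request
print(round(p_touch_retained, 3))  # ~0.878
```

So under these assumed numbers, most ProduceRequests would still touch at least one partition the broker leads, which supports Dong's point that faster leader transitions alone may not shorten the produce path for those requests.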
>> > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
>> > > > > >> > > lucasatucla@gmail.com
>> > > > > >> > > > >
>> > > > > >> > > > > >> > wrote:
>> > > > > >> > > > > >> > > >
>> > > > > >> > > > > >> > > > > Hi Dong,
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please
>> see
>> > my
>> > > > > reply
>> > > > > >> > > below.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition.
>> Now
>> > > > let's
>> > > > > >> > > consider
>> > > > > >> > > > a
>> > > > > >> > > > > >> more
>> > > > > >> > > > > >> > > > common
>> > > > > >> > > > > >> > > > > scenario
>> > > > > >> > > > > >> > > > > where broker0 is the leader of many
>> partitions.
>> > > And
>> > > > > >> let's
>> > > > > >> > > say
>> > > > > >> > > > > for
>> > > > > >> > > > > >> > some
>> > > > > >> > > > > >> > > > > reason its IO becomes slow.
>> > > > > >> > > > > >> > > > > The number of leader partitions on broker0
>> is
>> > so
>> > > > > large,
>> > > > > >> > say
>> > > > > >> > > > 10K,
>> > > > > >> > > > > >> that
>> > > > > >> > > > > >> > > the
>> > > > > >> > > > > >> > > > > cluster is skewed,
>> > > > > >> > > > > >> > > > > and the operator would like to shift the
>> > > leadership
>> > > > > >> for a
>> > > > > >> > > lot
>> > > > > >> > > > of
>> > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
>> > > > > >> > > > > >> > > > > either manually or through some service like
>> > > cruise
>> > > > > >> > control.
>> > > > > >> > > > > >> > > > > With this KIP, not only will the leadership
>> > > > > transitions
>> > > > > >> > > finish
>> > > > > >> > > > > >> more
>> > > > > >> > > > > >> > > > > quickly, helping the cluster itself become
>> > more
>> > > > > >> > balanced,
>> > > > > >> > > > > >> > > > > but all existing producers corresponding to
>> the
>> > > 9K
>> > > > > >> > > partitions
>> > > > > >> > > > > will
>> > > > > >> > > > > >> > get
>> > > > > >> > > > > >> > > > the
>> > > > > >> > > > > >> > > > > errors relatively quickly
>> > > > > >> > > > > >> > > > > rather than relying on their timeout,
>> thanks to
>> > > the
>> > > > > >> > batched
>> > > > > >> > > > > async
>> > > > > >> > > > > >> ZK
>> > > > > >> > > > > >> > > > > operations.
>> > > > > >> > > > > >> > > > > To me it's a useful feature to have during
>> such
>> > > > > >> > troublesome
>> > > > > >> > > > > times.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have
>> shown
>> > > > that
>> > > > > >> with
>> > > > > >> > > this
>> > > > > >> > > > > KIP
>> > > > > >> > > > > >> > many
>> > > > > >> > > > > >> > > > > producers
>> > > > > >> > > > > >> > > > > receive an explicit error
>> > NotLeaderForPartition,
>> > > > > based
>> > > > > >> on
>> > > > > >> > > > which
>> > > > > >> > > > > >> they
>> > > > > >> > > > > >> > > > retry
>> > > > > >> > > > > >> > > > > immediately.
>> > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick
>> retry)
>> > > for
>> > > > > >> their
>> > > > > >> > > > single
>> > > > > >> > > > > >> > > message
>> > > > > >> > > > > >> > > > is
>> > > > > >> > > > > >> > > > > much smaller
>> > > > > >> > > > > >> > > > > compared with the case of timing out without
>> > the
>> > > > KIP
>> > > > > >> (30
>> > > > > >> > > > seconds
>> > > > > >> > > > > >> for
>> > > > > >> > > > > >> > > > timing
>> > > > > >> > > > > >> > > > > out + quick retry).
>> > > > > >> > > > > >> > > > > One might argue that reducing the timing
>> out on
>> > > the
>> > > > > >> > producer
>> > > > > >> > > > > side
>> > > > > >> > > > > >> can
>> > > > > >> > > > > >> > > > > achieve the same result,
>> > > > > >> > > > > >> > > > > yet reducing the timeout has its own
>> > > drawbacks[1].
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the
>> > number
>> > > of
>> > > > > >> > > truncated
>> > > > > >> > > > > >> > messages
>> > > > > >> > > > > >> > > on
>> > > > > >> > > > > >> > > > > brokers,
>> > > > > >> > > > > >> > > > > with the experiments done in the Google
>> Doc, it
>> > > > > should
>> > > > > >> be
>> > > > > >> > > easy
>> > > > > >> > > > > to
>> > > > > >> > > > > >> see
>> > > > > >> > > > > >> > > > that
>> > > > > >> > > > > >> > > > > a lot fewer messages need
>> > > > > >> > > > > >> > > > > to be truncated on broker0 since the
>> up-to-date
>> > > > > >> metadata
>> > > > > >> > > > avoids
>> > > > > >> > > > > >> > > appending
>> > > > > >> > > > > >> > > > > of messages
>> > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk
>> to a
>> > > > > system
>> > > > > >> > > > operator
>> > > > > >> > > > > >> and
>> > > > > >> > > > > >> > ask
>> > > > > >> > > > > >> > > > > whether
>> > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most
>> > likely
>> > > > the
>> > > > > >> > answer
>> > > > > >> > > > is
>> > > > > >> > > > > >> yes.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > 3. To answer your question, I think it
>> might be
>> > > > > >> helpful to
>> > > > > >> > > > > >> construct
>> > > > > >> > > > > >> > > some
>> > > > > >> > > > > >> > > > > formulas.
>> > > > > >> > > > > >> > > > > To simplify the modeling, I'm going back to
>> the
>> > > > case
>> > > > > >> where
>> > > > > >> > > > there
>> > > > > >> > > > > >> is
>> > > > > >> > > > > >> > > only
>> > > > > >> > > > > >> > > > > ONE partition involved.
>> > > > > >> > > > > >> > > > > Following the experiments in the Google Doc,
>> > > let's
>> > > > > say
>> > > > > >> > > broker0
>> > > > > >> > > > > >> > becomes
>> > > > > >> > > > > >> > > > the
>> > > > > >> > > > > >> > > > > follower at time t0,
>> > > > > >> > > > > >> > > > > and after t0 there were still N produce
>> > requests
>> > > in
>> > > > > its
>> > > > > >> > > > request
>> > > > > >> > > > > >> > queue.
>> > > > > >> > > > > >> > > > > With the up-to-date metadata brought by this
>> > KIP,
>> > > > > >> broker0
>> > > > > >> > > can
>> > > > > >> > > > > >> reply
>> > > > > >> > > > > >> > > with
>> > > > > >> > > > > >> > > > an
>> > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
>> > > > > >> > > > > >> > > > > let's use M1 to denote the average
>> processing
>> > > time
>> > > > of
>> > > > > >> > > replying
>> > > > > >> > > > > >> with
>> > > > > >> > > > > >> > > such
>> > > > > >> > > > > >> > > > an
>> > > > > >> > > > > >> > > > > error message.
>> > > > > >> > > > > >> > > > > Without this KIP, the broker will need to
>> > append
>> > > > > >> messages
>> > > > > >> > to
>> > > > > >> > > > > >> > segments,
>> > > > > >> > > > > >> > > > > which may trigger a flush to disk,
>> > > > > >> > > > > >> > > > > let's use M2 to denote the average
>> processing
>> > > time
>> > > > > for
>> > > > > >> > such
>> > > > > >> > > > > logic.
>> > > > > >> > > > > >> > > > > Then the average extra latency incurred
>> without
>> > > > this
>> > > > > >> KIP
>> > > > > >> > is
>> > > > > >> > > N
>> > > > > >> > > > *
>> > > > > >> > > > > >> (M2 -
>> > > > > >> > > > > >> > > > M1) /
>> > > > > >> > > > > >> > > > > 2.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > In practice, M2 should always be larger than
>> > M1,
>> > > > > which
>> > > > > >> > means
>> > > > > >> > > > as
>> > > > > >> > > > > >> long
>> > > > > >> > > > > >> > > as N
>> > > > > >> > > > > >> > > > > is positive,
>> > > > > >> > > > > >> > > > > we would see improvements on the average
>> > latency.
>> > > > > >> > > > > >> > > > > There does not need to be significant
>> backlog
>> > of
>> > > > > >> requests
>> > > > > >> > in
>> > > > > >> > > > the
>> > > > > >> > > > > >> > > request
>> > > > > >> > > > > >> > > > > queue,
>> > > > > >> > > > > >> > > > > or severe degradation of disk performance to
>> > have
>> > > > the
>> > > > > >> > > > > improvement.
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > Regards,
>> > > > > >> > > > > >> > > > > Lucas
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on
>> the
>> > > > > producer
>> > > > > >> > side
>> > > > > >> > > > can
>> > > > > >> > > > > >> > trigger
>> > > > > >> > > > > >> > > > > unnecessary duplicate requests
>> > > > > >> > > > > >> > > > > when the corresponding leader broker is
>> > > overloaded,
>> > > > > >> > > > exacerbating
>> > > > > >> > > > > >> the
>> > > > > >> > > > > >> > > > > situation.
>> > > > > >> > > > > >> > > > >
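[Editor's note: Lucas's closed form above is easy to sanity-check in a few lines of code. This is a sketch under the email's single-FIFO-handler assumption; the N, M1, and M2 values are illustrative, not measured.]

```python
# The i-th of N queued requests finishes at i*M, so its extra latency without
# the KIP is i*(M2 - M1); averaging over i = 1..N gives (M2 - M1)*(N + 1)/2,
# i.e. roughly N*(M2 - M1)/2 for large N.
def avg_extra_latency_ms(n, m1_ms, m2_ms):
    return sum(i * (m2_ms - m1_ms) for i in range(1, n + 1)) / n

n, m1, m2 = 1_000, 1, 20  # M1: error reply, M2: append plus possible flush
assert avg_extra_latency_ms(n, m1, m2) == (m2 - m1) * (n + 1) / 2
print(avg_extra_latency_ms(n, m1, m2))  # 9509.5, close to N*(M2 - M1)/2 = 9500
```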
>> > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
>> > > > > >> > > lindong28@gmail.com
>> > > > > >> > > > >
>> > > > > >> > > > > >> > wrote:
>> > > > > >> > > > > >> > > > >
>> > > > > >> > > > > >> > > > > > Hey Lucas,
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > Thanks much for the detailed
>> documentation of
>> > > the
>> > > > > >> > > > experiment.
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > Initially I also think having a separate
>> > queue
>> > > > for
>> > > > > >> > > > controller
>> > > > > >> > > > > >> > > requests
>> > > > > >> > > > > >> > > > is
>> > > > > >> > > > > >> > > > > > useful because, as you mentioned in the
>> > summary
>> > > > > >> section
>> > > > > >> > of
>> > > > > >> > > > the
>> > > > > >> > > > > >> > Google
>> > > > > >> > > > > >> > > > > doc,
>> > > > > >> > > > > >> > > > > > controller requests are generally more
>> > > important
>> > > > > than
>> > > > > >> > data
>> > > > > >> > > > > >> requests
>> > > > > >> > > > > >> > > and
>> > > > > >> > > > > >> > > > > we
>> > > > > >> > > > > >> > > > > > probably want controller requests to be
>> > > processed
>> > > > > >> > sooner.
>> > > > > >> > > > But
>> > > > > >> > > > > >> then
>> > > > > >> > > > > >> > > Eno
>> > > > > >> > > > > >> > > > > has
>> > > > > >> > > > > >> > > > > > two very good questions which I am not
>> sure
>> > the
>> > > > > >> Google
>> > > > > >> > doc
>> > > > > >> > > > has
>> > > > > >> > > > > >> > > answered
>> > > > > >> > > > > >> > > > > > explicitly. Could you help with the
>> following
>> > > > > >> questions?
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > 1) It is not very clear what is the actual
>> > > > benefit
>> > > > > of
>> > > > > >> > > > KIP-291
>> > > > > >> > > > > to
>> > > > > >> > > > > >> > > users.
>> > > > > >> > > > > >> > > > > The
>> > > > > >> > > > > >> > > > > > experiment setup in the Google doc
>> simulates
>> > > the
>> > > > > >> > scenario
>> > > > > >> > > > that
>> > > > > >> > > > > >> > broker
>> > > > > >> > > > > >> > > > is
>> > > > > >> > > > > >> > > > > > very slow handling ProduceRequest due to
>> e.g.
>> > > > slow
>> > > > > >> disk.
>> > > > > >> > > It
>> > > > > >> > > > > >> > currently
>> > > > > >> > > > > >> > > > > > assumes that there is only 1 partition.
>> But
>> > in
>> > > > the
>> > > > > >> > common
>> > > > > >> > > > > >> scenario,
>> > > > > >> > > > > >> > > it
>> > > > > >> > > > > >> > > > is
>> > > > > >> > > > > >> > > > > > probably reasonable to assume that there
>> are
>> > > many
>> > > > > >> other
>> > > > > >> > > > > >> partitions
>> > > > > >> > > > > >> > > that
>> > > > > >> > > > > >> > > > > are
>> > > > > >> > > > > >> > > > > > also actively produced to and
>> ProduceRequest
>> > to
>> > > > > these
>> > > > > >> > > > > partition
>> > > > > >> > > > > >> > also
>> > > > > >> > > > > >> > > > > takes
>> > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if
>> > > > broker0
>> > > > > >> can
>> > > > > >> > > > become
>> > > > > >> > > > > >> > > follower
>> > > > > >> > > > > >> > > > > for
>> > > > > >> > > > > >> > > > > > the partition 0 soon, it probably still
>> needs
>> > > to
>> > > > > >> process
>> > > > > >> > > the
>> > > > > >> > > > > >> > > > > ProduceRequest
>> > > > > >> > > > > >> > > > > > slowly in the queue because these
>> > > > ProduceRequests
>> > > > > >> > cover
>> > > > > >> > > > > other
>> > > > > >> > > > > >> > > > > partitions.
>> > > > > >> > > > > >> > > > > > Thus most ProduceRequest will still
>> timeout
>> > > after
>> > > > > 30
>> > > > > >> > > seconds
>> > > > > >> > > > > and
>> > > > > >> > > > > >> > most
>> > > > > >> > > > > >> > > > > > clients will still likely timeout after 30
>> > > > seconds.
>> > > > > >> Then
>> > > > > >> > > it
>> > > > > >> > > > is
>> > > > > >> > > > > >> not
>> > > > > >> > > > > >> > > > > > obviously what is the benefit to client
>> since
>> > > > > client
>> > > > > >> > will
>> > > > > >> > > > > >> timeout
>> > > > > >> > > > > >> > > after
>> > > > > >> > > > > >> > > > > 30
>> > > > > >> > > > > >> > > > > > seconds before possibly re-connecting to
>> > > broker1,
>> > > > > >> with
>> > > > > >> > or
>> > > > > >> > > > > >> without
>> > > > > >> > > > > >> > > > > KIP-291.
>> > > > > >> > > > > >> > > > > > Did I miss something here?
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the
>> specific
>> > > > > benefits
>> > > > > >> of
>> > > > > >> > > this
>> > > > > >> > > > > >> KIP to
>> > > > > >> > > > > >> > > > user
>> > > > > >> > > > > >> > > > > or
>> > > > > >> > > > > >> > > > > > system administrator, e.g. whether this
>> KIP
>> > > > > decreases
>> > > > > >> > > > average
>> > > > > >> > > > > >> > > latency,
>> > > > > >> > > > > >> > > > > > 999th percentile latency, probability of
>> > exception
>> > > > > >> exposed
>> > > > > >> > to
>> > > > > >> > > > > >> client
>> > > > > >> > > > > >> > > etc.
>> > > > > >> > > > > >> > > > It
>> > > > > >> > > > > >> > > > > > is probably useful to clarify this.
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
>> experience
>> > > > only
>> > > > > >> when
>> > > > > >> > > > there
>> > > > > >> > > > > is
>> > > > > >> > > > > >> > > issue
>> > > > > >> > > > > >> > > > > with
>> > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in the
>> > request
>> > > > > queue
>> > > > > >> > due
>> > > > > >> > > to
>> > > > > >> > > > > >> slow
>> > > > > >> > > > > >> > > disk
>> > > > > >> > > > > >> > > > as
>> > > > > >> > > > > >> > > > > > described in the Google doc? Or is this
>> KIP
>> > > also
>> > > > > >> useful
>> > > > > >> > > when
>> > > > > >> > > > > >> there
>> > > > > >> > > > > >> > is
>> > > > > >> > > > > >> > > > no
>> > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It might be
>> > > helpful
>> > > > > to
>> > > > > >> > > clarify
>> > > > > >> > > > > >> this
>> > > > > >> > > > > >> > to
>> > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > Thanks much,
>> > > > > >> > > > > >> > > > > > Dong
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas
>> Wang <
>> > > > > >> > > > > >> lucasatucla@gmail.com
>> > > > > >> > > > > >> > >
>> > > > > >> > > > > >> > > > > wrote:
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > > Hi Eno,
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
>> > experiment
>> > > > > >> results.
>> > > > > >> > > > > >> > > > > > > Here is a link to the positive impact
>> > > achieved
>> > > > by
>> > > > > >> > > > > implementing
>> > > > > >> > > > > >> > the
>> > > > > >> > > > > >> > > > > > proposed
>> > > > > >> > > > > >> > > > > > > change:
>> > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/
>> > > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
>> > > > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
>> > > > > >> > > > > >> > > > > > > Please take a look when you have time
>> and
>> > let
>> > > > me
>> > > > > >> know
>> > > > > >> > > your
>> > > > > >> > > > > >> > > feedback.
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > Regards,
>> > > > > >> > > > > >> > > > > > > Lucas
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha
>> <
>> > > > > >> > > kafka@harsha.io>
>> > > > > >> > > > > >> wrote:
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a
>> look
>> > > > might
>> > > > > >> suit
>> > > > > >> > > our
>> > > > > >> > > > > >> > > > requirements
>> > > > > >> > > > > >> > > > > > > > better.
>> > > > > >> > > > > >> > > > > > > >
>> > > > > >> > > > > >> > > > > > > > Thanks,
>> > > > > >> > > > > >> > > > > > > > Harsha
>> > > > > >> > > > > >> > > > > > > >
>> > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM,
>> Lucas
>> > > > Wang <
>> > > > > >> > > > > >> > > > lucasatucla@gmail.com
>> > > > > >> > > > > >> > > > > >
>> > > > > >> > > > > >> > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > Hi Harsha,
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > If I understand correctly, the
>> > > replication
>> > > > > >> quota
>> > > > > >> > > > > mechanism
>> > > > > >> > > > > >> > > > proposed
>> > > > > >> > > > > >> > > > > > in
>> > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
>> scenario.
>> > > > > >> > > > > >> > > > > > > > > Have you tried it out?
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > Thanks,
>> > > > > >> > > > > >> > > > > > > > > Lucas
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > >
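[Editor's note: for readers following along, the KIP-73 replication quotas Lucas points Harsha to are set via ordinary dynamic configs. The commands below are a hedged example; the broker id, topic name, rate, and the partition:broker lists are placeholders, not values from the thread.]

```shell
# Cap replication traffic on broker 0 at ~10 MB/s, on both the leader
# and follower side (KIP-73 dynamic broker configs).
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type brokers --entity-name 0 \
  --add-config 'leader.replication.throttled.rate=10485760,follower.replication.throttled.rate=10485760'

# Declare which replicas the throttle applies to, as partition:broker pairs
# (topic-level config; here partition 0's replicas on brokers 0 and 1).
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config 'leader.replication.throttled.replicas=0:0,follower.replication.throttled.replicas=0:1'
```

Note that `kafka-reassign-partitions.sh --throttle` sets these same configs automatically during a reassignment.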
>> > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM,
>> > Harsha <
>> > > > > >> > > > > kafka@harsha.io
>> > > > > >> > > > > >> >
>> > > > > >> > > > > >> > > > wrote:
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
>> > > > > >> > > > > >> > > > > > > > > > One more question, any thoughts on
>> > > making
>> > > > > >> this
>> > > > > >> > > > > >> configurable
>> > > > > >> > > > > >> > > > > > > > > > and also allowing subset of data
>> > > requests
>> > > > > to
>> > > > > >> be
>> > > > > >> > > > > >> > prioritized.
>> > > > > >> > > > > >> > > > For
>> > > > > >> > > > > >> > > > > > > > example
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when we
>> > take
>> > > > out
>> > > > > a
>> > > > > >> > > broker
>> > > > > >> > > > > and
>> > > > > >> > > > > >> > bring
>> > > > > >> > > > > >> > > > new
>> > > > > >> > > > > >> > > > > > one
>> > > > > >> > > > > >> > > > > > > > it
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > will try to become follower and
>> have
>> > > lot
>> > > > of
>> > > > > >> > fetch
>> > > > > >> > > > > >> requests
>> > > > > >> > > > > >> > to
>> > > > > >> > > > > >> > > > > other
>> > > > > >> > > > > >> > > > > > > > > leaders
>> > > > > >> > > > > >> > > > > > > > > > in clusters. This will negatively
>> > > effect
>> > > > > the
>> > > > > >> > > > > >> > > application/client
>> > > > > >> > > > > >> > > > > > > > > requests.
>> > > > > >> > > > > >> > > > > > > > > > We are also exploring the similar
>> > > > solution
>> > > > > to
>> > > > > >> > > > > >> de-prioritize
>> > > > > >> > > > > >> > > if
>> > > > > >> > > > > >> > > > a
>> > > > > >> > > > > >> > > > > > new
>> > > > > >> > > > > >> > > > > > > > > > replica comes in for fetch
>> requests,
>> > we
>> > > > are
>> > > > > >> ok
>> > > > > >> > > with
>> > > > > >> > > > > the
>> > > > > >> > > > > >> > > replica
>> > > > > >> > > > > >> > > > > to
>> > > > > >> > > > > >> > > > > > be
>> > > > > >> > > > > >> > > > > > > > > > taking time but the leaders should
>> > > > > prioritize
>> > > > > >> > the
>> > > > > >> > > > > client
>> > > > > >> > > > > >> > > > > requests.
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > Thanks,
>> > > > > >> > > > > >> > > > > > > > > > Harsha
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM
>> > > Lucas
>> > > > > Wang
>> > > > > >> > > wrote:
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
>> > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the
>> feature
>> > > > yet,
>> > > > > >> so no
>> > > > > >> > > > > >> > experimental
>> > > > > >> > > > > >> > > > > > results
>> > > > > >> > > > > >> > > > > > > > so
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > far.
>> > > > > >> > > > > >> > > > > > > > > > > And I plan to test in out in the
>> > > > > following
>> > > > > >> > days.
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right that
>> the
>> > > > > >> priority
>> > > > > >> > > queue
>> > > > > >> > > > > >> does
>> > > > > >> > > > > >> > not
>> > > > > >> > > > > >> > > > > > > > completely
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > prevent
>> > > > > >> > > > > >> > > > > > > > > > > data requests being processed
>> ahead
>> > > of
>> > > > > >> > > controller
>> > > > > >> > > > > >> > requests.
>> > > > > >> > > > > >> > > > > > > > > > > That being said, I expect it to
>> > > greatly
>> > > > > >> > mitigate
>> > > > > >> > > > the
>> > > > > >> > > > > >> > effect
>> > > > > >> > > > > >> > > > of
>> > > > > >> > > > > >> > > > > > > stale
>> > > > > >> > > > > >> > > > > > > > > > > metadata.
>> > > > > >> > > > > >> > > > > > > > > > > In any case, I'll try it out and
>> > post
>> > > > the
>> > > > > >> > > results
>> > > > > >> > > > > >> when I
>> > > > > >> > > > > >> > > have
>> > > > > >> > > > > >> > > > > it.
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > Regards,
>> > > > > >> > > > > >> > > > > > > > > > > Lucas
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM,
>> > Eno
>> > > > > >> Thereska
>> > > > > >> > <
>> > > > > >> > > > > >> > > > > > > > eno.thereska@gmail.com
>> > > > > >> > > > > >> > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had
>> a
>> > > look
>> > > > at
>> > > > > >> > this.
>> > > > > >> > > A
>> > > > > >> > > > > >> couple
>> > > > > >> > > > > >> > of
>> > > > > >> > > > > >> > > > > > > > questions:
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > - did you notice any positive
>> > > change
>> > > > > >> after
>> > > > > >> > > > > >> implementing
>> > > > > >> > > > > >> > > > this
>> > > > > >> > > > > >> > > > > > KIP?
>> > > > > >> > > > > >> > > > > > > > > I'm
>> > > > > >> > > > > >> > > > > > > > > > > > wondering if you have any
>> > > > experimental
>> > > > > >> > results
>> > > > > >> > > > > that
>> > > > > >> > > > > >> > show
>> > > > > >> > > > > >> > > > the
>> > > > > >> > > > > >> > > > > > > > benefit
>> > > > > >> > > > > >> > > > > > > > > of
>> > > > > >> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > >> > > > > > > > > > > > two queues.
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not
>> > > sufficient
>> > > > in
>> > > > > >> > > > addressing
>> > > > > >> > > > > >> the
>> > > > > >> > > > > >> > > > > problem
>> > > > > >> > > > > >> > > > > > > the
>> > > > > >> > > > > >> > > > > > > > > KIP
>> > > > > >> > > > > >> > > > > > > > > > > > identifies. Even with priority
>> > > > queues,
>> > > > > >> you
>> > > > > >> > > will
>> > > > > >> > > > > >> > sometimes
>> > > > > >> > > > > >> > > > > > > (often?)
>> > > > > >> > > > > >> > > > > > > > > have
>> > > > > >> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > >> > > > > > > > > > > > case that data plane requests
>> > will
>> > > be
>> > > > > >> ahead
>> > > > > >> > of
>> > > > > >> > > > the
>> > > > > >> > > > > >> > > control
>> > > > > >> > > > > >> > > > > > plane
>> > > > > >> > > > > >> > > > > > > > > > > requests.
>> > > > > >> > > > > >> > > > > > > > > > > > This happens because the
>> system
>> > > might
>> > > > > >> have
>> > > > > >> > > > already
>> > > > > >> > > > > >> > > started
>> > > > > >> > > > > >> > > > > > > > > processing
>> > > > > >> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > >> > > > > > > > > > > > data plane requests before the
>> > > > control
>> > > > > >> plane
>> > > > > >> > > > ones
>> > > > > >> > > > > >> > > arrived.
>> > > > > >> > > > > >> > > > So
>> > > > > >> > > > > >> > > > > > it
>> > > > > >> > > > > >> > > > > > > > > would
>> > > > > >> > > > > >> > > > > > > > > > > be
>> > > > > >> > > > > >> > > > > > > > > > > > good to know what % of the
>> > problem
>> > > > this
>> > > > > >> KIP
>> > > > > >> > > > > >> addresses.
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > Thanks
>> > > > > >> > > > > >> > > > > > > > > > > > Eno
>> > > > > >> > > > > >> > > > > > > > > > > >
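[Editor's note: Eno's point, that priority cannot reorder work already dequeued or in progress, can be illustrated with a tiny event-loop model. All request names, priorities, and timings here are invented for the example.]

```python
import heapq

# Requests arrive as (arrival_ms, priority, name); lower priority value wins.
# The single handler always picks the highest-priority *queued* request, but a
# data request that arrived earlier is already in service when the control
# request shows up.
arrivals = [(0, 1, "ProduceRequest-A"),      # priority 1 = data plane
            (0, 1, "ProduceRequest-B"),
            (50, 0, "LeaderAndIsrRequest")]  # priority 0 = control, arrives later
SERVICE_MS = {"ProduceRequest-A": 200, "ProduceRequest-B": 200,
              "LeaderAndIsrRequest": 10}

pending, queued, clock, order = sorted(arrivals), [], 0, []
while pending or queued:
    while pending and pending[0][0] <= clock:          # enqueue arrivals so far
        t, prio, name = pending.pop(0)
        heapq.heappush(queued, (prio, t, name))
    if not queued:                                     # idle until next arrival
        clock = pending[0][0]
        continue
    _, _, name = heapq.heappop(queued)                 # highest priority first
    order.append(name)
    clock += SERVICE_MS[name]

print(order)
# ['ProduceRequest-A', 'LeaderAndIsrRequest', 'ProduceRequest-B']
```

The control request overtakes the still-queued ProduceRequest-B but not ProduceRequest-A, which was already being processed, matching Eno's observation that prioritization only helps for requests not yet picked up.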
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44
>> PM,
>> > > Ted
>> > > > > Yu <
>> > > > > >> > > > > >> > > > > yuzhihong@gmail.com
>> > > > > >> > > > > >> > > > > > >
>> > > > > >> > > > > >> > > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > Change looks good.
>> > > > > >> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > Thanks
>> > > > > >> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42
>> > AM,
>> > > > > Lucas
>> > > > > >> > Wang
>> > > > > >> > > <
>> > > > > >> > > > > >> > > > > > > > lucasatucla@gmail.com
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
>> > > > > >> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion.
>> > I've
>> > > > > >> updated
>> > > > > >> > > the
>> > > > > >> > > > > KIP.
>> > > > > >> > > > > >> > > Please
>> > > > > >> > > > > >> > > > > > take
>> > > > > >> > > > > >> > > > > > > > > > another
>> > > > > >> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > look.
>> > > > > >> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
>> 6:34
>> > > PM,
>> > > > > Ted
>> > > > > >> Yu
>> > > > > >> > <
>> > > > > >> > > > > >> > > > > > > yuzhihong@gmail.com
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > > Currently in
>> > > KafkaConfig.scala
>> > > > :
>> > > > > >> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests =
>> 500
>> > > > > >> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > >> > > > > > > > > > > > > > > It would be good if you
>> can
>> > > > > include
>> > > > > >> > the
>> > > > > >> > > > > >> default
>> > > > > >> > > > > >> > > value
>> > > > > >> > > > > >> > > > > for
>> > > > > >> > > > > >> > > > > > > > this
>> > > > > >> > > > > >> > > > > > > > >
>> > > > > >>
>
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Lucas,

Let me present a more specific sequence of events to show why out-of-order
processing can happen. Can you see if it makes sense?

Say the broker has 12 request handler threads and there are many large
ProduceRequests in the queue. Each ProduceRequest takes 200 ms to process.
Let's also say R1 and R2 are small requests that each take 10 ms to process.
Here is the sequence of events:

- The controller sends R1_a to the broker and gets disconnected. Then the
controller reconnects and immediately re-sends the request as R1_b.
- One handler thread in the broker becomes free, takes R1_b from the request
queue, processes it, and sends the response for R1_b back to the controller.
Right after sending the response, the broker hits a long GC pause that stops
all the handler threads.
- The controller receives the response for R1_b and sends R2.
- The broker finishes the GC. At this moment both R1_a and R2 are in the
queue, with R2 at the front of the queue. So the requests end up being
processed in the order R1_b, R2, R1_a.
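To make the interleaving concrete, here is a toy simulation of the proposed
deque (purely illustrative; the R1_a/R1_b/R2 names come from the example
above, not from the Kafka code):

```python
from collections import deque

# Toy model of the proposed request deque: controller requests go to the
# head (appendleft), data requests would go to the tail (append).
queue = deque()

processed = []

# The controller sends R1_a, gets disconnected, and re-sends it as R1_b.
queue.appendleft("R1_a")
queue.appendleft("R1_b")  # R1_b now sits ahead of R1_a

# A free handler thread processes R1_b; then a long GC pauses all handlers.
processed.append(queue.popleft())  # takes R1_b

# During the GC pause the controller sees the R1_b response and sends R2.
queue.appendleft("R2")  # R2 now sits ahead of R1_a

# GC ends; the handler threads drain the queue.
while queue:
    processed.append(queue.popleft())

print(processed)  # ['R1_b', 'R2', 'R1_a'] -- R1_a is processed after R2
```

The point is that two copies of the same logical request, plus head
insertion, are enough to produce the out-of-order schedule described above.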

Thanks,
Dong


On Wed, Jul 18, 2018 at 10:50 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Dong,
>
> Sure. Regarding the 2nd case you mentioned
> "- If the controller has not received response for R1 before it is
> disconnected, it will re-send R1 followed by R2 after it is re-connected to
> the broker."
>
> with the max inflight request set to 1, after the connection is
> re-established, the controller won't send R2
> before it gets a response for R1, right? Plus, the controller is using
> blocking calls for each request, i.e.
> NetworkClientUtils.sendAndReceive, with infinite retries for each request
> within the same instance of RequestSendThread.
> So within the same instance of RequestSendThread, sending out multiple
> different requests seems impossible.
>
> However, based on the comments in the source code, it seems multiple
> in-flight requests can happen: if the broker loses its ZK session and then
> reconnects with ZooKeeper, multiple generations of RequestSendThreads can
> trigger multiple different requests.
> In that case, we cannot prevent out-of-order processing even with the queue
> since those multiple requests are from different connections.
> Broker generations can help in those cases, but I won't dive into that
> discussion.
> Is that right?
>
> Lucas
>
> On Wed, Jul 18, 2018 at 9:08 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Hey Lucas,
> >
> > I think for now we can probably discuss based on the existing Kafka's
> > design where controller to a broker is hard coded to be 1. It looks like
> > Becket has provided a good example in which requests from the same
> > controller can be processed out of order.
> >
> > Thanks,
> > Dong
> >
> > On Wed, Jul 18, 2018 at 8:35 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > @Becket and Dong,
> > > I think currently the ordering guarantee is achieved because
> > > the max in-flight requests from the controller to a broker is
> > > hard-coded to be 1.
> > >
> > > If, hypothetically, the max in-flight requests were > 1, then I think
> > > Dong is right to say that even the separate queue cannot guarantee
> > > ordered processing.
> > > For example, Req1 and Req2 are sent to a broker, and after a connection
> > > reconnection,
> > > both requests are sent again, causing the broker to have 4 requests in
> > the
> > > following order
> > > Req2 > Req1 > Req2 > Req1.
> > >
> > > In summary, it seems using the deque should not introduce any new
> > > problems with out-of-order processing.
> > > Is that right?
> > >
> > > Lucas
> > >
> > > On Wed, Jul 18, 2018 at 6:24 PM, Dong Lin <li...@gmail.com> wrote:
> > >
> > > > Hey Becket,
> > > >
> > > > It seems that the requests from the old controller will be discarded
> > > > due to the old controller epoch. It is not clear whether this is a
> > > > problem.
> > > >
> > > > And if this out-of-order processing of controller requests is a
> > problem,
> > > it
> > > > seems like an existing problem which also applies to the multi-queue
> > > based
> > > > design. So it is probably not a concern specific to the use of deque.
> > > Does
> > > > that sound reasonable?
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > >
> > > > On Wed, 18 Jul 2018 at 6:17 PM Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Mayuresh/Joel,
> > > > >
> > > > > Using the request channel as a deque was brought up some time ago
> > > > > when we were initially thinking about prioritizing requests. The
> > > > > concern was that controller requests are supposed to be processed
> > > > > in order. If we can ensure that there is at most one controller
> > > > > request in the request channel, the order is not a concern. But in
> > > > > cases where more than one controller request is inserted into the
> > > > > queue, the request order may change and cause problems. For
> > > > > example, think about the following sequence:
> > > > > 1. The controller successfully sends a request R1 to the broker.
> > > > > 2. The broker receives R1 and puts it at the head of the request
> > > > > queue.
> > > > > 3. The controller-to-broker connection fails and the controller
> > > > > reconnects to the broker.
> > > > > 4. The controller sends a request R2 to the broker.
> > > > > 5. The broker receives R2 and adds it at the head of the request
> > > > > queue.
> > > > > Now on the broker side, R2 will be processed before R1, which may
> > > > > cause problems.
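A toy sketch of this reordering (illustrative only, using Python's
collections.deque as a stand-in for the request channel):

```python
from collections import deque

request_queue = deque()

# Steps 1-2: the controller sends R1 and the broker puts it at the head.
request_queue.appendleft("R1")

# Steps 3-5: the connection bounces, the controller sends R2, and the
# broker again inserts at the head, ahead of the not-yet-processed R1.
request_queue.appendleft("R2")

order = [request_queue.popleft() for _ in range(len(request_queue))]
print(order)  # ['R2', 'R1'] -- R2 would be processed before R1
```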
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com>
> > > wrote:
> > > > >
> > > > > > @Mayuresh - I like your idea. It appears to be a simpler less
> > > invasive
> > > > > > alternative and it should work. Jun/Becket/others, do you see any
> > > > > pitfalls
> > > > > > with this approach?
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> > lucasatucla@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > @Mayuresh,
> > > > > > > That's a very interesting idea that I haven't thought before.
> > > > > > > It seems to solve our problem at hand pretty well, and also
> > > > > > > avoids the need to have a new size metric and capacity config
> > > > > > > for the controller request queue. In fact, if we were to adopt
> > > > > > > this design, there is no public interface change, and we
> > > > > > > probably don't need a KIP.
> > > > > > > Also, implementation-wise, it seems the Java class
> > > > > > > LinkedBlockingDeque can readily satisfy the requirement by
> > > > > > > supporting a capacity and allowing inserts at both ends.
> > > > > > >
> > > > > > > My only concern is that this design is tied to the coincidence
> > that
> > > > > > > we have two request priorities and there are two ends to a
> deque.
> > > > > > > Hence by using the proposed design, it seems the network layer
> is
> > > > > > > more tightly coupled with upper layer logic, e.g. if we were to
> > add
> > > > > > > an extra priority level in the future for some reason, we would
> > > > > probably
> > > > > > > need to go back to the design of separate queues, one for each
> > > > priority
> > > > > > > level.
> > > > > > >
> > > > > > > In summary, I'm ok with both designs and lean toward your
> > suggested
> > > > > > > approach.
> > > > > > > Let's hear what others think.
> > > > > > >
> > > > > > > @Becket,
> > > > > > > In light of Mayuresh's suggested new design, I'm answering your
> > > > > question
> > > > > > > only in the context
> > > > > > > of the current KIP design: I think your suggestion makes sense,
> > and
> > > > I'm
> > > > > > ok
> > > > > > > with removing the capacity config and
> > > > > > > just relying on the default value of 20 being sufficient
> enough.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Lucas
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > > > gharatmayuresh15@gmail.com
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Lucas,
> > > > > > > >
> > > > > > > > Seems like the main intent here is to prioritize the
> controller
> > > > > request
> > > > > > > > over any other requests.
> > > > > > > > In that case, we can change the request queue to a deque,
> > > > > > > > where you always insert the normal requests (produce,
> > > > > > > > consume, etc.) at the tail of the deque, but if it's a
> > > > > > > > controller request, you insert it at the head of the queue.
> > > > > > > > This ensures that controller requests are given higher
> > > > > > > > priority over other requests.
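A minimal sketch of this suggestion (a hypothetical helper, not the actual
Kafka implementation; the capacity check is simplified to raising instead
of blocking the network thread):

```python
from collections import deque

# One bounded deque: controller requests at the head, data requests at
# the tail. 500 is the queued.max.requests default mentioned earlier.
MAX_QUEUED_REQUESTS = 500

request_queue = deque()

def enqueue(request, is_controller_request):
    if len(request_queue) >= MAX_QUEUED_REQUESTS:
        raise RuntimeError("queue full; a real broker would block here")
    if is_controller_request:
        request_queue.appendleft(request)  # jumps ahead of data requests
    else:
        request_queue.append(request)      # normal FIFO behavior

enqueue("Produce-1", False)
enqueue("Fetch-1", False)
enqueue("LeaderAndIsr", True)

print(list(request_queue))  # ['LeaderAndIsr', 'Produce-1', 'Fetch-1']
```

Combined with the existing mute/unmute behavior (at most one in-flight
request per connection), this keeps controller requests at the front
without a second queue.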
> > > > > > > >
> > > > > > > > Also since we only read one request from the socket and mute
> it
> > > and
> > > > > > only
> > > > > > > > unmute it after handling the request, this would ensure that
> we
> > > > don't
> > > > > > > > handle controller requests out of order.
> > > > > > > >
> > > > > > > > With this approach we can avoid the second queue and the
> > > additional
> > > > > > > config
> > > > > > > > for the size of the queue.
> > > > > > > >
> > > > > > > > What do you think ?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Mayuresh
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> > becket.qin@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey Joel,
> > > > > > > > >
> > > > > > > > > Thanks for the detailed explanation. I agree the current
> > > > > > > > > design makes sense.
> > > > > > > > > My confusion is about whether the new config for the
> > controller
> > > > > queue
> > > > > > > > > capacity is necessary. I cannot think of a case in which
> > users
> > > > > would
> > > > > > > > change
> > > > > > > > > it.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > > becket.qin@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Lucas,
> > > > > > > > > >
> > > > > > > > > > I guess my question can be rephrased to "do we expect
> > > > > > > > > > users to ever change the controller request queue
> > > > > > > > > > capacity"? If we agree that 20 is already a very generous
> > > > > > > > > > default and we do not expect users to change it, is it
> > > > > > > > > > still necessary to expose this as a config?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > > lucasatucla@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> @Becket
> > > > > > > > > >> 1. Thanks for the comment. You are right that normally
> > there
> > > > > > should
> > > > > > > be
> > > > > > > > > >> just
> > > > > > > > > >> one controller request because of muting,
> > > > > > > > > >> and I had NOT intended to say there would be many
> enqueued
> > > > > > > controller
> > > > > > > > > >> requests.
> > > > > > > > > >> I went through the KIP again, and I'm not sure which
> part
> > > > > conveys
> > > > > > > that
> > > > > > > > > >> info.
> > > > > > > > > >> I'd be happy to revise if you point out the section.
> > > > > > > > > >>
> > > > > > > > > >> 2. Though it should not happen in normal conditions, the
> > > > > > > > > >> current design does not preclude multiple controllers
> > > > > > > > > >> running at the same time. Hence, if we don't have the
> > > > > > > > > >> controller queue capacity config and simply set its
> > > > > > > > > >> capacity to 1,
> > > > > > > > > >> network threads handling requests from different
> > controllers
> > > > > will
> > > > > > be
> > > > > > > > > >> blocked during those troublesome times,
> > > > > > > > > >> which is probably not what we want. On the other hand,
> > > adding
> > > > > the
> > > > > > > > extra
> > > > > > > > > >> config with a default value, say 20, guards us from
> issues
> > > in
> > > > > > those
> > > > > > > > > >> troublesome times, and IMO there isn't much downside of
> > > adding
> > > > > the
> > > > > > > > extra
> > > > > > > > > >> config.
> > > > > > > > > >>
> > > > > > > > > >> @Mayuresh
> > > > > > > > > >> Good catch, this sentence is an obsolete statement based
> > on
> > > a
> > > > > > > previous
> > > > > > > > > >> design. I've revised the wording in the KIP.
> > > > > > > > > >>
> > > > > > > > > >> Thanks,
> > > > > > > > > >> Lucas
> > > > > > > > > >>
> > > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi Lucas,
> > > > > > > > > >> >
> > > > > > > > > >> > Thanks for the KIP.
> > > > > > > > > >> > I am trying to understand why you think "The memory
> > > > > consumption
> > > > > > > can
> > > > > > > > > rise
> > > > > > > > > >> > given the total number of queued requests can go up to
> > 2x"
> > > > in
> > > > > > the
> > > > > > > > > impact
> > > > > > > > > >> > section. Normally the requests from controller to a
> > Broker
> > > > are
> > > > > > not
> > > > > > > > > high
> > > > > > > > > >> > volume, right ?
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > Thanks,
> > > > > > > > > >> >
> > > > > > > > > >> > Mayuresh
> > > > > > > > > >> >
> > > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > > > becket.qin@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control
> > plane
> > > > from
> > > > > > the
> > > > > > > > > data
> > > > > > > > > >> > plane
> > > > > > > > > >> > > makes a lot of sense.
> > > > > > > > > >> > >
> > > > > > > > > >> > > In the KIP you mentioned that the controller request
> > > queue
> > > > > may
> > > > > > > > have
> > > > > > > > > >> many
> > > > > > > > > >> > > requests in it. Will this be a common case? The
> > > > > > > > > >> > > controller requests still go through the SocketServer.
> > > > > > > > > >> > > The SocketServer will mute the channel once
> > > > > > > > > >> > > a request is read and put into the request channel.
> So
> > > > > > assuming
> > > > > > > > > there
> > > > > > > > > >> is
> > > > > > > > > >> > > only one connection between controller and each
> > broker,
> > > on
> > > > > the
> > > > > > > > > broker
> > > > > > > > > >> > side,
> > > > > > > > > >> > > there should be only one controller request in the
> > > > > controller
> > > > > > > > > request
> > > > > > > > > >> > queue
> > > > > > > > > >> > > at any given time. If that is the case, do we need a
> > > > > separate
> > > > > > > > > >> controller
> > > > > > > > > >> > > request queue capacity config? The default value 20
> > > means
> > > > > that
> > > > > > > we
> > > > > > > > > >> expect
> > > > > > > > > >> > > there are 20 controller switches to happen in a
> short
> > > > period
> > > > > > of
> > > > > > > > > time.
> > > > > > > > > >> I
> > > > > > > > > >> > am
> > > > > > > > > >> > > not sure whether someone should increase the
> > controller
> > > > > > request
> > > > > > > > > queue
> > > > > > > > > >> > > capacity to handle such case, as it seems indicating
> > > > > something
> > > > > > > > very
> > > > > > > > > >> wrong
> > > > > > > > > >> > > has happened.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Thanks,
> > > > > > > > > >> > >
> > > > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > > > lindong28@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Thanks for the update Lucas.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > I think the motivation section is intuitive. It
> will
> > > be
> > > > > good
> > > > > > > to
> > > > > > > > > >> learn
> > > > > > > > > >> > > more
> > > > > > > > > >> > > > about the comments from other reviewers.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > > > > > lucasatucla@gmail.com>
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > > Hi Dong,
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > I've updated the motivation section of the KIP
> by
> > > > > > explaining
> > > > > > > > the
> > > > > > > > > >> > cases
> > > > > > > > > >> > > > that
> > > > > > > > > >> > > > > would have user impacts.
> > > > > > > > > >> > > > > Please take a look at let me know your comments.
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Thanks,
> > > > > > > > > >> > > > > Lucas
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > > > > > lucasatucla@gmail.com
> > > > > > > > > >> >
> > > > > > > > > >> > > > wrote:
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > > Hi Dong,
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > The simulation of disk being slow is merely
> for
> > me
> > > > to
> > > > > > > easily
> > > > > > > > > >> > > construct
> > > > > > > > > >> > > > a
> > > > > > > > > >> > > > > > testing scenario
> > > > > > > > > >> > > > > > with a backlog of produce requests. In
> > production,
> > > > > other
> > > > > > > > than
> > > > > > > > > >> the
> > > > > > > > > >> > > disk
> > > > > > > > > >> > > > > > being slow, a backlog of
> > > > > > > > > >> > > > > > produce requests may also be caused by high
> > > produce
> > > > > QPS.
> > > > > > > > > >> > > > > > In that case, we may not want to kill the
> broker
> > > and
> > > > > > > that's
> > > > > > > > > when
> > > > > > > > > >> > this
> > > > > > > > > >> > > > KIP
> > > > > > > > > >> > > > > > can be useful, both for JBOD
> > > > > > > > > >> > > > > > and non-JBOD setup.
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > Going back to your previous question about
> each
> > > > > > > > ProduceRequest
> > > > > > > > > >> > > covering
> > > > > > > > > >> > > > > 20
> > > > > > > > > >> > > > > > partitions that are randomly
> > > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr request
> is
> > > > > > enqueued
> > > > > > > > that
> > > > > > > > > >> > tries
> > > > > > > > > >> > > to
> > > > > > > > > >> > > > > > switch the current broker, say broker0, from
> > > leader
> > > > to
> > > > > > > > > follower
> > > > > > > > > >> > > > > > *for one of the partitions*, say *test-0*. For
> > the
> > > > > sake
> > > > > > of
> > > > > > > > > >> > argument,
> > > > > > > > > >> > > > > > let's also assume the other brokers, say
> > broker1,
> > > > have
> > > > > > > > > *stopped*
> > > > > > > > > >> > > > fetching
> > > > > > > > > >> > > > > > from
> > > > > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > > > > > >> > > > > > 1. If the enqueued produce requests have acks
> =
> > > -1
> > > > > > (ALL)
> > > > > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests
> > ahead
> > > of
> > > > > > > > > >> LeaderAndISR
> > > > > > > > > >> > > will
> > > > > > > > > >> > > > be
> > > > > > > > > >> > > > > > put into the purgatory,
> > > > > > > > > >> > > > > >         and since they'll never be replicated
> to
> > > > other
> > > > > > > > brokers
> > > > > > > > > >> > > (because
> > > > > > > > > >> > > > > of
> > > > > > > > > >> > > > > > the assumption made above), they will
> > > > > > > > > >> > > > > >         be completed either when the
> > LeaderAndISR
> > > > > > request
> > > > > > > is
> > > > > > > > > >> > > processed
> > > > > > > > > >> > > > or
> > > > > > > > > >> > > > > > when the timeout happens.
> > > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> > > > > transition
> > > > > > > the
> > > > > > > > > >> > > partition
> > > > > > > > > >> > > > > > test-0 to become a follower,
> > > > > > > > > >> > > > > >         after the current broker sees the
> > > > replication
> > > > > of
> > > > > > > the
> > > > > > > > > >> > > remaining
> > > > > > > > > >> > > > 19
> > > > > > > > > >> > > > > > partitions, it can send a response indicating
> > that
> > > > > > > > > >> > > > > >         it's no longer the leader for the
> > > "test-0".
> > > > > > > > > >> > > > > >   To see the latency difference between 1.1
> and
> > > 1.2,
> > > > > > let's
> > > > > > > > say
> > > > > > > > > >> > there
> > > > > > > > > >> > > > are
> > > > > > > > > >> > > > > > 24K produce requests ahead of the
> LeaderAndISR,
> > > and
> > > > > > there
> > > > > > > > are
> > > > > > > > > 8
> > > > > > > > > >> io
> > > > > > > > > >> > > > > threads,
> > > > > > > > > >> > > > > >   so each io thread will process approximately
> > > 3000
> > > > > > > produce
> > > > > > > > > >> > requests.
> > > > > > > > > >> > > > Now
> > > > > > > > > >> > > > > > let's investigate the io thread that finally
> > > > processed
> > > > > > the
> > > > > > > > > >> > > > LeaderAndISR.
> > > > > > > > > >> > > > > >   For the 3000 produce requests, if we model
> the
> > > > time
> > > > > > when
> > > > > > > > > their
> > > > > > > > > >> > > > > remaining
> > > > > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999,
> and
> > > the
> > > > > > > > > LeaderAndISR
> > > > > > > > > >> > > > request
> > > > > > > > > >> > > > > is
> > > > > > > > > >> > > > > > processed at time t3000.
> > > > > > > > > >> > > > > >   Without this KIP, the 1st produce request
> > would
> > > > have
> > > > > > > > waited
> > > > > > > > > an
> > > > > > > > > >> > > extra
> > > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an
> > extra
> > > > > time
> > > > > > of
> > > > > > > > > >> t3000 -
> > > > > > > > > >> > > t1,
> > > > > > > > > >> > > > > etc.
> > > > > > > > > >> > > > > >   Roughly speaking, the latency difference is
> > > bigger
> > > > > for
> > > > > > > the
> > > > > > > > > >> > earlier
> > > > > > > > > >> > > > > > produce requests than for the later ones. For
> > the
> > > > same
> > > > > > > > reason,
> > > > > > > > > >> the
> > > > > > > > > >> > > more
> > > > > > > > > >> > > > > > ProduceRequests queued
> > > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit
> we
> > > get
> > > > > > > (capped
> > > > > > > > > by
> > > > > > > > > >> the
> > > > > > > > > >> > > > > > produce timeout).
> > > > > > > > > >> > > > > > 2. If the enqueued produce requests have
> acks=0
> > or
> > > > > > acks=1
> > > > > > > > > >> > > > > >   There will be no latency differences in this
> > > case,
> > > > > but
> > > > > > > > > >> > > > > >   2.1 without this KIP, the records of
> partition
> > > > > test-0
> > > > > > in
> > > > > > > > the
> > > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will
> > be
> > > > > > appended
> > > > > > > > to
> > > > > > > > > >> the
> > > > > > > > > >> > > local
> > > > > > > > > >> > > > > log,
> > > > > > > > > >> > > > > >         and eventually be truncated after
> > > processing
> > > > > the
> > > > > > > > > >> > > LeaderAndISR.
> > > > > > > > > >> > > > > > This is what's referred to as
> > > > > > > > > >> > > > > >         "some unofficial definition of data
> loss
> > > in
> > > > > > terms
> > > > > > > of
> > > > > > > > > >> > messages
> > > > > > > > > >> > > > > > beyond the high watermark".
> > > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the
> effect
> > > > since
> > > > > if
> > > > > > > the
> > > > > > > > > >> > > > LeaderAndISR
> > > > > > > > > >> > > > > > is immediately processed, the response to
> > > producers
> > > > > will
> > > > > > > > have
> > > > > > > > > >> > > > > >         the NotLeaderForPartition error,
> causing
> > > > > > producers
> > > > > > > > to
> > > > > > > > > >> retry
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > This explanation above is the benefit for
> > reducing
> > > > the
> > > > > > > > latency
> > > > > > > > > >> of a
> > > > > > > > > >> > > > > broker
> > > > > > > > > >> > > > > > becoming the follower,
> > > > > > > > > >> > > > > > closely related is reducing the latency of a
> > > broker
> > > > > > > becoming
> > > > > > > > > the
> > > > > > > > > >> > > > leader.
> > > > > > > > > >> > > > > > In this case, the benefit is even more
> obvious,
> > if
> > > > > other
> > > > > > > > > brokers
> > > > > > > > > >> > have
> > > > > > > > > >> > > > > > resigned leadership, and the
> > > > > > > > > >> > > > > > current broker should take leadership. Any
> delay
> > > in
> > > > > > > > processing
> > > > > > > > > >> the
> > > > > > > > > >> > > > > > LeaderAndISR will be perceived
> > > > > > > > > >> > > > > > by clients as unavailability. In extreme
> cases,
> > > this
> > > > > can
> > > > > > > > cause
> > > > > > > > > >> > failed
> > > > > > > > > >> > > > > > produce requests if the retries are
> > > > > > > > > >> > > > > > exhausted.
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > Another two types of controller requests are
> > > > > > > UpdateMetadata
> > > > > > > > > and
> > > > > > > > > >> > > > > > StopReplica, which I'll briefly discuss as
> > > follows:
> > > > > > > > > >> > > > > > For UpdateMetadata requests, delayed
> processing
> > > > means
> > > > > > > > clients
> > > > > > > > > >> > > receiving
> > > > > > > > > >> > > > > > stale metadata, e.g. with the wrong leadership
> > > info
> > > > > > > > > >> > > > > > for certain partitions, and the effect is more
> > > > retries
> > > > > > or
> > > > > > > > even
> > > > > > > > > >> > fatal
> > > > > > > > > >> > > > > > failure if the retries are exhausted.
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > For StopReplica requests, a long queuing time
> > may
> > > > > > degrade
> > > > > > > > the
> > > > > > > > > >> > > > performance
> > > > > > > > > >> > > > > > of topic deletion.
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > Regarding your last question of the delay for
> > > > > > > > > >> > DescribeLogDirsRequest,
> > > > > > > > > >> > > > you
> > > > > > > > > >> > > > > > are right
> > > > > > > > > >> > > > > > that this KIP cannot help with the latency in
> > > > getting
> > > > > > the
> > > > > > > > log
> > > > > > > > > >> dirs
> > > > > > > > > >> > > > info,
> > > > > > > > > >> > > > > > and it's only relevant
> > > > > > > > > >> > > > > > when controller requests are involved.
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > Regards,
> > > > > > > > > >> > > > > > Lucas
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > > > > > > lindong28@gmail.com
> > > > > > > > > >
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > >> Hey Jun,
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> Thanks much for the comments. It is good
> point.
> > > So
> > > > > the
> > > > > > > > > feature
> > > > > > > > > >> may
> > > > > > > > > >> > > be
> > > > > > > > > >> > > > > >> useful for JBOD use-case. I have one question
> > > > below.
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> Hey Lucas,
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> Do you think this feature is also useful for
> > > > non-JBOD
> > > > > > > setup
> > > > > > > > > or
> > > > > > > > > >> it
> > > > > > > > > >> > is
> > > > > > > > > >> > > > > only
> > > > > > > > > >> > > > > >> useful for the JBOD setup? It may be useful
> to
> > > > > > understand
> > > > > > > > > this.
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> When the broker is setup using JBOD, in order
> > to
> > > > move
> > > > > > > > leaders
> > > > > > > > > >> on
> > > > > > > > > >> > the
> > > > > > > > > >> > > > > >> failed
> > > > > > > > > >> > > > > >> disk to other disks, the system operator
> first
> > > > needs
> > > > > to
> > > > > > > get
> > > > > > > > > the
> > > > > > > > > >> > list
> > > > > > > > > >> > > > of
> > > > > > > > > >> > > > > >> partitions on the failed disk. This is
> > currently
> > > > > > achieved
> > > > > > > > > using
> > > > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > > > > > > >> DescribeLogDirsRequest
> > > > > > > > > >> > to
> > > > > > > > > >> > > > the
> > > > > > > > > >> > > > > >> broker. If we only prioritize the controller
> > > > > requests,
> > > > > > > then
> > > > > > > > > the
> > > > > > > > > >> > > > > >> DescribeLogDirsRequest
> > > > > > > > > >> > > > > >> may still take a long time to be processed by
> > the
> > > > > > broker.
> > > > > > > > So
> > > > > > > > > >> the
> > > > > > > > > >> > > > overall
> > > > > > > > > >> > > > > >> time to move leaders away from the failed
> disk
> > > may
> > > > > > still
> > > > > > > be
> > > > > > > > > >> long
> > > > > > > > > >> > > even
> > > > > > > > > >> > > > > with
> > > > > > > > > >> > > > > >> this KIP. What do you think?
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> Thanks,
> > > > > > > > > >> > > > > >> Dong
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > > > > > > >> lucasatucla@gmail.com
> > > > > > > > > >> > >
> > > > > > > > > >> > > > > wrote:
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > > > > > >> > > > > >> >
> > > > > > > > > >> > > > > >> > @Dong,
> > > > > > > > > >> > > > > >> > Since both of the two comments in your
> > previous
> > > > > email
> > > > > > > are
> > > > > > > > > >> about
> > > > > > > > > >> > > the
> > > > > > > > > >> > > > > >> > benefits of this KIP and whether it's
> useful,
> > > > > > > > > >> > > > > >> > in light of Jun's last comment, do you
> agree
> > > that
> > > > > > this
> > > > > > > > KIP
> > > > > > > > > >> can
> > > > > > > > > >> > be
> > > > > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > > > > > >> > > > > >> >
> > > > > > > > > >> > > > > >> > Regards,
> > > > > > > > > >> > > > > >> > Lucas
> > > > > > > > > >> > > > > >> >
> > > > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > > > > > > jun@confluent.io>
> > > > > > > > > >> > wrote:
> > > > > > > > > >> > > > > >> >
> > > > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > > > > > >> > > > > >> > >
> > > > > > > > > >> > > > > >> > > If all disks on a broker are slow, one
> > > probably
> > > > > > > should
> > > > > > > > > just
> > > > > > > > > >> > kill
> > > > > > > > > >> > > > the
> > > > > > > > > >> > > > > >> > > broker. In that case, this KIP may not
> > help.
> > > If
> > > > > > only
> > > > > > > > one
> > > > > > > > > of
> > > > > > > > > >> > the
> > > > > > > > > >> > > > > disks
> > > > > > > > > >> > > > > >> on
> > > > > > > > > >> > > > > >> > a
> > > > > > > > > >> > > > > >> > > broker is slow, one may want to fail that
> > > disk
> > > > > and
> > > > > > > move
> > > > > > > > > the
> > > > > > > > > >> > > > leaders
> > > > > > > > > >> > > > > on
> > > > > > > > > >> > > > > >> > that
> > > > > > > > > >> > > > > >> > > disk to other brokers. In that case,
> being
> > > able
> > > > > to
> > > > > > > > > process
> > > > > > > > > >> the
> > > > > > > > > >> > > > > >> > LeaderAndIsr
> > > > > > > > > >> > > > > >> > > requests faster will potentially help the
> > > > > producers
> > > > > > > > > recover
> > > > > > > > > >> > > > quicker.
> > > > > > > > > >> > > > > >> > >
> > > > > > > > > >> > > > > >> > > Thanks,
> > > > > > > > > >> > > > > >> > >
> > > > > > > > > >> > > > > >> > > Jun
> > > > > > > > > >> > > > > >> > >
> > > > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin
> <
> > > > > > > > > >> lindong28@gmail.com
> > > > > > > > > >> > >
> > > > > > > > > >> > > > > wrote:
> > > > > > > > > >> > > > > >> > >
> > > > > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up
> > > > questions
> > > > > > > below.
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest
> > covers
> > > 20
> > > > > > > > > partitions
> > > > > > > > > >> > that
> > > > > > > > > >> > > > are
> > > > > > > > > >> > > > > >> > > randomly
> > > > > > > > > >> > > > > >> > > > distributed across all partitions, then
> > > each
> > > > > > > > > >> ProduceRequest
> > > > > > > > > >> > > will
> > > > > > > > > >> > > > > >> likely
> > > > > > > > > >> > > > > >> > > > cover some partitions for which the
> > broker
> > > is
> > > > > > still
> > > > > > > > > >> leader
> > > > > > > > > >> > > after
> > > > > > > > > >> > > > > it
> > > > > > > > > >> > > > > >> > > quickly
> > > > > > > > > >> > > > > >> > > > processes the
> > > > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will
> > still
> > > > be
> > > > > > slow
> > > > > > > > in
> > > > > > > > > >> > > > processing
> > > > > > > > > >> > > > > >> these
> > > > > > > > > >> > > > > >> > > > ProduceRequests, and request latency will still be very high with this KIP.
> > > > > > > > > >> > > > > It
> > > > > > > > > >> > > > > >> > > seems
> > > > > > > > > >> > > > > >> > > > that most ProduceRequest will still
> > timeout
> > > > > after
> > > > > > > 30
> > > > > > > > > >> > seconds.
> > > > > > > > > >> > > Is
> > > > > > > > > >> > > > > >> this
> > > > > > > > > >> > > > > >> > > > understanding correct?
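Dong's scenario above can be checked with a quick back-of-envelope calculation. The numbers below (10,000 partitions, 9,000 moved away, 20 partitions per request) are illustrative assumptions drawn from this thread, not measurements:

```python
from math import comb

def p_request_hits_slow_partition(total=10000, moved=9000, per_request=20):
    """Probability that a ProduceRequest touching `per_request` distinct,
    uniformly random partitions includes at least one partition the slow
    broker still leads. Defaults are hypothetical numbers from this thread."""
    # P(none of the touched partitions is still on the slow broker)
    p_none = comb(moved, per_request) / comb(total, per_request)
    return 1 - p_none
```

With these numbers, roughly 88% of requests still touch a partition the slow broker leads, which is the crux of the question: a quickly processed LeaderAndIsrRequest does not by itself rescue most in-flight produces.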
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest
> will
> > > > still
> > > > > > > > timeout
> > > > > > > > > >> after
> > > > > > > > > >> > > 30
> > > > > > > > > >> > > > > >> > seconds,
> > > > > > > > > >> > > > > >> > > > then it is less clear how this KIP
> > reduces
> > > > > > average
> > > > > > > > > >> produce
> > > > > > > > > >> > > > > latency.
> > > > > > > > > >> > > > > >> Can
> > > > > > > > > >> > > > > >> > > you
> > > > > > > > > >> > > > > >> > > > clarify what metrics can be improved by
> > > this
> > > > > KIP?
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > > Not sure why system operator directly
> > cares about the
> > > > > > number
> > > > > > > of
> > > > > > > > > >> > > truncated
> > > > > > > > > >> > > > > >> > messages.
> > > > > > > > > >> > > > > >> > > > Do you mean this KIP can improve
> average
> > > > > > throughput
> > > > > > > > or
> > > > > > > > > >> > reduce
> > > > > > > > > >> > > > > >> message
> > > > > > > > > >> > > > > >> > > > duplication? It will be good to
> > understand
> > > > > this.
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > > Thanks,
> > > > > > > > > >> > > > > >> > > > Dong
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas
> Wang
> > <
> > > > > > > > > >> > > lucasatucla@gmail.com
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > >> > wrote:
> > > > > > > > > >> > > > > >> > > >
> > > > > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > Thanks for your valuable comments.
> > Please
> > > > see
> > > > > > my
> > > > > > > > > reply
> > > > > > > > > >> > > below.
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1
> > > partition.
> > > > > Now
> > > > > > > > let's
> > > > > > > > > >> > > consider
> > > > > > > > > >> > > > a
> > > > > > > > > >> > > > > >> more
> > > > > > > > > >> > > > > >> > > > common
> > > > > > > > > >> > > > > >> > > > > scenario
> > > > > > > > > >> > > > > >> > > > > where broker0 is the leader of many
> > > > > partitions.
> > > > > > > And
> > > > > > > > > >> let's
> > > > > > > > > >> > > say
> > > > > > > > > >> > > > > for
> > > > > > > > > >> > > > > >> > some
> > > > > > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > > > > > > > >> > > > > >> > > > > The number of leader partitions on
> > > broker0
> > > > is
> > > > > > so
> > > > > > > > > large,
> > > > > > > > > >> > say
> > > > > > > > > >> > > > 10K,
> > > > > > > > > >> > > > > >> that
> > > > > > > > > >> > > > > >> > > the
> > > > > > > > > >> > > > > >> > > > > cluster is skewed,
> > > > > > > > > >> > > > > >> > > > > and the operator would like to shift
> > the
> > > > > > > leadership
> > > > > > > > > >> for a
> > > > > > > > > >> > > lot
> > > > > > > > > >> > > > of
> > > > > > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > > > > > > >> > > > > >> > > > > either manually or through some
> service
> > > > like
> > > > > > > cruise
> > > > > > > > > >> > control.
> > > > > > > > > >> > > > > >> > > > > With this KIP, not only will the
> > > leadership
> > > > > > > > > transitions
> > > > > > > > > >> > > finish
> > > > > > > > > >> > > > > >> more
> > > > > > > > > >> > > > > >> > > > > quickly, helping the cluster itself become more
> > > > > > > > > >> > balanced,
> > > > > > > > > >> > > > > >> > > > > but all existing producers
> > corresponding
> > > to
> > > > > the
> > > > > > > 9K
> > > > > > > > > >> > > partitions
> > > > > > > > > >> > > > > will
> > > > > > > > > >> > > > > >> > get
> > > > > > > > > >> > > > > >> > > > the
> > > > > > > > > >> > > > > >> > > > > errors relatively quickly
> > > > > > > > > >> > > > > >> > > > > rather than relying on their timeout,
> > > > thanks
> > > > > to
> > > > > > > the
> > > > > > > > > >> > batched
> > > > > > > > > >> > > > > async
> > > > > > > > > >> > > > > >> ZK
> > > > > > > > > >> > > > > >> > > > > operations.
> > > > > > > > > >> > > > > >> > > > > To me it's a useful feature to have
> > > during
> > > > > such
> > > > > > > > > >> > troublesome
> > > > > > > > > >> > > > > times.
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc
> > have
> > > > > shown
> > > > > > > > that
> > > > > > > > > >> with
> > > > > > > > > >> > > this
> > > > > > > > > >> > > > > KIP
> > > > > > > > > >> > > > > >> > many
> > > > > > > > > >> > > > > >> > > > > producers
> > > > > > > > > >> > > > > >> > > > > receive an explicit error
> > > > > > NotLeaderForPartition,
> > > > > > > > > based
> > > > > > > > > >> on
> > > > > > > > > >> > > > which
> > > > > > > > > >> > > > > >> they
> > > > > > > > > >> > > > > >> > > > retry
> > > > > > > > > >> > > > > >> > > > > immediately.
> > > > > > > > > >> > > > > >> > > > > Therefore the latency (~14
> > seconds+quick
> > > > > retry)
> > > > > > > for
> > > > > > > > > >> their
> > > > > > > > > >> > > > single
> > > > > > > > > >> > > > > >> > > message
> > > > > > > > > >> > > > > >> > > > is
> > > > > > > > > >> > > > > >> > > > > much smaller
> > > > > > > > > >> > > > > >> > > > > compared with the case of timing out
> > > > without
> > > > > > the
> > > > > > > > KIP
> > > > > > > > > >> (30
> > > > > > > > > >> > > > seconds
> > > > > > > > > >> > > > > >> for
> > > > > > > > > >> > > > > >> > > > timing
> > > > > > > > > >> > > > > >> > > > > out + quick retry).
> > > > > > > > > >> > > > > >> > > > > One might argue that reducing the
> > timeout
> > > > > on
> > > > > > > the
> > > > > > > > > >> > producer
> > > > > > > > > >> > > > > side
> > > > > > > > > >> > > > > >> can
> > > > > > > > > >> > > > > >> > > > > achieve the same result,
> > > > > > > > > >> > > > > >> > > > > yet reducing the timeout has its own
> > > > > > > drawbacks[1].
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to show
> > the
> > > > > > number
> > > > > > > of
> > > > > > > > > >> > > truncated
> > > > > > > > > >> > > > > >> > messages
> > > > > > > > > >> > > > > >> > > on
> > > > > > > > > >> > > > > >> > > > > brokers,
> > > > > > > > > >> > > > > >> > > > > with the experiments done in the
> Google
> > > > Doc,
> > > > > it
> > > > > > > > > should
> > > > > > > > > >> be
> > > > > > > > > >> > > easy
> > > > > > > > > >> > > > > to
> > > > > > > > > >> > > > > >> see
> > > > > > > > > >> > > > > >> > > > that
> > > > > > > > > >> > > > > >> > > > > a lot fewer messages need
> > > > > > > > > >> > > > > >> > > > > to be truncated on broker0 since the
> > > > > up-to-date
> > > > > > > > > >> metadata
> > > > > > > > > >> > > > avoids
> > > > > > > > > >> > > > > >> > > appending
> > > > > > > > > >> > > > > >> > > > > of messages
> > > > > > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we
> > > talk
> > > > > to a
> > > > > > > > > system
> > > > > > > > > >> > > > operator
> > > > > > > > > >> > > > > >> and
> > > > > > > > > >> > > > > >> > ask
> > > > > > > > > >> > > > > >> > > > > whether
> > > > > > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet
> > > most
> > > > > > likely
> > > > > > > > the
> > > > > > > > > >> > answer
> > > > > > > > > >> > > > is
> > > > > > > > > >> > > > > >> yes.
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > 3. To answer your question, I think
> it
> > > > might
> > > > > be
> > > > > > > > > >> helpful to
> > > > > > > > > >> > > > > >> construct
> > > > > > > > > >> > > > > >> > > some
> > > > > > > > > >> > > > > >> > > > > formulas.
> > > > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm going
> > back
> > > to
> > > > > the
> > > > > > > > case
> > > > > > > > > >> where
> > > > > > > > > >> > > > there
> > > > > > > > > >> > > > > >> is
> > > > > > > > > >> > > > > >> > > only
> > > > > > > > > >> > > > > >> > > > > ONE partition involved.
> > > > > > > > > >> > > > > >> > > > > Following the experiments in the
> Google
> > > > Doc,
> > > > > > > let's
> > > > > > > > > say
> > > > > > > > > >> > > broker0
> > > > > > > > > >> > > > > >> > becomes
> > > > > > > > > >> > > > > >> > > > the
> > > > > > > > > >> > > > > >> > > > > follower at time t0,
> > > > > > > > > >> > > > > >> > > > > and after t0 there were still N
> produce
> > > > > > requests
> > > > > > > in
> > > > > > > > > its
> > > > > > > > > >> > > > request
> > > > > > > > > >> > > > > >> > queue.
> > > > > > > > > >> > > > > >> > > > > With the up-to-date metadata brought
> by
> > > > this
> > > > > > KIP,
> > > > > > > > > >> broker0
> > > > > > > > > >> > > can
> > > > > > > > > >> > > > > >> reply
> > > > > > > > > >> > > > > >> > > with
> > > > > > > > > >> > > > > >> > > > an
> > > > > > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > > > > > > > >> > > > > >> > > > > let's use M1 to denote the average
> > > > processing
> > > > > > > time
> > > > > > > > of
> > > > > > > > > >> > > replying
> > > > > > > > > >> > > > > >> with
> > > > > > > > > >> > > > > >> > > such
> > > > > > > > > >> > > > > >> > > > an
> > > > > > > > > >> > > > > >> > > > > error message.
> > > > > > > > > >> > > > > >> > > > > Without this KIP, the broker will
> need
> > to
> > > > > > append
> > > > > > > > > >> messages
> > > > > > > > > >> > to
> > > > > > > > > >> > > > > >> > segments,
> > > > > > > > > >> > > > > >> > > > > which may trigger a flush to disk,
> > > > > > > > > >> > > > > >> > > > > let's use M2 to denote the average
> > > > processing
> > > > > > > time
> > > > > > > > > for
> > > > > > > > > >> > such
> > > > > > > > > >> > > > > logic.
> > > > > > > > > >> > > > > >> > > > > Then the average extra latency
> incurred
> > > > > without
> > > > > > > > this
> > > > > > > > > >> KIP
> > > > > > > > > >> > is
> > > > > > > > > >> > > N
> > > > > > > > > >> > > > *
> > > > > > > > > >> > > > > >> (M2 -
> > > > > > > > > >> > > > > >> > > > M1) /
> > > > > > > > > >> > > > > >> > > > > 2.
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > In practice, M2 should always be
> larger
> > > > than
> > > > > > M1,
> > > > > > > > > which
> > > > > > > > > >> > means
> > > > > > > > > >> > > > as
> > > > > > > > > >> > > > > >> long
> > > > > > > > > >> > > > > >> > > as N
> > > > > > > > > >> > > > > >> > > > > is positive,
> > > > > > > > > >> > > > > >> > > > > we would see improvements on the
> > average
> > > > > > latency.
> > > > > > > > > >> > > > > >> > > > > There does not need to be significant
> > > > backlog
> > > > > > of
> > > > > > > > > >> requests
> > > > > > > > > >> > in
> > > > > > > > > >> > > > the
> > > > > > > > > >> > > > > >> > > request
> > > > > > > > > >> > > > > >> > > > > queue,
> > > > > > > > > >> > > > > >> > > > > or severe degradation of disk
> > performance
> > > > to
> > > > > > have
> > > > > > > > the
> > > > > > > > > >> > > > > improvement.
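The model above can be sketched directly; M1, M2, and N are the quantities defined in the thread, and any concrete values are illustrative:

```python
def avg_extra_latency(n_queued, m2, m1):
    """Average extra latency across the N ProduceRequests queued when broker0
    becomes a follower. With up-to-date metadata each request is answered in
    M1 (an immediate NotLeaderForPartition error); without it, each costs M2
    (append plus a possible flush). The i-th queued request waits behind i
    earlier requests, so its extra wait is i * (M2 - M1); averaging i over
    1..N gives roughly N / 2."""
    return n_queued * (m2 - m1) / 2.0
```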
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > Regards,
> > > > > > > > > >> > > > > >> > > > > Lucas
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > [1] For instance, reducing the
> timeout
> > on
> > > > the
> > > > > > > > > producer
> > > > > > > > > >> > side
> > > > > > > > > >> > > > can
> > > > > > > > > >> > > > > >> > trigger
> > > > > > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > > > > > > > >> > > > > >> > > > > when the corresponding leader broker
> is
> > > > > > > overloaded,
> > > > > > > > > >> > > > exacerbating
> > > > > > > > > >> > > > > >> the
> > > > > > > > > >> > > > > >> > > > > situation.
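On footnote [1]: the duplicate-request drawback is usually managed through producer configuration rather than by simply lowering the timeout. A hypothetical illustration with the standard Java-client property names (the values are assumptions, not recommendations from this thread):

```python
# Hypothetical illustration using standard Java-client property names;
# the values are examples only.
producer_props = {
    "request.timeout.ms": 30000,    # lowering this aggressively risks duplicate sends
    "delivery.timeout.ms": 120000,  # overall bound on retries (KIP-91)
    "enable.idempotence": True,     # lets the broker de-duplicate retried batches
}
```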
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong
> > Lin
> > > <
> > > > > > > > > >> > > lindong28@gmail.com
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > >> > wrote:
> > > > > > > > > >> > > > > >> > > > >
> > > > > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > Thanks much for the detailed
> > > > documentation
> > > > > of
> > > > > > > the
> > > > > > > > > >> > > > experiment.
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > Initially I also think having a
> > > separate
> > > > > > queue
> > > > > > > > for
> > > > > > > > > >> > > > controller
> > > > > > > > > >> > > > > >> > > requests
> > > > > > > > > >> > > > > >> > > > is
> > > > > > > > > >> > > > > >> > > > > > useful because, as you mentioned in
> > the
> > > > > > summary
> > > > > > > > > >> section
> > > > > > > > > >> > of
> > > > > > > > > >> > > > the
> > > > > > > > > >> > > > > >> > Google
> > > > > > > > > >> > > > > >> > > > > doc,
> > > > > > > > > >> > > > > >> > > > > > controller requests are generally
> > more
> > > > > > > important
> > > > > > > > > than
> > > > > > > > > >> > data
> > > > > > > > > >> > > > > >> requests
> > > > > > > > > >> > > > > >> > > and
> > > > > > > > > >> > > > > >> > > > > we
> > > > > > > > > >> > > > > >> > > > > > probably want controller requests
> to
> > be
> > > > > > > processed
> > > > > > > > > >> > sooner.
> > > > > > > > > >> > > > But
> > > > > > > > > >> > > > > >> then
> > > > > > > > > >> > > > > >> > > Eno
> > > > > > > > > >> > > > > >> > > > > has
> > > > > > > > > >> > > > > >> > > > > > two very good questions which I am
> > not
> > > > sure
> > > > > > the
> > > > > > > > > >> Google
> > > > > > > > > >> > doc
> > > > > > > > > >> > > > has
> > > > > > > > > >> > > > > >> > > answered
> > > > > > > > > >> > > > > >> > > > > > explicitly. Could you help with the
> > > > > following
> > > > > > > > > >> questions?
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > 1) It is not very clear what is the
> > > > actual
> > > > > > > > benefit
> > > > > > > > > of
> > > > > > > > > >> > > > KIP-291
> > > > > > > > > >> > > > > to
> > > > > > > > > >> > > > > >> > > users.
> > > > > > > > > >> > > > > >> > > > > The
> > > > > > > > > >> > > > > >> > > > > > experiment setup in the Google doc
> > > > > simulates
> > > > > > > the
> > > > > > > > > >> > scenario
> > > > > > > > > >> > > > that
> > > > > > > > > >> > > > > >> > broker
> > > > > > > > > >> > > > > >> > > > is
> > > > > > > > > >> > > > > >> > > > > > very slow handling ProduceRequest
> due
> > > to
> > > > > e.g.
> > > > > > > > slow
> > > > > > > > > >> disk.
> > > > > > > > > >> > > It
> > > > > > > > > >> > > > > >> > currently
> > > > > > > > > >> > > > > >> > > > > > assumes that there is only 1
> > partition.
> > > > But
> > > > > > in
> > > > > > > > the
> > > > > > > > > >> > common
> > > > > > > > > >> > > > > >> scenario,
> > > > > > > > > >> > > > > >> > > it
> > > > > > > > > >> > > > > >> > > > is
> > > > > > > > > >> > > > > >> > > > > > probably reasonable to assume that
> > > there
> > > > > are
> > > > > > > many
> > > > > > > > > >> other
> > > > > > > > > >> > > > > >> partitions
> > > > > > > > > >> > > > > >> > > that
> > > > > > > > > >> > > > > >> > > > > are
> > > > > > > > > >> > > > > >> > > > > > also actively produced to and
> > > > > ProduceRequest
> > > > > > to
> > > > > > > > > these
> > > > > > > > > >> > > > > partition
> > > > > > > > > >> > > > > >> > also
> > > > > > > > > >> > > > > >> > > > > takes
> > > > > > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So
> > even
> > > > if
> > > > > > > > broker0
> > > > > > > > > >> can
> > > > > > > > > >> > > > become
> > > > > > > > > >> > > > > >> > > follower
> > > > > > > > > >> > > > > >> > > > > for
> > > > > > > > > >> > > > > >> > > > > > the partition 0 soon, it probably
> > still
> > > > > needs
> > > > > > > to
> > > > > > > > > >> process
> > > > > > > > > >> > > the
> > > > > > > > > >> > > > > >> > > > > ProduceRequest
> > > > > > > > > >> > > > > >> > > > > > slowly in the queue because these
> > > > > > > > ProduceRequests
> > > > > > > > > >> > cover
> > > > > > > > > >> > > > > other
> > > > > > > > > >> > > > > >> > > > > partitions.
> > > > > > > > > >> > > > > >> > > > > > Thus most ProduceRequest will still
> > > > timeout
> > > > > > > after
> > > > > > > > > 30
> > > > > > > > > >> > > seconds
> > > > > > > > > >> > > > > and
> > > > > > > > > >> > > > > >> > most
> > > > > > > > > >> > > > > >> > > > > > clients will still likely timeout
> > after
> > > > 30
> > > > > > > > seconds.
> > > > > > > > > >> Then
> > > > > > > > > >> > > it
> > > > > > > > > >> > > > is
> > > > > > > > > >> > > > > >> not
> > > > > > > > > >> > > > > >> > > > > > obviously what is the benefit to
> > client
> > > > > since
> > > > > > > > > client
> > > > > > > > > >> > will
> > > > > > > > > >> > > > > >> timeout
> > > > > > > > > >> > > > > >> > > after
> > > > > > > > > >> > > > > >> > > > > 30
> > > > > > > > > >> > > > > >> > > > > > seconds before possibly
> re-connecting
> > > to
> > > > > > > broker1,
> > > > > > > > > >> with
> > > > > > > > > >> > or
> > > > > > > > > >> > > > > >> without
> > > > > > > > > >> > > > > >> > > > > KIP-291.
> > > > > > > > > >> > > > > >> > > > > > Did I miss something here?
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > 2) I guess Eno is asking for the
> > > > specific
> > > > > > > > > benefits
> > > > > > > > > >> of
> > > > > > > > > >> > > this
> > > > > > > > > >> > > > > >> KIP to
> > > > > > > > > >> > > > > >> > > > user
> > > > > > > > > >> > > > > >> > > > > or
> > > > > > > > > >> > > > > >> > > > > > system administrator, e.g. whether
> > this
> > > > KIP
> > > > > > > > > decreases
> > > > > > > > > >> > > > average
> > > > > > > > > >> > > > > >> > > latency,
> > > > > > > > > >> > > > > >> > > > > > 999th percentile latency, probably
> of
> > > > > > exception
> > > > > > > > > >> exposed
> > > > > > > > > >> > to
> > > > > > > > > >> > > > > >> client
> > > > > > > > > >> > > > > >> > > etc.
> > > > > > > > > >> > > > > >> > > > It
> > > > > > > > > >> > > > > >> > > > > > is probably useful to clarify this.
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
> > > > > experience
> > > > > > > > only
> > > > > > > > > >> when
> > > > > > > > > >> > > > there
> > > > > > > > > >> > > > > is
> > > > > > > > > >> > > > > >> > > issue
> > > > > > > > > >> > > > > >> > > > > with
> > > > > > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in
> > the
> > > > > > request
> > > > > > > > > queue
> > > > > > > > > >> > due
> > > > > > > > > >> > > to
> > > > > > > > > >> > > > > >> slow
> > > > > > > > > >> > > > > >> > > disk
> > > > > > > > > >> > > > > >> > > > as
> > > > > > > > > >> > > > > >> > > > > > described in the Google doc? Or is
> > this
> > > > KIP
> > > > > > > also
> > > > > > > > > >> useful
> > > > > > > > > >> > > when
> > > > > > > > > >> > > > > >> there
> > > > > > > > > >> > > > > >> > is
> > > > > > > > > >> > > > > >> > > > no
> > > > > > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It
> > might
> > > be
> > > > > > > helpful
> > > > > > > > > to
> > > > > > > > > >> > > clarify
> > > > > > > > > >> > > > > >> this
> > > > > > > > > >> > > > > >> > to
> > > > > > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > Thanks much,
> > > > > > > > > >> > > > > >> > > > > > Dong
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM,
> > Lucas
> > > > > Wang <
> > > > > > > > > >> > > > > >> lucasatucla@gmail.com
> > > > > > > > > >> > > > > >> > >
> > > > > > > > > >> > > > > >> > > > > wrote:
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > > Hi Eno,
> > > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting
> the
> > > > > > experiment
> > > > > > > > > >> results.
> > > > > > > > > >> > > > > >> > > > > > > Here is a link to the positive
> > impact
> > > > > > > achieved
> > > > > > > > by
> > > > > > > > > >> > > > > implementing
> > > > > > > > > >> > > > > >> > the
> > > > > > > > > >> > > > > >> > > > > > proposed
> > > > > > > > > >> > > > > >> > > > > > > change:
> > > > > > > > > >> > > > > >> > > > > > > https://docs.google.com/
> > document/d/
> > > > > > > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > > > > > > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=
> > sharing
> > > > > > > > > >> > > > > >> > > > > > > Please take a look when you have
> > time
> > > > and
> > > > > > let
> > > > > > > > me
> > > > > > > > > >> know
> > > > > > > > > >> > > your
> > > > > > > > > >> > > > > >> > > feedback.
> > > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > Regards,
> > > > > > > > > >> > > > > >> > > > > > > Lucas
> > > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM,
> > > > Harsha <
> > > > > > > > > >> > > kafka@harsha.io>
> > > > > > > > > >> > > > > >> wrote:
> > > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will
> > take a
> > > > > look
> > > > > > > > might
> > > > > > > > > >> suit
> > > > > > > > > >> > > our
> > > > > > > > > >> > > > > >> > > > requirements
> > > > > > > > > >> > > > > >> > > > > > > > better.
> > > > > > > > > >> > > > > >> > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > Thanks,
> > > > > > > > > >> > > > > >> > > > > > > > Harsha
> > > > > > > > > >> > > > > >> > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52
> > PM,
> > > > > Lucas
> > > > > > > > Wang <
> > > > > > > > > >> > > > > >> > > > lucasatucla@gmail.com
> > > > > > > > > >> > > > > >> > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > wrote:
> > > > > > > > > >> > > > > >> > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > If I understand correctly,
> the
> > > > > > > replication
> > > > > > > > > >> quota
> > > > > > > > > >> > > > > mechanism
> > > > > > > > > >> > > > > >> > > > proposed
> > > > > > > > > >> > > > > >> > > > > > in
> > > > > > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
> > > > > scenario.
> > > > > > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > Thanks,
> > > > > > > > > >> > > > > >> > > > > > > > > Lucas
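For anyone exploring the KIP-73 route Lucas mentions, the throttle is applied via dynamic configs along these lines (a sketch only; the exact flags and the ZooKeeper vs. bootstrap-server option vary by Kafka version, so verify against your release, and "my-topic" is a hypothetical topic name):

```shell
# Illustrative sketch only -- verify kafka-configs.sh flags for your version.
# Throttle replication traffic on broker 0 to ~10 MB/s (KIP-73):
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type brokers --entity-name 0 \
  --add-config 'leader.replication.throttled.rate=10485760,follower.replication.throttled.rate=10485760'

# Mark which replicas the throttle applies to (here: all replicas of a
# hypothetical topic named "my-topic"):
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config 'leader.replication.throttled.replicas=*,follower.replication.throttled.replicas=*'
```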
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28
> > AM,
> > > > > > Harsha <
> > > > > > > > > >> > > > > kafka@harsha.io
> > > > > > > > > >> > > > > >> >
> > > > > > > > > >> > > > > >> > > > wrote:
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > > > > > >> > > > > >> > > > > > > > > > One more question, any
> > thoughts
> > > > on
> > > > > > > making
> > > > > > > > > >> this
> > > > > > > > > >> > > > > >> configurable
> > > > > > > > > >> > > > > >> > > > > > > > > > and also allowing subset of
> > > data
> > > > > > > requests
> > > > > > > > > to
> > > > > > > > > >> be
> > > > > > > > > >> > > > > >> > prioritized.
> > > > > > > > > >> > > > > >> > > > For
> > > > > > > > > >> > > > > >> > > > > > > > example
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster
> > when
> > > we
> > > > > > take
> > > > > > > > out
> > > > > > > > > a
> > > > > > > > > >> > > broker
> > > > > > > > > >> > > > > and
> > > > > > > > > >> > > > > >> > bring
> > > > > > > > > >> > > > > >> > > > new
> > > > > > > > > >> > > > > >> > > > > > one
> > > > > > > > > >> > > > > >> > > > > > > > it
> > > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > > will try to become a follower and have a lot of
> > > > > > > > > >> > fetch
> > > > > > > > > >> > > > > >> requests
> > > > > > > > > >> > > > > >> > to
> > > > > > > > > >> > > > > >> > > > > other
> > > > > > > > > >> > > > > >> > > > > > > > > leaders
> > > > > > > > > >> > > > > >> > > > > > > > > > in clusters. This will
> > > negatively
> > > > > > > affect
> > > > > > > > > the
> > > > > > > > > >> > > > > >> > > application/client
> > > > > > > > > >> > > > > >> > > > > > > > > requests.
> > > > > > > > > >> > > > > >> > > > > > > > > > We are also exploring a
> > > similar
> > > > > > > > solution
> > > > > > > > > to
> > > > > > > > > >> > > > > >> de-prioritize
> > > > > > > > > >> > > > > >> > > if
> > > > > > > > > >> > > > > >> > > > a
> > > > > > > > > >> > > > > >> > > > > > new
> > > > > > > > > >> > > > > >> > > > > > > > > > replica comes in for fetch
> > > > > requests,
> > > > > > we
> > > > > > > > are
> > > > > > > > > >> ok
> > > > > > > > > >> > > with
> > > > > > > > > >> > > > > the
> > > > > > > > > >> > > > > >> > > replica
> > > > > > > > > >> > > > > >> > > > > to
> > > > > > > > > >> > > > > >> > > > > > be
> > > > > > > > > >> > > > > >> > > > > > > > > > taking time but the leaders
> > > > should
> > > > > > > > > prioritize
> > > > > > > > > >> > the
> > > > > > > > > >> > > > > client
> > > > > > > > > >> > > > > >> > > > > requests.
> > > > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > > > >> > > > > >> > > > > > > > > > Thanks,
> Harsha
>
> On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
>
> > Hi Eno,
> >
> > Sorry for the delayed response.
> > - I haven't implemented the feature yet, so no experimental results so
> > far. And I plan to test it out in the following days.
> >
> > - You are absolutely right that the priority queue does not completely
> > prevent data requests being processed ahead of controller requests.
> > That being said, I expect it to greatly mitigate the effect of stale
> > metadata.
> > In any case, I'll try it out and post the results when I have it.
> >
> > Regards,
> > Lucas
> >
> > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <eno.thereska@gmail.com>
> > wrote:
> >
> > > Hi Lucas,
> > >
> > > Sorry for the delay, just had a look at this. A couple of questions:
> > > - did you notice any positive change after implementing this KIP? I'm
> > > wondering if you have any experimental results that show the benefit
> > > of the two queues.
> > >
> > > - priority is usually not sufficient in addressing the problem the
> > > KIP identifies. Even with priority queues, you will sometimes
> > > (often?) have the case that data plane requests will be ahead of the
> > > control plane requests. This happens because the system might have
> > > already started processing the data plane requests before the control
> > > plane ones arrived. So it would be good to know what % of the problem
> > > this KIP addresses.
> > >
> > > Thanks
> > > Eno
> > >
> > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Change looks good.
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatucla@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > Thanks for the suggestion. I've updated the KIP. Please take
> > > > > another look.
> > > > >
> > > > > Lucas
> > > > >
> > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhihong@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Currently in KafkaConfig.scala :
> > > > > >
> > > > > > val QueuedMaxRequests = 500
> > > > > >
> > > > > > It would be good if you can include the default value for this

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Dong,

Sure. Regarding the 2nd case you mentioned
"- If the controller has not received response for R1 before it is
disconnected, it will re-send R1 followed by R2 after it is re-connected to
the broker."

with the max inflight requests set to 1, after the connection is
re-established, the controller won't send R2
before it gets a response for R1, right? Plus the controller is using
blocking calls for each request, i.e.
NetworkClientUtils.sendAndReceive, with infinite retries for each request
within the same instance of RequestSendThread.
So within the same instance of RequestSendThread, sending out multiple
different requests seems impossible.

However, based on the comments in the source code, it seems multiple
in-flight requests can happen if the broker loses its ZooKeeper session
and then reconnects: multiple generations of RequestSendThreads can
trigger multiple different requests.
In that case, we cannot prevent out-of-order processing even with the
queue, since those multiple requests are from different connections.
Broker generations can help in those cases, but I won't dive into that
discussion.
Is that right?

Lucas

On Wed, Jul 18, 2018 at 9:08 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Lucas,
>
> I think for now we can probably discuss based on the existing Kafka
> design, where the max inflight requests from the controller to a broker
> is hard coded to be 1. It looks like Becket has provided a good example
> in which requests from the same controller can be processed out of order.
>
> Thanks,
> Dong
>
> On Wed, Jul 18, 2018 at 8:35 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > @Becket and Dong,
> > I think currently the ordering guarantee is achieved because
> > the max inflight request from the controller to a broker is hard coded to
> > be 1.
> >
> > If we hypothetically say the max inflight requests is > 1, then I think
> > Dong is right to say that even the separate queue cannot guarantee
> > ordered processing.
> > For example, Req1 and Req2 are sent to a broker, and after a connection
> > reconnection, both requests are sent again, causing the broker to have
> > 4 requests in the following order:
> > Req2 > Req1 > Req2 > Req1.
> >
> > In summary, it seems using the deque should not cause problems with
> > out-of-order processing.
> > Is that right?
> >
> > Lucas
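The Req2 > Req1 > Req2 > Req1 scenario above can be reproduced with a tiny simulation (a toy model with illustrative names, not Kafka code): if the broker inserts every controller request at the head of a deque, a resend of both requests after a reconnect yields exactly that processing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy simulation of the reconnect scenario above: the broker inserts each
// arriving controller request at the head of its request deque, so after
// Req1 and Req2 are resent on a reconnect, Req2 ends up being processed
// before Req1 (twice).
public class HeadInsertReorder {
    public static List<String> brokerProcessingOrder(List<String> arrivals) {
        Deque<String> requestDeque = new ArrayDeque<>();
        for (String req : arrivals) {
            requestDeque.addFirst(req); // controller requests go to the head
        }
        // ArrayDeque iterates head-to-tail, i.e. in processing order.
        return List.copyOf(requestDeque);
    }

    public static void main(String[] args) {
        // First send of Req1 and Req2, then both resent after the reconnect.
        List<String> arrivals = List.of("Req1", "Req2", "Req1", "Req2");
        System.out.println(brokerProcessingOrder(arrivals));
        // -> [Req2, Req1, Req2, Req1]
    }
}
```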
> >
> > On Wed, Jul 18, 2018 at 6:24 PM, Dong Lin <li...@gmail.com> wrote:
> >
> > > Hey Becket,
> > >
> > > It seems that the requests from the old controller will be discarded
> > > due to the old controller epoch. It is not clear whether this is a
> > > problem.
> > >
> > > And if this out-of-order processing of controller requests is a
> > > problem, it seems like an existing problem which also applies to the
> > > multi-queue based design. So it is probably not a concern specific to
> > > the use of a deque. Does that sound reasonable?
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > > On Wed, 18 Jul 2018 at 6:17 PM Becket Qin <be...@gmail.com> wrote:
> > >
> > > > Hi Mayuresh/Joel,
> > > >
> > > > Using the request channel as a deque was brought up some time ago
> > > > when we were initially thinking of prioritizing the requests. The
> > > > concern was that the controller requests are supposed to be
> > > > processed in order. If we can ensure that there is at most one
> > > > controller request in the request channel, the order is not a
> > > > concern. But in cases where more than one controller request is
> > > > inserted into the queue, the controller request order may change and
> > > > cause problems. For example, think about the following sequence:
> > > > 1. The controller successfully sent a request R1 to a broker.
> > > > 2. The broker receives R1 and puts the request at the head of the
> > > > request queue.
> > > > 3. The controller-to-broker connection failed and the controller
> > > > reconnected to the broker.
> > > > 4. The controller sends a request R2 to the broker.
> > > > 5. The broker receives R2 and adds it to the head of the request
> > > > queue.
> > > > Now on the broker side, R2 will be processed before R1 is processed,
> > > > which may cause problems.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > >
> > > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com>
> > wrote:
> > > >
> > > > > @Mayuresh - I like your idea. It appears to be a simpler, less
> > > > > invasive alternative and it should work. Jun/Becket/others, do you
> > > > > see any pitfalls with this approach?
> > > > >
> > > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > @Mayuresh,
> > > > > > That's a very interesting idea that I hadn't thought of before.
> > > > > > It seems to solve our problem at hand pretty well, and also
> > > > > > avoids the need to have a new size metric and capacity config
> > > > > > for the controller request queue. In fact, if we were to adopt
> > > > > > this design, there is no public interface change, and we
> > > > > > probably don't need a KIP.
> > > > > > Also, implementation-wise, it seems the Java class
> > > > > > LinkedBlockingDeque can readily satisfy the requirement by
> > > > > > supporting a capacity bound and allowing insertion at both ends.
> > > > > > (LinkedBlockingQueue is FIFO-only and does not allow inserting
> > > > > > at the head.)
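As a side note, bounded head insertion is provided by `java.util.concurrent.LinkedBlockingDeque` (`LinkedBlockingQueue` itself is FIFO-only). A minimal sketch of the single-deque idea, with illustrative class and method names rather than Kafka's actual RequestChannel API:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Minimal sketch of the single-deque request channel: data-plane requests
// are appended at the tail, controller requests are inserted at the head,
// and handler threads always take from the head, so a controller request
// jumps ahead of any queued data requests. One capacity bound covers both.
public class RequestDeque {
    private final LinkedBlockingDeque<String> deque;

    public RequestDeque(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    // Data-plane request (produce, fetch, ...): tail insertion.
    // Returns false when the deque is full (putLast would block instead).
    public boolean enqueueDataRequest(String req) {
        return deque.offerLast(req);
    }

    // Controller request: head insertion, still bounded by the same capacity.
    public boolean enqueueControllerRequest(String req) {
        return deque.offerFirst(req);
    }

    // Handler threads poll from the head; returns null when empty.
    public String nextRequest() {
        return deque.pollFirst();
    }

    public static void main(String[] args) {
        RequestDeque channel = new RequestDeque(500); // queued.max.requests default
        channel.enqueueDataRequest("Produce-1");
        channel.enqueueDataRequest("Fetch-1");
        channel.enqueueControllerRequest("LeaderAndIsr");
        System.out.println(channel.nextRequest()); // -> LeaderAndIsr
        System.out.println(channel.nextRequest()); // -> Produce-1
    }
}
```

The blocking variants `putFirst`/`putLast`/`takeFirst` give the same semantics when back-pressure rather than a boolean result is wanted.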
> > > > > >
> > > > > > My only concern is that this design is tied to the coincidence
> > > > > > that we have two request priorities and there are two ends to a
> > > > > > deque. Hence by using the proposed design, it seems the network
> > > > > > layer is more tightly coupled with upper-layer logic, e.g. if we
> > > > > > were to add an extra priority level in the future for some
> > > > > > reason, we would probably need to go back to the design of
> > > > > > separate queues, one for each priority level.
> > > > > >
> > > > > > In summary, I'm ok with both designs and lean toward your
> > > > > > suggested approach. Let's hear what others think.
> > > > > >
> > > > > > @Becket,
> > > > > > In light of Mayuresh's suggested new design, I'm answering your
> > > > question
> > > > > > only in the context
> > > > > > of the current KIP design: I think your suggestion makes sense,
> and
> > > I'm
> > > > > ok
> > > > > > with removing the capacity config and
> > > > > > just relying on the default value of 20 being sufficient enough.
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > > gharatmayuresh15@gmail.com
> > > > > > > wrote:
> > > > > >
> > > > > > > Hi Lucas,
> > > > > > >
> > > > > > > Seems like the main intent here is to prioritize the controller
> > > > request
> > > > > > > over any other requests.
> > > > > > > In that case, we can change the request queue to a deque,
> > > > > > > where you always insert the normal requests (produce,
> > > > > > > consume, ..etc) at the tail of the deque, but if it's a
> > > > > > > controller request, you insert it at the head of the deque.
> > > > > > > This ensures that the controller request will be given higher
> > > > > > > priority over other requests.
> > > > > > >
> > > > > > > Also, since we only read one request from the socket and mute
> > > > > > > the channel, and only unmute it after handling the request,
> > > > > > > this would ensure that we don't handle controller requests out
> > > > > > > of order.
> > > > > > >
> > > > > > > With this approach we can avoid the second queue and the
> > additional
> > > > > > config
> > > > > > > for the size of the queue.
> > > > > > >
> > > > > > > What do you think ?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Mayuresh
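The muting behavior that the proposal above relies on can be modeled with a toy connection (illustrative names, not Kafka's actual SocketServer): once a request is read, the channel is muted and no further request from that connection is read until the handler unmutes it, so at most one request per connection is ever in flight.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Toy model of SocketServer channel muting: after one request is read from
// a connection, the channel is muted; the next request on the socket is not
// read until the handler unmutes the channel. This keeps at most one request
// per connection in the request queue at any time.
public class MutedChannel {
    private final Queue<String> pendingOnSocket = new ArrayDeque<>();
    private boolean muted = false;

    public MutedChannel(List<String> requestsOnSocket) {
        pendingOnSocket.addAll(requestsOnSocket);
    }

    // Returns the next request, or null if the channel is muted or idle.
    public String tryRead() {
        if (muted || pendingOnSocket.isEmpty()) return null;
        muted = true; // mute immediately after reading one request
        return pendingOnSocket.poll();
    }

    // Called by the handler once the previous request has been processed.
    public void unmute() {
        muted = false;
    }
}
```

With this model, reading R1 mutes the channel, so R2 cannot even enter the queue, let alone be handled, before R1's handling completes and unmutes the channel.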
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <
> becket.qin@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hey Joel,
> > > > > > > >
> > > > > > > > Thanks for the detailed explanation. I agree the current
> > > > > > > > design makes sense.
> > > > > > > > My confusion is about whether the new config for the
> controller
> > > > queue
> > > > > > > > capacity is necessary. I cannot think of a case in which
> users
> > > > would
> > > > > > > change
> > > > > > > > it.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > > becket.qin@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Lucas,
> > > > > > > > >
> > > > > > > > > I guess my question can be rephrased to "do we expect user
> to
> > > > ever
> > > > > > > change
> > > > > > > > > the controller request queue capacity"? If we agree that 20
> > is
> > > > > > already
> > > > > > > a
> > > > > > > > > very generous default number and we do not expect user to
> > > change
> > > > > it,
> > > > > > is
> > > > > > > > it
> > > > > > > > > still necessary to expose this as a config?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > > lucasatucla@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> @Becket
> > > > > > > > >> 1. Thanks for the comment. You are right that normally
> there
> > > > > should
> > > > > > be
> > > > > > > > >> just
> > > > > > > > >> one controller request because of muting,
> > > > > > > > >> and I had NOT intended to say there would be many enqueued
> > > > > > controller
> > > > > > > > >> requests.
> > > > > > > > >> I went through the KIP again, and I'm not sure which part
> > > > conveys
> > > > > > that
> > > > > > > > >> info.
> > > > > > > > >> I'd be happy to revise if you point it out the section.
> > > > > > > > >>
> > > > > > > > >> 2. Though it should not happen in normal conditions, the
> > > current
> > > > > > > design
> > > > > > > > >> does not preclude multiple controllers running
> > > > > > > > >> at the same time, hence if we don't have the controller
> > queue
> > > > > > capacity
> > > > > > > > >> config and simply make its capacity to be 1,
> > > > > > > > >> network threads handling requests from different
> controllers
> > > > will
> > > > > be
> > > > > > > > >> blocked during those troublesome times,
> > > > > > > > >> which is probably not what we want. On the other hand,
> > adding
> > > > the
> > > > > > > extra
> > > > > > > > >> config with a default value, say 20, guards us from issues
> > in
> > > > > those
> > > > > > > > >> troublesome times, and IMO there isn't much downside of
> > adding
> > > > the
> > > > > > > extra
> > > > > > > > >> config.
> > > > > > > > >>
> > > > > > > > >> @Mayuresh
> > > > > > > > >> Good catch, this sentence is an obsolete statement based
> on
> > a
> > > > > > previous
> > > > > > > > >> design. I've revised the wording in the KIP.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >> Lucas
> > > > > > > > >>
> > > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Lucas,
> > > > > > > > >> >
> > > > > > > > >> > Thanks for the KIP.
> > > > > > > > >> > I am trying to understand why you think "The memory
> > > > consumption
> > > > > > can
> > > > > > > > rise
> > > > > > > > >> > given the total number of queued requests can go up to
> 2x"
> > > in
> > > > > the
> > > > > > > > impact
> > > > > > > > >> > section. Normally the requests from controller to a
> Broker
> > > are
> > > > > not
> > > > > > > > high
> > > > > > > > >> > volume, right ?
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> >
> > > > > > > > >> > Mayuresh
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > > becket.qin@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control
> plane
> > > from
> > > > > the
> > > > > > > > data
> > > > > > > > >> > plane
> > > > > > > > >> > > makes a lot of sense.
> > > > > > > > >> > >
> > > > > > > > >> > > In the KIP you mentioned that the controller request
> > queue
> > > > may
> > > > > > > have
> > > > > > > > >> many
> > > > > > > > >> > > requests in it. Will this be a common case? The
> > controller
> > > > > > > requests
> > > > > > > > >> still
> > > > > > > > >> > > goes through the SocketServer. The SocketServer will
> > mute
> > > > the
> > > > > > > > channel
> > > > > > > > >> > once
> > > > > > > > >> > > a request is read and put into the request channel. So
> > > > > assuming
> > > > > > > > there
> > > > > > > > >> is
> > > > > > > > >> > > only one connection between controller and each
> broker,
> > on
> > > > the
> > > > > > > > broker
> > > > > > > > >> > side,
> > > > > > > > >> > > there should be only one controller request in the
> > > > controller
> > > > > > > > request
> > > > > > > > >> > queue
> > > > > > > > >> > > at any given time. If that is the case, do we need a
> > > > separate
> > > > > > > > >> controller
> > > > > > > > >> > > request queue capacity config? The default value 20
> > means
> > > > that
> > > > > > we
> > > > > > > > >> expect
> > > > > > > > >> > > there are 20 controller switches to happen in a short
> > > period
> > > > > of
> > > > > > > > time.
> > > > > > > > >> I
> > > > > > > > >> > am
> > > > > > > > >> > > not sure whether someone should increase the
> controller
> > > > > request
> > > > > > > > queue
> > > > > > > > >> > > capacity to handle such case, as it seems indicating
> > > > something
> > > > > > > very
> > > > > > > > >> wrong
> > > > > > > > >> > > has happened.
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks,
> > > > > > > > >> > >
> > > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > > lindong28@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Thanks for the update Lucas.
> > > > > > > > >> > > >
> > > > > > > > >> > > > I think the motivation section is intuitive. It will
> > be
> > > > good
> > > > > > to
> > > > > > > > >> learn
> > > > > > > > >> > > more
> > > > > > > > >> > > > about the comments from other reviewers.
> > > > > > > > >> > > >
> > > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > > > > lucasatucla@gmail.com>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > > Hi Dong,
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > I've updated the motivation section of the KIP by
> > > > > explaining
> > > > > > > the
> > > > > > > > >> > cases
> > > > > > > > >> > > > that
> > > > > > > > >> > > > > would have user impacts.
> > > > > > > > >> > > > > Please take a look at let me know your comments.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Thanks,
> > > > > > > > >> > > > > Lucas
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > > > > lucasatucla@gmail.com
> > > > > > > > >> >
> > > > > > > > >> > > > wrote:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > > Hi Dong,
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > The simulation of disk being slow is merely for
> me
> > > to
> > > > > > easily
> > > > > > > > >> > > construct
> > > > > > > > >> > > > a
> > > > > > > > >> > > > > > testing scenario
> > > > > > > > >> > > > > > with a backlog of produce requests. In
> production,
> > > > other
> > > > > > > than
> > > > > > > > >> the
> > > > > > > > >> > > disk
> > > > > > > > >> > > > > > being slow, a backlog of
> > > > > > > > >> > > > > > produce requests may also be caused by high
> > produce
> > > > QPS.
> > > > > > > > >> > > > > > In that case, we may not want to kill the broker
> > and
> > > > > > that's
> > > > > > > > when
> > > > > > > > >> > this
> > > > > > > > >> > > > KIP
> > > > > > > > >> > > > > > can be useful, both for JBOD
> > > > > > > > >> > > > > > and non-JBOD setup.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Going back to your previous question about each
> > > > > > > ProduceRequest
> > > > > > > > >> > > covering
> > > > > > > > >> > > > > 20
> > > > > > > > >> > > > > > partitions that are randomly
> > > > > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is
> > > > > enqueued
> > > > > > > that
> > > > > > > > >> > tries
> > > > > > > > >> > > to
> > > > > > > > >> > > > > > switch the current broker, say broker0, from
> > leader
> > > to
> > > > > > > > follower
> > > > > > > > >> > > > > > *for one of the partitions*, say *test-0*. For
> the
> > > > sake
> > > > > of
> > > > > > > > >> > argument,
> > > > > > > > >> > > > > > let's also assume the other brokers, say
> broker1,
> > > have
> > > > > > > > *stopped*
> > > > > > > > >> > > > fetching
> > > > > > > > >> > > > > > from
> > > > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > > > > >> > > > > > 1. If the enqueued produce requests have acks =
> > -1
> > > > > (ALL)
> > > > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests
> ahead
> > of
> > > > > > > > >> LeaderAndISR
> > > > > > > > >> > > will
> > > > > > > > >> > > > be
> > > > > > > > >> > > > > > put into the purgatory,
> > > > > > > > >> > > > > >         and since they'll never be replicated to
> > > other
> > > > > > > brokers
> > > > > > > > >> > > (because
> > > > > > > > >> > > > > of
> > > > > > > > >> > > > > > the assumption made above), they will
> > > > > > > > >> > > > > >         be completed either when the
> LeaderAndISR
> > > > > request
> > > > > > is
> > > > > > > > >> > > processed
> > > > > > > > >> > > > or
> > > > > > > > >> > > > > > when the timeout happens.
> > > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> > > > transition
> > > > > > the
> > > > > > > > >> > > partition
> > > > > > > > >> > > > > > test-0 to become a follower,
> > > > > > > > >> > > > > >         after the current broker sees the
> > > replication
> > > > of
> > > > > > the
> > > > > > > > >> > > remaining
> > > > > > > > >> > > > 19
> > > > > > > > >> > > > > > partitions, it can send a response indicating
> that
> > > > > > > > >> > > > > >         it's no longer the leader for the
> > "test-0".
> > > > > > > > >> > > > > >   To see the latency difference between 1.1 and
> > 1.2,
> > > > > let's
> > > > > > > say
> > > > > > > > >> > there
> > > > > > > > >> > > > are
> > > > > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR,
> > and
> > > > > there
> > > > > > > are
> > > > > > > > 8
> > > > > > > > >> io
> > > > > > > > >> > > > > threads,
> > > > > > > > >> > > > > >   so each io thread will process approximately
> > 3000
> > > > > > produce
> > > > > > > > >> > requests.
> > > > > > > > >> > > > Now
> > > > > > > > >> > > > > > let's investigate the io thread that finally
> > > processed
> > > > > the
> > > > > > > > >> > > > LeaderAndISR.
> > > > > > > > >> > > > > >   For the 3000 produce requests, if we model the
> > > time
> > > > > when
> > > > > > > > their
> > > > > > > > >> > > > > remaining
> > > > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and
> > the
> > > > > > > > LeaderAndISR
> > > > > > > > >> > > > request
> > > > > > > > >> > > > > is
> > > > > > > > >> > > > > > processed at time t3000.
> > > > > > > > >> > > > > >   Without this KIP, the 1st produce request
> would
> > > have
> > > > > > > waited
> > > > > > > > an
> > > > > > > > >> > > extra
> > > > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an
> extra
> > > > time
> > > > > of
> > > > > > > > >> t3000 -
> > > > > > > > >> > > t1,
> > > > > > > > >> > > > > etc.
> > > > > > > > >> > > > > >   Roughly speaking, the latency difference is
> > bigger
> > > > for
> > > > > > the
> > > > > > > > >> > earlier
> > > > > > > > >> > > > > > produce requests than for the later ones. For
> the
> > > same
> > > > > > > reason,
> > > > > > > > >> the
> > > > > > > > >> > > more
> > > > > > > > >> > > > > > ProduceRequests queued
> > > > > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we
> > get
> > > > > > (capped
> > > > > > > > by
> > > > > > > > >> the
> > > > > > > > >> > > > > > produce timeout).
> > > > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0
> or
> > > > > acks=1
> > > > > > > > >> > > > > >   There will be no latency differences in this
> > case,
> > > > but
> > > > > > > > >> > > > > >   2.1 without this KIP, the records of partition
> > > > test-0
> > > > > in
> > > > > > > the
> > > > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will
> be
> > > > > appended
> > > > > > > to
> > > > > > > > >> the
> > > > > > > > >> > > local
> > > > > > > > >> > > > > log,
> > > > > > > > >> > > > > >         and eventually be truncated after
> > processing
> > > > the
> > > > > > > > >> > > LeaderAndISR.
> > > > > > > > >> > > > > > This is what's referred to as
> > > > > > > > >> > > > > >         "some unofficial definition of data loss
> > in
> > > > > terms
> > > > > > of
> > > > > > > > >> > messages
> > > > > > > > >> > > > > > beyond the high watermark".
> > > > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect
> > > since
> > > > if
> > > > > > the
> > > > > > > > >> > > > LeaderAndISR
> > > > > > > > >> > > > > > is immediately processed, the response to
> > producers
> > > > will
> > > > > > > have
> > > > > > > > >> > > > > >         the NotLeaderForPartition error, causing
> > > > > producers
> > > > > > > to
> > > > > > > > >> retry
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > This explanation above is the benefit for
> reducing
> the latency of a broker becoming the follower, closely related is reducing
> the latency of a broker becoming the leader. In this case, the benefit is
> even more obvious, if other brokers have resigned leadership, and the current
> broker should take leadership. Any delay in processing the LeaderAndISR will
> be perceived by clients as unavailability. In extreme cases, this can cause
> failed produce requests if the retries are exhausted.
>
> Another two types of controller requests are UpdateMetadata and StopReplica,
> which I'll briefly discuss as follows:
> For UpdateMetadata requests, delayed processing means clients receiving stale
> metadata, e.g. with the wrong leadership info for certain partitions, and the
> effect is more retries or even fatal failure if the retries are exhausted.
>
> For StopReplica requests, a long queuing time may degrade the performance of
> topic deletion.
>
> Regarding your last question of the delay for DescribeLogDirsRequest, you are
> right that this KIP cannot help with the latency in getting the log dirs
> info, and it's only relevant when controller requests are involved.
>
> Regards,
> Lucas

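The separate-queue idea discussed above can be sketched as follows. This is only an illustrative Python model of the dispatch logic, not Kafka's actual request-handler code; the queue names and the `next_request` helper are invented for the example.

```python
import queue

# Illustrative model: controller requests (LeaderAndIsr, UpdateMetadata,
# StopReplica) get their own queue so they bypass any backlog of data
# requests. All names here are invented for the sketch.
controller_queue = queue.Queue()
data_queue = queue.Queue(maxsize=500)  # bounded, in the spirit of queued.max.requests

def next_request(timeout=0.01):
    """Drain controller requests first; fall back to data requests."""
    try:
        return controller_queue.get_nowait()
    except queue.Empty:
        try:
            return data_queue.get(timeout=timeout)
        except queue.Empty:
            return None

# A LeaderAndIsr request enqueued after three produce requests is still
# dequeued first:
for i in range(3):
    data_queue.put(("Produce", i))
controller_queue.put(("LeaderAndIsr", 0))
first = next_request()
print(first)  # ('LeaderAndIsr', 0)
```

With a single shared queue, the same LeaderAndIsr request would wait behind the three produce requests, which is exactly the delay the KIP tries to avoid.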
On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:

> Hey Jun,
>
> Thanks much for the comments. It is a good point. So the feature may be
> useful for the JBOD use-case. I have one question below.
>
> Hey Lucas,
>
> Do you think this feature is also useful for a non-JBOD setup, or is it only
> useful for the JBOD setup? It may be useful to understand this.
>
> When the broker is set up using JBOD, in order to move leaders on the failed
> disk to other disks, the system operator first needs to get the list of
> partitions on the failed disk. This is currently achieved using
> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the
> broker. If we only prioritize the controller requests, then the
> DescribeLogDirsRequest may still take a long time to be processed by the
> broker. So the overall time to move leaders away from the failed disk may
> still be long even with this KIP. What do you think?
>
> Thanks,
> Dong

On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

> Thanks for the insightful comment, Jun.
>
> @Dong,
> Since both of the two comments in your previous email are about the benefits
> of this KIP and whether it's useful, in light of Jun's last comment, do you
> agree that this KIP can be beneficial in the case mentioned by Jun?
> Please let me know, thanks!
>
> Regards,
> Lucas

On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io> wrote:

> Hi, Lucas, Dong,
>
> If all disks on a broker are slow, one probably should just kill the broker.
> In that case, this KIP may not help. If only one of the disks on a broker is
> slow, one may want to fail that disk and move the leaders on that disk to
> other brokers. In that case, being able to process the LeaderAndIsr requests
> faster will potentially help the producers recover quicker.
>
> Thanks,
>
> Jun

On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:

> Hey Lucas,
>
> Thanks for the reply. Some follow-up questions below.
>
> Regarding 1, if each ProduceRequest covers 20 partitions that are randomly
> distributed across all partitions, then each ProduceRequest will likely cover
> some partitions for which the broker is still leader after it quickly
> processes the LeaderAndIsrRequest. Then the broker will still be slow in
> processing these ProduceRequests and request latency will still be very high
> with this KIP. It seems that most ProduceRequests will still time out after
> 30 seconds. Is this understanding correct?
>
> Regarding 2, if most ProduceRequests will still time out after 30 seconds,
> then it is less clear how this KIP reduces average produce latency. Can you
> clarify what metrics can be improved by this KIP?
>
> Not sure why a system operator directly cares about the number of truncated
> messages. Do you mean this KIP can improve average throughput or reduce
> message duplication? It will be good to understand this.
>
> Thanks,
> Dong

On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:

> Hi Dong,
>
> Thanks for your valuable comments. Please see my reply below.
>
> 1. The Google doc showed only 1 partition. Now let's consider a more common
> scenario where broker0 is the leader of many partitions, and let's say for
> some reason its IO becomes slow. The number of leader partitions on broker0
> is so large, say 10K, that the cluster is skewed, and the operator would like
> to shift the leadership for a lot of partitions, say 9K, to other brokers,
> either manually or through some service like Cruise Control.
> With this KIP, not only will the leadership transitions finish more quickly,
> helping the cluster itself become more balanced, but all existing producers
> corresponding to the 9K partitions will get the errors relatively quickly,
> rather than relying on their timeout, thanks to the batched async ZK
> operations. To me it's a useful feature to have during such troublesome
> times.
>
> 2. The experiments in the Google Doc have shown that with this KIP many
> producers receive an explicit NotLeaderForPartition error, based on which
> they retry immediately. Therefore the latency (~14 seconds + quick retry) for
> their single message is much smaller compared with the case of timing out
> without the KIP (30 seconds for timing out + quick retry). One might argue
> that reducing the timeout on the producer side can achieve the same result,
> yet reducing the timeout has its own drawbacks [1].
>
> Also *IF* there were a metric to show the number of truncated messages on
> brokers, with the experiments done in the Google Doc, it should be easy to
> see that a lot fewer messages need to be truncated on broker0, since the
> up-to-date metadata avoids appending of messages in subsequent PRODUCE
> requests. If we talk to a system operator and ask whether they prefer fewer
> wasteful IOs, I bet most likely the answer is yes.
>
> 3. To answer your question, I think it might be helpful to construct some
> formulas. To simplify the modeling, I'm going back to the case where there is
> only ONE partition involved. Following the experiments in the Google Doc,
> let's say broker0 becomes the follower at time t0, and after t0 there were
> still N produce requests in its request queue.
> With the up-to-date metadata brought by this KIP, broker0 can reply with a
> NotLeaderForPartition exception; let's use M1 to denote the average
> processing time of replying with such an error message. Without this KIP, the
> broker will need to append messages to segments, which may trigger a flush to
> disk; let's use M2 to denote the average processing time for such logic. Then
> the average extra latency incurred without this KIP is N * (M2 - M1) / 2.
>
> In practice, M2 should always be larger than M1, which means as long as N is
> positive, we would see improvements on the average latency. There does not
> need to be a significant backlog of requests in the request queue, or severe
> degradation of disk performance, to have the improvement.
>
> Regards,
> Lucas
>
> [1] For instance, reducing the timeout on the producer side can trigger
> unnecessary duplicate requests when the corresponding leader broker is
> overloaded, exacerbating the situation.

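The N * (M2 - M1) / 2 estimate above can be checked with a quick numeric model. The sketch below computes the mean completion latency of N queued requests served one at a time; the values for N, M1 (an immediate NotLeaderForPartition reply), and M2 (a log append with possible flush) are made up purely for illustration.

```python
def mean_completion_latency(n, per_request_ms):
    # In a FIFO queue served by one handler thread, request i (0-based)
    # completes after (i + 1) * per_request_ms.
    return sum((i + 1) * per_request_ms for i in range(n)) / n

# Made-up illustrative numbers: N backlogged requests, M1 = cost of an
# immediate NotLeaderForPartition reply, M2 = cost of appending to the
# log with a possible flush to disk.
N, M1, M2 = 1000, 1.0, 20.0

extra = mean_completion_latency(N, M2) - mean_completion_latency(N, M1)
approx = N * (M2 - M1) / 2  # the formula from the discussion above

print(extra, approx)  # 9509.5 9500.0
```

The exact mean extra latency works out to (N + 1) * (M2 - M1) / 2, so the N / 2 form in the email is accurate to within half a request's cost; either way the saving grows linearly with the backlog N.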
On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:

> Hey Lucas,
>
> Thanks much for the detailed documentation of the experiment.
>
> Initially I also think having a separate queue for controller requests is
> useful because, as you mentioned in the summary section of the Google doc,
> controller requests are generally more important than data requests and we
> probably want controller requests to be processed sooner. But then Eno has
> two very good questions which I am not sure the Google doc has answered
> explicitly. Could you help with the following questions?
>
> 1) It is not very clear what is the actual benefit of KIP-291 to users. The
> experiment setup in the Google doc simulates the scenario that the broker is
> very slow handling ProduceRequests due to e.g. a slow disk. It currently
> assumes that there is only 1 partition. But in the common scenario, it is
> probably reasonable to assume that there are many other partitions that are
> also actively produced to, and ProduceRequests to these partitions also take
> e.g. 2 seconds to be processed. So even if broker0 can become follower for
> partition 0 soon, it probably still needs to process the ProduceRequests in
> the queue slowly because these ProduceRequests cover other partitions. Thus
> most ProduceRequests will still time out after 30 seconds and most clients
> will still likely time out after 30 seconds. Then it is not obvious what the
> benefit to the client is, since the client will time out after 30 seconds
> before possibly re-connecting to broker1, with or without KIP-291. Did I
> miss something here?
>
> 2) I guess Eno is asking for the specific benefits of this KIP to the user
> or system administrator, e.g. whether this KIP decreases average latency,
> 999th percentile latency, probability of exceptions exposed to the client,
> etc. It is probably useful to clarify this.
>
> 3) Does this KIP help improve user experience only when there is an issue
> with a broker, e.g. a significant backlog in the request queue due to a slow
> disk as described in the Google doc? Or is this KIP also useful when there
> is no ongoing issue in the cluster? It might be helpful to clarify this to
> understand the benefit of this KIP.
>
> Thanks much,
> Dong

On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

> Hi Eno,
>
> Sorry for the delay in getting the experiment results. Here is a link to the
> positive impact achieved by implementing the proposed change:
> https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> Please take a look when you have time and let me know your feedback.
>
> Regards,
> Lucas

On Tue, Jun 26, 2018 at 9:52 AM, Harsha <kafka@harsha.io> wrote:

> Thanks for the pointer. Will take a look; it might suit our requirements
> better.
>
> Thanks,
> Harsha

On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatucla@gmail.com> wrote:

> Hi Harsha,
>
> If I understand correctly, the replication quota mechanism proposed in
> KIP-73 can be helpful in that scenario.
> Have you tried it out?
>
> Thanks,
> Lucas

On Sun, Jun 24, 2018 at 8:28 AM, Harsha <kafka@harsha.io> wrote:

> Hi Lucas,
> One more question: any thoughts on making this configurable and also
> allowing a subset of data requests to be prioritized? For example, we notice
> in our cluster that when we take out a broker and bring in a new one, it
> will try to become a follower and send a lot of fetch requests to other
> leaders in the cluster. This will negatively affect the application/client
> requests. We are also exploring a similar solution to de-prioritize fetch
> requests when a new replica comes in; we are ok with the replica taking
> time, but the leaders should prioritize the client requests.
>
> Thanks,
> Harsha

> > > > > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at
> 11:35
> > > AM
> > > > > > Lucas
> > > > > > > > Wang
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed
> > response.
> > > > > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the
> > > > feature
> > > > > > > yet,
> > > > > > > > >> so no
> > > > > > > > >> > > > > >> > experimental
> > > > > > > > >> > > > > >> > > > > > results
> > > > > > > > >> > > > > >> > > > > > > > so
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > far.
> > > > > > > > >> > > > > >> > > > > > > > > > > And I plan to test in out
> in
> > > the
> > > > > > > > following
> > > > > > > > >> > days.
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right
> > that
> > > > the
> > > > > > > > >> priority
> > > > > > > > >> > > queue
> > > > > > > > >> > > > > >> does
> > > > > > > > >> > > > > >> > not
> > > > > > > > >> > > > > >> > > > > > > > completely
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > prevent
> > > > > > > > >> > > > > >> > > > > > > > > > > data requests being
> processed
> > > > ahead
> > > > > > of
> > > > > > > > >> > > controller
> > > > > > > > >> > > > > >> > requests.
> > > > > > > > >> > > > > >> > > > > > > > > > > That being said, I expect
> it
> > to
> > > > > > greatly
> > > > > > > > >> > mitigate
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > >> > effect
> > > > > > > > >> > > > > >> > > > of
> > > > > > > > >> > > > > >> > > > > > > stable
> > > > > > > > >> > > > > >> > > > > > > > > > > metadata.
> > > > > > > > >> > > > > >> > > > > > > > > > > In any case, I'll try it
> out
> > > and
> > > > > post
> > > > > > > the
> > > > > > > > >> > > results
> > > > > > > > >> > > > > >> when I
> > > > > > > > >> > > > > >> > > have
> > > > > > > > >> > > > > >> > > > > it.
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > Regards,
> > > > > > > > >> > > > > >> > > > > > > > > > > Lucas
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at
> 5:44
> > > AM,
> > > > > Eno
> > > > > > > > >> Thereska
> > > > > > > > >> > <
> > > > > > > > >> > > > > >> > > > > > > > eno.thereska@gmail.com
> > > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just
> > > had a
> > > > > > look
> > > > > > > at
> > > > > > > > >> > this.
> > > > > > > > >> > > A
> > > > > > > > >> > > > > >> couple
> > > > > > > > >> > > > > >> > of
> > > > > > > > >> > > > > >> > > > > > > > questions:
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > - did you notice any
> > positive
> > > > > > change
> > > > > > > > >> after
> > > > > > > > >> > > > > >> implementing
> > > > > > > > >> > > > > >> > > > this
> > > > > > > > >> > > > > >> > > > > > KIP?
> > > > > > > > >> > > > > >> > > > > > > > > I'm
> > > > > > > > >> > > > > >> > > > > > > > > > > > wondering if you have any
> > > > > > > experimental
> > > > > > > > >> > results
> > > > > > > > >> > > > > that
> > > > > > > > >> > > > > >> > show
> > > > > > > > >> > > > > >> > > > the
> > > > > > > > >> > > > > >> > > > > > > > benefit
> > > > > > > > >> > > > > >> > > > > > > > > of
> > > > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > > > >> > > > > >> > > > > > > > > > > > two queues.
> > > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not
> > > > > > sufficient
> > > > > > > in
> > > > > > > > >> > > > addressing
> > > > > > > > >> > > > > >> the
> > > > > > > > >> > > > > >> > > > > problem
> > > > > > > > >> > > > > >> > > > > > > the
> > > > > > > > >> > > > > >> > > > > > > > > KIP
> > > > > > > > >> > > > > >> > > > > > > > > > > > identifies. Even with
> > > priority
> > > > > > > queues,
> > > > > > > > >> you
> > > > > > > > >> > > will
> > > > > > > > >> > > > > >> > sometimes
> > > > > > > > >> > > > > >> > > > > > > (often?)
> > > > > > > > >> > > > > >> > > > > > > > > have
> > > > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > > > >> > > > > >> > > > > > > > > > > > case that data plane
> > requests
> > > > > will
> > > > > > be
> > > > > > > > >> ahead
> > > > > > > > >> > of
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > >> > > control
> > > > > > > > >> > > > > >> > > > > > plane
> > > > > > > > >> > > > > >> > > > > > > > > > > requests.
> > > > > > > > >> > > > > >> > > > > > > > > > > > This happens because the
> > > system
> > > > > > might
> > > > > > > > >> have
> > > > > > > > >> > > > already
> > > > > > > > >> > > > > >> > > started
> > > > > > > > >> > > > > >> > > > > > > > > processing
> > > > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > > > >> > > > > >> > > > > > > > > > > > data plane requests
> before
> > > the
> > > > > > > control
> > > > > > > > >> plane
> > > > > > > > >> > > > ones
> > > > > > > > >> > > > > >> > > arrived.
> > > > > > > > >> > > > > >> > > > So
> > > > > > > > >> > > > > >> > > > > > it
> > > > > > > > >> > > > > >> > > > > > > > > would
> > > > > > > > >> > > > > >> > > > > > > > > > > be
> > > > > > > > >> > > > > >> > > > > > > > > > > > good to know what % of
> the
> > > > > problem
> > > > > > > this
> > > > > > > > >> KIP
> > > > > > > > >> > > > > >> addresses.
> > > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > Thanks
> > > > > > > > >> > > > > >> > > > > > > > > > > > Eno
> > > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at
> > 4:44
> > > > PM,
> > > > > > Ted
> > > > > > > > Yu <
> > > > > > > > >> > > > > >> > > > > yuzhihong@gmail.com
> > > > > > > > >> > > > > >> > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > wrote:
> > > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > Change looks good.
> > > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > Thanks
> > > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at
> > > 8:42
> > > > > AM,
> > > > > > > > Lucas
> > > > > > > > >> > Wang
> > > > > > > > >> > > <
> > > > > > > > >> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > wrote:
> > > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > Thanks for the
> > > suggestion.
> > > > > I've
> > > > > > > > >> updated
> > > > > > > > >> > > the
> > > > > > > > >> > > > > KIP.
> > > > > > > > >> > > > > >> > > Please
> > > > > > > > >> > > > > >> > > > > > take
> > > > > > > > >> > > > > >> > > > > > > > > > another
> > > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > look.
> > > > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > Lucas
> > > > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018
> at
> > > > 6:34
> > > > > > PM,
> > > > > > > > Ted
> > > > > > > > >> Yu
> > > > > > > > >> > <
> > > > > > > > >> > > > > >> > > > > > > yuzhihong@gmail.com
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > > Currently in
> > > > > > KafkaConfig.scala
> > > > > > > :
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > > val
> > QueuedMaxRequests =
> > > > 500
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > > > >> > > > > >> > > > > > > > > > > > > > > It would be good if
> > you
> > > > can
> > > > > > > > include
> > > > > > > > >> > the
> > > > > > > > >> > > > > >> default
> > > > > > > > >> > > > > >> > > value
> > > > > > > > >> > > > > >> > > > > for
> > > > > > > > >> > > > > >> > > > > > > > this
> > > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > > >>
> > >
> >
>
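The two-queue design debated in the message above can be sketched as follows. This is a hypothetical illustration, not Kafka's actual code: request handler threads always drain the controller queue before touching the data queue, which is why queued data requests no longer delay a `LeaderAndIsr` request, while data requests already being handled still can.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the two-queue request channel discussed in the
// thread. Class and method names are illustrative only.
public class TwoQueueRequestChannel {
    private final BlockingQueue<String> controllerQueue;
    private final BlockingQueue<String> dataQueue;

    public TwoQueueRequestChannel(int controllerCapacity, int dataCapacity) {
        this.controllerQueue = new ArrayBlockingQueue<>(controllerCapacity);
        this.dataQueue = new ArrayBlockingQueue<>(dataCapacity);
    }

    public void sendControllerRequest(String r) throws InterruptedException {
        controllerQueue.put(r);
    }

    public void sendDataRequest(String r) throws InterruptedException {
        dataQueue.put(r);
    }

    // Controller requests win whenever one is queued; otherwise poll the
    // data queue briefly so the handler thread is not starved.
    public String receiveRequest() throws InterruptedException {
        String r = controllerQueue.poll();
        if (r != null) return r;
        return dataQueue.poll(300, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        TwoQueueRequestChannel channel = new TwoQueueRequestChannel(20, 500);
        channel.sendDataRequest("ProduceRequest");
        channel.sendControllerRequest("LeaderAndIsrRequest");
        // The controller request is handled first even though it arrived later.
        System.out.println(channel.receiveRequest()); // LeaderAndIsrRequest
        System.out.println(channel.receiveRequest()); // ProduceRequest
    }
}
```

As Eno points out above, this only reorders requests that are still queued; a data request already picked up by an io thread runs to completion first.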

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Lucas,

I think for now we can probably discuss based on the existing Kafka's
design where controller to a broker is hard coded to be 1. It looks like
Becket has provided a good example in which requests from the same
controller can be processed out of order.

Thanks,
Dong

On Wed, Jul 18, 2018 at 8:35 PM, Lucas Wang <lu...@gmail.com> wrote:

> @Becket and Dong,
> I think currently the ordering guarantee is achieved because
> the max inflight request from the controller to a broker is hard coded to
> be 1.
>
> If, hypothetically, the max inflight requests is > 1, then I think
> Dong
> is right to say that even the separate queue cannot guarantee ordered
> processing,
> For example, Req1 and Req2 are sent to a broker, and after a connection
> reconnection,
> both requests are sent again, causing the broker to have 4 requests in the
> following order
> Req2 > Req1 > Req2 > Req1.
>
> In summary, it seems using the deque should not cause problems with
> out-of-order processing.
> Is that right?
>
> Lucas
>
> On Wed, Jul 18, 2018 at 6:24 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Hey Becket,
> >
> > It seems that the requests from the old controller will be discarded due
> > to the old controller epoch. It is not clear whether this is a problem.
> >
> > And if this out-of-order processing of controller requests is a problem,
> > it seems like an existing problem which also applies to the multi-queue
> > based design. So it is probably not a concern specific to the use of the
> > deque. Does that sound reasonable?
> >
> > Thanks,
> > Dong
> >
> >
> > On Wed, 18 Jul 2018 at 6:17 PM Becket Qin <be...@gmail.com> wrote:
> >
> > > Hi Mayuresh/Joel,
> > >
> > > Using the request channel as a deque was brought up some time ago when
> > > we were initially thinking of prioritizing the requests. The concern
> > > was that the controller requests are supposed to be processed in
> > > order. If we can ensure that there is only one controller request in
> > > the request channel, the order is not a concern. But in cases where
> > > more than one controller request is inserted into the queue, the
> > > controller request order may change and cause problems. For example,
> > > think about the following sequence:
> > > 1. Controller successfully sent a request R1 to broker
> > > 2. Broker receives R1 and puts the request at the head of the request
> > > queue.
> > > 3. Controller to broker connection failed and the controller
> > > reconnected to the broker.
> > > 4. Controller sends a request R2 to the broker
> > > 5. Broker receives R2 and adds it to the head of the request queue.
> > > Now on the broker side, R2 will be processed before R1, which may
> > > cause problems.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > >
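The reordering Becket describes can be reproduced with a small sketch, assuming (hypothetically) that the broker uses a `java.util.concurrent.LinkedBlockingDeque` and inserts every controller request at the head:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Illustration of the reordering hazard: if two controller requests are
// both inserted at the head of a deque, the later one is dequeued first.
public class DequeReordering {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>(500);

        requestQueue.putFirst("R1"); // step 2: R1 put at the head
        // steps 3-4: the connection drops, the controller reconnects
        // and sends R2
        requestQueue.putFirst("R2"); // step 5: R2 also put at the head

        // The broker now processes R2 before R1, violating the intended order.
        System.out.println(requestQueue.takeFirst()); // R2
        System.out.println(requestQueue.takeFirst()); // R1
    }
}
```

As the surrounding messages note, channel muting normally keeps at most one in-flight controller request per connection, so this only bites across reconnections.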
> > > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com>
> wrote:
> > >
> > > > @Mayuresh - I like your idea. It appears to be a simpler, less
> > > > invasive alternative and it should work. Jun/Becket/others, do you
> > > > see any pitfalls with this approach?
> > > >
> > > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > >
> > > > > @Mayuresh,
> > > > > That's a very interesting idea that I haven't thought before.
> > > > > It seems to solve our problem at hand pretty well, and also
> > > > > avoids the need to have a new size metric and capacity config
> > > > > for the controller request queue. In fact, if we were to adopt
> > > > > this design, there is no public interface change, and we
> > > > > probably don't need a KIP.
> > > > > Also, implementation-wise, it seems the java class
> > > > > LinkedBlockingDeque can readily satisfy the requirement by
> > > > > supporting a capacity, and also allowing inserting at both ends.
> > > > >
> > > > > My only concern is that this design is tied to the coincidence
> > > > > that we have two request priorities and there are two ends to a
> > > > > deque. Hence by using the proposed design, it seems the network
> > > > > layer is more tightly coupled with the upper-layer logic, e.g. if
> > > > > we were to add an extra priority level in the future for some
> > > > > reason, we would probably need to go back to the design of
> > > > > separate queues, one for each priority level.
> > > > >
> > > > > In summary, I'm ok with both designs and lean toward your suggested
> > > > > approach.
> > > > > Let's hear what others think.
> > > > >
> > > > > @Becket,
> > > > > In light of Mayuresh's suggested new design, I'm answering your
> > > > > question only in the context of the current KIP design: I think
> > > > > your suggestion makes sense, and I'm ok with removing the capacity
> > > > > config and just relying on the default value of 20 being
> > > > > sufficient.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
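The deque-based channel Mayuresh proposes and Lucas evaluates above can be sketched with `LinkedBlockingDeque`; the class and method names below are illustrative, not Kafka's actual implementation:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Minimal sketch of the single-deque request channel: data requests wait
// at the tail, controller requests jump to the head. The deque is bounded,
// preserving the existing queued.max.requests-style capacity semantics.
public class DequeRequestChannel {
    private final LinkedBlockingDeque<String> deque;

    public DequeRequestChannel(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    public void sendDataRequest(String r) throws InterruptedException {
        deque.putLast(r); // normal produce/fetch requests go to the tail
    }

    public void sendControllerRequest(String r) throws InterruptedException {
        deque.putFirst(r); // controller requests go to the head
    }

    public String nextRequest() throws InterruptedException {
        return deque.takeFirst(); // handler threads always take from the head
    }
}
```

With channel muting ensuring at most one in-flight controller request per connection, at most one controller request normally sits at the head, so ordering is preserved in the common case.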
> > > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > > gharatmayuresh15@gmail.com
> > > > > > wrote:
> > > > >
> > > > > > Hi Lucas,
> > > > > >
> > > > > > Seems like the main intent here is to prioritize the controller
> > > > > > requests over any other requests.
> > > > > > In that case, we can change the request queue to a deque, where
> > > > > > you always insert the normal requests (produce, consume, etc.)
> > > > > > at the end of the deque, but if it's a controller request, you
> > > > > > insert it at the head of the queue. This ensures that the
> > > > > > controller request will be given higher priority over other
> > > > > > requests.
> > > > > >
> > > > > > Also, since we only read one request from the socket and mute it
> > > > > > and only unmute it after handling the request, this would ensure
> > > > > > that we don't handle controller requests out of order.
> > > > > >
> > > > > > With this approach we can avoid the second queue and the
> additional
> > > > > config
> > > > > > for the size of the queue.
> > > > > >
> > > > > > What do you think ?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <becket.qin@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Hey Joel,
> > > > > > >
> > > > > > > Thanks for the detailed explanation. I agree the current
> > > > > > > design makes sense. My confusion is about whether the new
> > > > > > > config for the controller queue capacity is necessary. I
> > > > > > > cannot think of a case in which users would change it.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> > becket.qin@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Lucas,
> > > > > > > >
> > > > > > > > I guess my question can be rephrased to "do we expect users
> > > > > > > > to ever change the controller request queue capacity"? If we
> > > > > > > > agree that 20 is already a very generous default number and
> > > > > > > > we do not expect users to change it, is it still necessary
> > > > > > > > to expose this as a config?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > > lucasatucla@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> @Becket
> > > > > > > >> 1. Thanks for the comment. You are right that normally
> > > > > > > >> there should be just one controller request because of
> > > > > > > >> muting, and I had NOT intended to say there would be many
> > > > > > > >> enqueued controller requests. I went through the KIP again,
> > > > > > > >> and I'm not sure which part conveys that info. I'd be happy
> > > > > > > >> to revise if you point out the section.
> > > > > > > >>
> > > > > > > >> 2. Though it should not happen in normal conditions, the
> > > > > > > >> current design does not preclude multiple controllers
> > > > > > > >> running at the same time, hence if we don't have the
> > > > > > > >> controller queue capacity config and simply make its
> > > > > > > >> capacity 1, network threads handling requests from
> > > > > > > >> different controllers will be blocked during those
> > > > > > > >> troublesome times, which is probably not what we want. On
> > > > > > > >> the other hand, adding the extra config with a default
> > > > > > > >> value, say 20, guards us from issues in those troublesome
> > > > > > > >> times, and IMO there isn't much downside to adding the
> > > > > > > >> extra config.
> > > > > > > >>
> > > > > > > >> @Mayuresh
> > > > > > > >> Good catch, this sentence is an obsolete statement based
> > > > > > > >> on a previous design. I've revised the wording in the KIP.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Lucas
> > > > > > > >>
> > > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Lucas,
> > > > > > > >> >
> > > > > > > >> > Thanks for the KIP.
> > > > > > > >> > I am trying to understand why you think "The memory
> > > > > > > >> > consumption can rise given the total number of queued
> > > > > > > >> > requests can go up to 2x" in the impact section. Normally
> > > > > > > >> > the requests from the controller to a broker are not high
> > > > > > > >> > volume, right?
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> >
> > > > > > > >> > Mayuresh
> > > > > > > >> >
> > > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > > becket.qin@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >> >
> > > > > > > >> > > Thanks for the KIP, Lucas. Separating the control
> > > > > > > >> > > plane from the data plane makes a lot of sense.
> > > > > > > >> > >
> > > > > > > >> > > In the KIP you mentioned that the controller request
> > > > > > > >> > > queue may have many requests in it. Will this be a
> > > > > > > >> > > common case? The controller requests still go through
> > > > > > > >> > > the SocketServer. The SocketServer will mute the
> > > > > > > >> > > channel once a request is read and put into the request
> > > > > > > >> > > channel. So assuming there is only one connection
> > > > > > > >> > > between the controller and each broker, on the broker
> > > > > > > >> > > side there should be only one controller request in the
> > > > > > > >> > > controller request queue at any given time. If that is
> > > > > > > >> > > the case, do we need a separate controller request
> > > > > > > >> > > queue capacity config? The default value of 20 means
> > > > > > > >> > > that we expect 20 controller switches to happen in a
> > > > > > > >> > > short period of time. I am not sure whether someone
> > > > > > > >> > > should increase the controller request queue capacity
> > > > > > > >> > > to handle such a case, as it seems to indicate that
> > > > > > > >> > > something very wrong has happened.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > >
> > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > > >> > >
> > > > > > > >> > >
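The channel-muting behavior Becket relies on above can be sketched as follows. This is a simplified, hypothetical model, not the SocketServer's actual code: after one request is read from a connection, the connection is muted until that request has been handled, which is what bounds the broker to one in-flight controller request per connection.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of per-connection muting: readRequest() yields at most
// one outstanding request until completeRequest() unmutes the connection.
public class MutingConnection {
    private final Queue<String> pendingOnSocket = new ArrayDeque<>();
    private boolean muted = false;

    public void arrive(String request) {
        pendingOnSocket.add(request); // bytes waiting on the socket
    }

    // Returns the next request only if the connection is not muted,
    // and mutes the connection once a request is read.
    public String readRequest() {
        if (muted || pendingOnSocket.isEmpty()) return null;
        muted = true;
        return pendingOnSocket.poll();
    }

    public void completeRequest() {
        muted = false; // unmute after the handler finishes
    }
}
```

So even if the controller sends several requests back-to-back on one connection, the broker reads, and therefore queues, only one of them at a time.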
> > > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > > lindong28@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Thanks for the update Lucas.
> > > > > > > >> > > >
> > > > > > > >> > > > I think the motivation section is intuitive. It will
> > > > > > > >> > > > be good to learn more about the comments from other
> > > > > > > >> > > > reviewers.
> > > > > > > >> > > >
> > > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Hi Dong,
> > > > > > > >> > > > >
> > > > > > > >> > > > > I've updated the motivation section of the KIP by
> > > > > > > >> > > > > explaining the cases that would have user impacts.
> > > > > > > >> > > > > Please take a look and let me know your comments.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Thanks,
> > > > > > > >> > > > > Lucas
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com
> > > > > > > >> >
> > > > > > > >> > > > wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Hi Dong,
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > The simulation of the disk being slow is merely
> > > > > > > >> > > > > > for me to easily construct a testing scenario
> > > > > > > >> > > > > > with a backlog of produce requests. In
> > > > > > > >> > > > > > production, other than the disk being slow, a
> > > > > > > >> > > > > > backlog of produce requests may also be caused by
> > > > > > > >> > > > > > high produce QPS. In that case, we may not want
> > > > > > > >> > > > > > to kill the broker, and that's when this KIP can
> > > > > > > >> > > > > > be useful, both for JBOD and non-JBOD setups.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Going back to your previous question about each
> > > > > > > >> > > > > > ProduceRequest covering 20 partitions that are
> > > > > > > >> > > > > > randomly distributed, let's say a LeaderAndIsr
> > > > > > > >> > > > > > request is enqueued that tries to switch the
> > > > > > > >> > > > > > current broker, say broker0, from leader to
> > > > > > > >> > > > > > follower *for one of the partitions*, say
> > > > > > > >> > > > > > *test-0*. For the sake of argument, let's also
> > > > > > > >> > > > > > assume the other brokers, say broker1, have
> > > > > > > >> > > > > > *stopped* fetching from the current broker, i.e.
> > > > > > > >> > > > > > broker0.
> > > > > > > >> > > > > > 1. If the enqueued produce requests have
> > > > > > > >> > > > > > acks = -1 (ALL)
> > > > > > > >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead
> > > > > > > >> > > > > > of the LeaderAndISR will be put into the
> > > > > > > >> > > > > > purgatory, and since they'll never be replicated
> > > > > > > >> > > > > > to other brokers (because of the assumption made
> > > > > > > >> > > > > > above), they will be completed either when the
> > > > > > > >> > > > > > LeaderAndISR request is processed or when the
> > > > > > > >> > > > > > timeout happens.
> > > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> > > > > > > >> > > > > > transition the partition test-0 to become a
> > > > > > > >> > > > > > follower; after the current broker sees the
> > > > > > > >> > > > > > replication of the remaining 19 partitions, it
> > > > > > > >> > > > > > can send a response indicating that it's no
> > > > > > > >> > > > > > longer the leader for "test-0".
> > > > > > > >> > > > > >   To see the latency difference between 1.1 and
> > > > > > > >> > > > > > 1.2, let's say there are 24K produce requests
> > > > > > > >> > > > > > ahead of the LeaderAndISR, and there are 8 io
> > > > > > > >> > > > > > threads, so each io thread will process
> > > > > > > >> > > > > > approximately 3000 produce requests. Now let's
> > > > > > > >> > > > > > investigate the io thread that finally processed
> > > > > > > >> > > > > > the LeaderAndISR.
> > > > > > > >> > > > > >   For the 3000 produce requests, if we model the
> > > > > > > >> > > > > > times when their remaining 19 partitions catch up
> > > > > > > >> > > > > > as t0, t1, ... t2999, then the LeaderAndISR
> > > > > > > >> > > > > > request is processed at time t3000.
> > > > > > > >> > > > > >   Without this KIP, the 1st produce request would
> > > > > > > >> > > > > > have waited an extra t3000 - t0 time in the
> > > > > > > >> > > > > > purgatory, the 2nd an extra time of t3000 - t1,
> > > > > > > >> > > > > > etc.
> > > > > > > >> > > > > >   Roughly speaking, the latency difference is
> > > > > > > >> > > > > > bigger for the earlier produce requests than for
> > > > > > > >> > > > > > the later ones. For the same reason, the more
> > > > > > > >> > > > > > ProduceRequests queued before the LeaderAndISR,
> > > > > > > >> > > > > > the bigger the benefit we get (capped by the
> > > > > > > >> > > > > > produce timeout).
> > > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0
> > > > > > > >> > > > > > or acks=1
> > > > > > > >> > > > > >   There will be no latency differences in this
> > > > > > > >> > > > > > case, but
> > > > > > > >> > > > > >   2.1 Without this KIP, the records of partition
> > > > > > > >> > > > > > test-0 in the ProduceRequests ahead of the
> > > > > > > >> > > > > > LeaderAndISR will be appended to the local log,
> > > > > > > >> > > > > > and eventually be truncated after processing the
> > > > > > > >> > > > > > LeaderAndISR. This is what's referred to as "some
> > > > > > > >> > > > > > unofficial definition of data loss in terms of
> > > > > > > >> > > > > > messages beyond the high watermark".
> > > > > > > >> > > > > >   2.2 With this KIP, we can mitigate the effect,
> > > > > > > >> > > > > > since if the LeaderAndISR is immediately
> > > > > > > >> > > > > > processed, the response to producers will have
> > > > > > > >> > > > > > the NotLeaderForPartition error, causing
> > > > > > > >> > > > > > producers to retry.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > The explanation above is the benefit of reducing
> > > > > > > >> > > > > > the latency of a broker becoming the follower;
> > > > > > > >> > > > > > closely related is reducing the latency of a
> > > > > > > >> > > > > > broker becoming the leader. In this case, the
> > > > > > > >> > > > > > benefit is even more obvious: if other brokers
> > > > > > > >> > > > > > have resigned leadership and the current broker
> > > > > > > >> > > > > > should take leadership, any delay in processing
> > > > > > > >> > > > > > the LeaderAndISR will be perceived by clients as
> > > > > > > >> > > > > > unavailability. In extreme cases, this can cause
> > > > > > > >> > > > > > failed produce requests if the retries are
> > > > > > > >> > > > > > exhausted.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Another two types of controller requests are
> > > > > > > >> > > > > > UpdateMetadata and StopReplica, which I'll
> > > > > > > >> > > > > > briefly discuss as follows:
> > > > > > > >> > > > > > For UpdateMetadata requests, delayed processing
> > > > > > > >> > > > > > means clients receiving stale metadata, e.g. with
> > > > > > > >> > > > > > the wrong leadership info for certain partitions,
> > > > > > > >> > > > > > and the effect is more retries or even fatal
> > > > > > > >> > > > > > failure if the retries are exhausted.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > For StopReplica requests, a long queuing time may
> > > > > > > >> > > > > > degrade the performance of topic deletion.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Regarding your last question about the delay for
> > > > > > > >> > > > > > DescribeLogDirsRequest, you are right that this
> > > > > > > >> > > > > > KIP cannot help with the latency in getting the
> > > > > > > >> > > > > > log dirs info, and it's only relevant when
> > > > > > > >> > > > > > controller requests are involved.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Regards,
> > > > > > > >> > > > > > Lucas
> > > > > > > >> > > > > >
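Lucas's 24K-request arithmetic above can be made concrete with a back-of-the-envelope model. The per-request processing time of 2 ms below is a purely hypothetical number chosen for illustration; only the 24,000 requests and 8 io threads come from the message itself.

```java
// Rough model of the purgatory-wait numbers discussed above: 24,000 queued
// ProduceRequests spread over 8 io threads means ~3,000 requests per thread
// ahead of the LeaderAndIsr request on the thread that finally handles it.
public class LatencyModel {
    public static void main(String[] args) {
        int queuedRequests = 24_000;
        int ioThreads = 8;
        double perRequestMs = 2.0; // assumed, for illustration only

        int perThread = queuedRequests / ioThreads; // 3000
        // Without the KIP, the i-th produce request waits roughly
        // (perThread - i) * perRequestMs extra in the purgatory.
        double earliestExtraWaitMs = perThread * perRequestMs;
        double latestExtraWaitMs = 1 * perRequestMs;

        System.out.printf("requests per io thread: %d%n", perThread);
        System.out.printf("extra wait, earliest request: %.0f ms%n",
                earliestExtraWaitMs);
        System.out.printf("extra wait, latest request: %.0f ms%n",
                latestExtraWaitMs);
    }
}
```

The model matches the qualitative claim in the message: the earliest queued requests see the largest extra wait, capped in practice by the produce timeout.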
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > > > > lindong28@gmail.com
> > > > > > > >
> > > > > > > >> > > wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >> Hey Jun,
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Thanks much for the comments. It is good point.
> So
> > > the
> > > > > > > feature
> > > > > > > >> may
> > > > > > > >> > > be
> > > > > > > >> > > > > >> useful for JBOD use-case. I have one question
> > below.
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Hey Lucas,
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Do you think this feature is also useful for
> > non-JBOD
> > > > > setup
> > > > > > > or
> > > > > > > >> it
> > > > > > > >> > is
> > > > > > > >> > > > > only
> > > > > > > >> > > > > >> useful for the JBOD setup? It may be useful to
> > > > understand
> > > > > > > this.
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> When the broker is setup using JBOD, in order to
> > move
> > > > > > leaders
> > > > > > > >> on
> > > > > > > >> > the
> > > > > > > >> > > > > >> failed
> > > > > > > >> > > > > >> disk to other disks, the system operator first
> > needs
> > > to
> > > > > get
> > > > > > > the
> > > > > > > >> > list
> > > > > > > >> > > > of
> > > > > > > >> > > > > >> partitions on the failed disk. This is currently
> > > > achieved
> > > > > > > using
> > > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > > > > >> DescribeLogDirsRequest
> > > > > > > >> > to
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> broker. If we only prioritize the controller
> > > requests,
> > > > > then
> > > > > > > the
> > > > > > > >> > > > > >> DescribeLogDirsRequest
> > > > > > > >> > > > > >> may still take a long time to be processed by the
> > > > broker.
> > > > > > So
> > > > > > > >> the
> > > > > > > >> > > > overall
> > > > > > > >> > > > > >> time to move leaders away from the failed disk
> may
> > > > still
> > > > > be
> > > > > > > >> long
> > > > > > > >> > > even
> > > > > > > >> > > > > with
> > > > > > > >> > > > > >> this KIP. What do you think?
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> Thanks,
> > > > > > > >> > > > > >> Dong
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > > > > >> lucasatucla@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > >>
> > > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > @Dong,
> > > > > > > >> > > > > >> > Since both of the two comments in your previous
> > > email
> > > > > are
> > > > > > > >> about
> > > > > > > >> > > the
> > > > > > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > > > > > >> > > > > >> > in light of Jun's last comment, do you agree
> that
> > > > this
> > > > > > KIP
> > > > > > > >> can
> > > > > > > >> > be
> > > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > Regards,
> > > > > > > >> > > > > >> > Lucas
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > > > > jun@confluent.io>
> > > > > > > >> > wrote:
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > If all disks on a broker are slow, one
> probably
> > > > > should
> > > > > > > just
> > > > > > > >> > kill
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > > broker. In that case, this KIP may not help.
> If
> > > > only
> > > > > > one
> > > > > > > of
> > > > > > > >> > the
> > > > > > > >> > > > > disks
> > > > > > > >> > > > > >> on
> > > > > > > >> > > > > >> > a
> > > > > > > >> > > > > >> > > broker is slow, one may want to fail that
> disk
> > > and
> > > > > move
> > > > > > > the
> > > > > > > >> > > > leaders
> > > > > > > >> > > > > on
> > > > > > > >> > > > > >> > that
> > > > > > > >> > > > > >> > > disk to other brokers. In that case, being
> able
> > > to
> > > > > > > process
> > > > > > > >> the
> > > > > > > >> > > > > >> > LeaderAndIsr
> > > > > > > >> > > > > >> > > requests faster will potentially help the
> > > producers
> > > > > > > recover
> > > > > > > >> > > > quicker.
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > Thanks,
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > Jun
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > > > > > >> lindong28@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up
> > questions
> > > > > below.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers
> 20
> > > > > > > partitions
> > > > > > > >> > that
> > > > > > > >> > > > are
> > > > > > > >> > > > > >> > > randomly
> > > > > > > >> > > > > >> > > > distributed across all partitions, then
> each
> > > > > > > >> ProduceRequest
> > > > > > > >> > > will
> > > > > > > >> > > > > >> likely
> > > > > > > >> > > > > >> > > > cover some partitions for which the broker
> is
> > > > still
> > > > > > > >> leader
> > > > > > > >> > > after
> > > > > > > >> > > > > it
> > > > > > > >> > > > > >> > > quickly
> > > > > > > >> > > > > >> > > > processes the
> > > > > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will still
> > be
> > > > slow
> > > > > > in
> > > > > > > >> > > > processing
> > > > > > > >> > > > > >> these
> > > > > > > >> > > > > >> > > > ProduceRequests, and the request latency will still be
> very
> > > > high
> > > > > > with
> > > > > > > >> this
> > > > > > > >> > > > KIP.
> > > > > > > >> > > > > It
> > > > > > > >> > > > > >> > > seems
> > > > > > > >> > > > > >> > > > that most ProduceRequest will still timeout
> > > after
> > > > > 30
> > > > > > > >> > seconds.
> > > > > > > >> > > Is
> > > > > > > >> > > > > >> this
> > > > > > > >> > > > > >> > > > understanding correct?
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will
> > still
> > > > > > timeout
> > > > > > > >> after
> > > > > > > >> > > 30
> > > > > > > >> > > > > >> > seconds,
> > > > > > > >> > > > > >> > > > then it is less clear how this KIP reduces
> > > > average
> > > > > > > >> produce
> > > > > > > >> > > > > latency.
> > > > > > > >> > > > > >> Can
> > > > > > > >> > > > > >> > > you
> > > > > > > >> > > > > >> > > > clarify what metrics can be improved by
> this
> > > KIP?
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Not sure why the system operator directly cares about the
> > > > number
> > > > > of
> > > > > > > >> > > truncated
> > > > > > > >> > > > > >> > messages.
> > > > > > > >> > > > > >> > > > Do you mean this KIP can improve average
> > > > throughput
> > > > > > or
> > > > > > > >> > reduce
> > > > > > > >> > > > > >> message
> > > > > > > >> > > > > >> > > > duplication? It will be good to understand
> > > this.
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > Thanks,
> > > > > > > >> > > > > >> > > > Dong
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > > > > > >> > > lucasatucla@gmail.com
> > > > > > > >> > > > >
> > > > > > > >> > > > > >> > wrote:
> > > > > > > >> > > > > >> > > >
> > > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please
> > see
> > > > my
> > > > > > > reply
> > > > > > > >> > > below.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1
> partition.
> > > Now
> > > > > > let's
> > > > > > > >> > > consider
> > > > > > > >> > > > a
> > > > > > > >> > > > > >> more
> > > > > > > >> > > > > >> > > > common
> > > > > > > >> > > > > >> > > > > scenario
> > > > > > > >> > > > > >> > > > > where broker0 is the leader of many
> > > partitions.
> > > > > And
> > > > > > > >> let's
> > > > > > > >> > > say
> > > > > > > >> > > > > for
> > > > > > > >> > > > > >> > some
> > > > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > > > > > >> > > > > >> > > > > The number of leader partitions on
> broker0
> > is
> > > > so
> > > > > > > large,
> > > > > > > >> > say
> > > > > > > >> > > > 10K,
> > > > > > > >> > > > > >> that
> > > > > > > >> > > > > >> > > the
> > > > > > > >> > > > > >> > > > > cluster is skewed,
> > > > > > > >> > > > > >> > > > > and the operator would like to shift the
> > > > > leadership
> > > > > > > >> for a
> > > > > > > >> > > lot
> > > > > > > >> > > > of
> > > > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > > > > >> > > > > >> > > > > either manually or through some service
> > like
> > > > > cruise
> > > > > > > >> > control.
> > > > > > > >> > > > > >> > > > > With this KIP, not only will the
> leadership
> > > > > > > transitions
> > > > > > > >> > > finish
> > > > > > > >> > > > > >> more
> > > > > > > >> > > > > >> > > > > quickly, helping the cluster itself
> > become
> > > > more
> > > > > > > >> > balanced,
> > > > > > > >> > > > > >> > > > > but all existing producers corresponding
> to
> > > the
> > > > > 9K
> > > > > > > >> > > partitions
> > > > > > > >> > > > > will
> > > > > > > >> > > > > >> > get
> > > > > > > >> > > > > >> > > > the
> > > > > > > >> > > > > >> > > > > errors relatively quickly
> > > > > > > >> > > > > >> > > > > rather than relying on their timeout,
> > thanks
> > > to
> > > > > the
> > > > > > > >> > batched
> > > > > > > >> > > > > async
> > > > > > > >> > > > > >> ZK
> > > > > > > >> > > > > >> > > > > operations.
> > > > > > > >> > > > > >> > > > > To me it's a useful feature to have
> during
> > > such
> > > > > > > >> > troublesome
> > > > > > > >> > > > > times.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have
> > > shown
> > > > > > that
> > > > > > > >> with
> > > > > > > >> > > this
> > > > > > > >> > > > > KIP
> > > > > > > >> > > > > >> > many
> > > > > > > >> > > > > >> > > > > producers
> > > > > > > >> > > > > >> > > > > receive an explicit error
> > > > NotLeaderForPartition,
> > > > > > > based
> > > > > > > >> on
> > > > > > > >> > > > which
> > > > > > > >> > > > > >> they
> > > > > > > >> > > > > >> > > > retry
> > > > > > > >> > > > > >> > > > > immediately.
> > > > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick
> > > retry)
> > > > > for
> > > > > > > >> their
> > > > > > > >> > > > single
> > > > > > > >> > > > > >> > > message
> > > > > > > >> > > > > >> > > > is
> > > > > > > >> > > > > >> > > > > much smaller
> > > > > > > >> > > > > >> > > > > compared with the case of timing out
> > without
> > > > the
> > > > > > KIP
> > > > > > > >> (30
> > > > > > > >> > > > seconds
> > > > > > > >> > > > > >> for
> > > > > > > >> > > > > >> > > > timing
> > > > > > > >> > > > > >> > > > > out + quick retry).
> > > > > > > >> > > > > >> > > > > One might argue that reducing the timeout
> > > on
> > > > > the
> > > > > > > >> > producer
> > > > > > > >> > > > > side
> > > > > > > >> > > > > >> can
> > > > > > > >> > > > > >> > > > > achieve the same result,
> > > > > > > >> > > > > >> > > > > yet reducing the timeout has its own
> > > > > drawbacks[1].
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the
> > > > number
> > > > > of
> > > > > > > >> > > truncated
> > > > > > > >> > > > > >> > messages
> > > > > > > >> > > > > >> > > on
> > > > > > > >> > > > > >> > > > > brokers,
> > > > > > > >> > > > > >> > > > > with the experiments done in the Google
> > Doc,
> > > it
> > > > > > > should
> > > > > > > >> be
> > > > > > > >> > > easy
> > > > > > > >> > > > > to
> > > > > > > >> > > > > >> see
> > > > > > > >> > > > > >> > > > that
> > > > > > > >> > > > > >> > > > > a lot fewer messages need
> > > > > > > >> > > > > >> > > > > to be truncated on broker0 since the
> > > up-to-date
> > > > > > > >> metadata
> > > > > > > >> > > > avoids
> > > > > > > >> > > > > >> > > appending
> > > > > > > >> > > > > >> > > > > of messages
> > > > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we
> talk
> > > to a
> > > > > > > system
> > > > > > > >> > > > operator
> > > > > > > >> > > > > >> and
> > > > > > > >> > > > > >> > ask
> > > > > > > >> > > > > >> > > > > whether
> > > > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet
> most
> > > > likely
> > > > > > the
> > > > > > > >> > answer
> > > > > > > >> > > > is
> > > > > > > >> > > > > >> yes.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > 3. To answer your question, I think it
> > might
> > > be
> > > > > > > >> helpful to
> > > > > > > >> > > > > >> construct
> > > > > > > >> > > > > >> > > some
> > > > > > > >> > > > > >> > > > > formulas.
> > > > > > > >> > > > > >> > > > > To simplify the modeling, I'm going back
> to
> > > the
> > > > > > case
> > > > > > > >> where
> > > > > > > >> > > > there
> > > > > > > >> > > > > >> is
> > > > > > > >> > > > > >> > > only
> > > > > > > >> > > > > >> > > > > ONE partition involved.
> > > > > > > >> > > > > >> > > > > Following the experiments in the Google
> > Doc,
> > > > > let's
> > > > > > > say
> > > > > > > >> > > broker0
> > > > > > > >> > > > > >> > becomes
> > > > > > > >> > > > > >> > > > the
> > > > > > > >> > > > > >> > > > > follower at time t0,
> > > > > > > >> > > > > >> > > > > and after t0 there were still N produce
> > > > requests
> > > > > in
> > > > > > > its
> > > > > > > >> > > > request
> > > > > > > >> > > > > >> > queue.
> > > > > > > >> > > > > >> > > > > With the up-to-date metadata brought by
> > this
> > > > KIP,
> > > > > > > >> broker0
> > > > > > > >> > > can
> > > > > > > >> > > > > >> reply
> > > > > > > >> > > > > >> > > with
> > > > > > > >> > > > > >> > > > an
> > > > > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > > > > > >> > > > > >> > > > > let's use M1 to denote the average
> > processing
> > > > > time
> > > > > > of
> > > > > > > >> > > replying
> > > > > > > >> > > > > >> with
> > > > > > > >> > > > > >> > > such
> > > > > > > >> > > > > >> > > > an
> > > > > > > >> > > > > >> > > > > error message.
> > > > > > > >> > > > > >> > > > > Without this KIP, the broker will need to
> > > > append
> > > > > > > >> messages
> > > > > > > >> > to
> > > > > > > >> > > > > >> > segments,
> > > > > > > >> > > > > >> > > > > which may trigger a flush to disk,
> > > > > > > >> > > > > >> > > > > let's use M2 to denote the average
> > processing
> > > > > time
> > > > > > > for
> > > > > > > >> > such
> > > > > > > >> > > > > logic.
> > > > > > > >> > > > > >> > > > > Then the average extra latency incurred
> > > without
> > > > > > this
> > > > > > > >> KIP
> > > > > > > >> > is
> > > > > > > >> > > N
> > > > > > > >> > > > *
> > > > > > > >> > > > > >> (M2 -
> > > > > > > >> > > > > >> > > > M1) /
> > > > > > > >> > > > > >> > > > > 2.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > In practice, M2 should always be larger
> > than
> > > > M1,
> > > > > > > which
> > > > > > > >> > means
> > > > > > > >> > > > as
> > > > > > > >> > > > > >> long
> > > > > > > >> > > > > >> > > as N
> > > > > > > >> > > > > >> > > > > is positive,
> > > > > > > >> > > > > >> > > > > we would see improvements on the average
> > > > latency.
> > > > > > > >> > > > > >> > > > > There does not need to be significant
> > backlog
> > > > of
> > > > > > > >> requests
> > > > > > > >> > in
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > > request
> > > > > > > >> > > > > >> > > > > queue,
> > > > > > > >> > > > > >> > > > > or severe degradation of disk performance
> > to
> > > > have
> > > > > > the
> > > > > > > >> > > > > improvement.
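Lucas's back-of-envelope model can be checked with a few lines of Python. The exact average over the N queued requests is (N + 1) * (M2 - M1) / 2; the N * (M2 - M1) / 2 figure above is its large-N approximation (the names below are illustrative, not from the KIP):

```python
def avg_extra_latency_ms(n_queued: int, m2_ms: float, m1_ms: float) -> float:
    """Average extra latency over N queued ProduceRequests when each reply
    costs M2 (full append path) instead of M1 (quick NotLeaderForPartition
    error). The i-th request waits for the i-1 requests ahead of it plus
    its own processing, so its extra latency is i * (M2 - M1)."""
    extras = [i * (m2_ms - m1_ms) for i in range(1, n_queued + 1)]
    return sum(extras) / n_queued

# e.g. N = 100 queued requests, M2 = 10 ms, M1 = 1 ms:
print(avg_extra_latency_ms(100, 10.0, 1.0))  # 454.5, vs. N*(M2-M1)/2 = 450
```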
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > Regards,
> > > > > > > >> > > > > >> > > > > Lucas
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on
> > the
> > > > > > > producer
> > > > > > > >> > side
> > > > > > > >> > > > can
> > > > > > > >> > > > > >> > trigger
> > > > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > > > > > >> > > > > >> > > > > when the corresponding leader broker is
> > > > > overloaded,
> > > > > > > >> > > > exacerbating
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > situation.
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin
> <
> > > > > > > >> > > lindong28@gmail.com
> > > > > > > >> > > > >
> > > > > > > >> > > > > >> > wrote:
> > > > > > > >> > > > > >> > > > >
> > > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > Thanks much for the detailed
> > documentation
> > > of
> > > > > the
> > > > > > > >> > > > experiment.
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > Initially I also think having a
> separate
> > > > queue
> > > > > > for
> > > > > > > >> > > > controller
> > > > > > > >> > > > > >> > > requests
> > > > > > > >> > > > > >> > > > is
> > > > > > > >> > > > > >> > > > > > useful because, as you mentioned in the
> > > > summary
> > > > > > > >> section
> > > > > > > >> > of
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > Google
> > > > > > > >> > > > > >> > > > > doc,
> > > > > > > >> > > > > >> > > > > > controller requests are generally more
> > > > > important
> > > > > > > than
> > > > > > > >> > data
> > > > > > > >> > > > > >> requests
> > > > > > > >> > > > > >> > > and
> > > > > > > >> > > > > >> > > > > we
> > > > > > > >> > > > > >> > > > > > probably want controller requests to be
> > > > > processed
> > > > > > > >> > sooner.
> > > > > > > >> > > > But
> > > > > > > >> > > > > >> then
> > > > > > > >> > > > > >> > > Eno
> > > > > > > >> > > > > >> > > > > has
> > > > > > > >> > > > > >> > > > > > two very good questions which I am not
> > sure
> > > > the
> > > > > > > >> Google
> > > > > > > >> > doc
> > > > > > > >> > > > has
> > > > > > > >> > > > > >> > > answered
> > > > > > > >> > > > > >> > > > > > explicitly. Could you help with the
> > > following
> > > > > > > >> questions?
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > 1) It is not very clear what the actual
> > > > > > > >> > > > > >> > > > > > benefit of KIP-291 to users is.
> > > > > > > >> > > > > >> > > > > The
> > > > > > > >> > > > > >> > > > > > experiment setup in the Google doc
> > > simulates
> > > > > the
> > > > > > > >> > scenario
> > > > > > > >> > > > that
> > > > > > > >> > > > > >> > broker
> > > > > > > >> > > > > >> > > > is
> > > > > > > >> > > > > >> > > > > > very slow handling ProduceRequest due
> to
> > > e.g.
> > > > > > slow
> > > > > > > >> disk.
> > > > > > > >> > > It
> > > > > > > >> > > > > >> > currently
> > > > > > > >> > > > > >> > > > > > assumes that there is only 1 partition.
> > But
> > > > in
> > > > > > the
> > > > > > > >> > common
> > > > > > > >> > > > > >> scenario,
> > > > > > > >> > > > > >> > > it
> > > > > > > >> > > > > >> > > > is
> > > > > > > >> > > > > >> > > > > > probably reasonable to assume that
> there
> > > are
> > > > > many
> > > > > > > >> other
> > > > > > > >> > > > > >> partitions
> > > > > > > >> > > > > >> > > that
> > > > > > > >> > > > > >> > > > > are
> > > > > > > >> > > > > >> > > > > > also actively produced to and
> > > ProduceRequest
> > > > to
> > > > > > > these
> > > > > > > >> > > > > partitions
> > > > > > > >> > > > > >> > also
> > > > > > > >> > > > > >> > > > > takes
> > > > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even
> > if
> > > > > > broker0
> > > > > > > >> can
> > > > > > > >> > > > become
> > > > > > > >> > > > > >> > > follower
> > > > > > > >> > > > > >> > > > > for
> > > > > > > >> > > > > >> > > > > > the partition 0 soon, it probably still
> > > needs
> > > > > to
> > > > > > > >> process
> > > > > > > >> > > the
> > > > > > > >> > > > > >> > > > > ProduceRequest
> > > > > > > >> > > > > >> > > > > > slowly in the queue because these
> > > > > > ProduceRequests
> > > > > > > >> > cover
> > > > > > > >> > > > > other
> > > > > > > >> > > > > >> > > > > partitions.
> > > > > > > >> > > > > >> > > > > > Thus most ProduceRequest will still
> > timeout
> > > > > after
> > > > > > > 30
> > > > > > > >> > > seconds
> > > > > > > >> > > > > and
> > > > > > > >> > > > > >> > most
> > > > > > > >> > > > > >> > > > > > clients will still likely timeout after
> > 30
> > > > > > seconds.
> > > > > > > >> Then
> > > > > > > >> > > it
> > > > > > > >> > > > is
> > > > > > > >> > > > > >> not
> > > > > > > >> > > > > >> > > > > > obviously what is the benefit to client
> > > since
> > > > > > > client
> > > > > > > >> > will
> > > > > > > >> > > > > >> timeout
> > > > > > > >> > > > > >> > > after
> > > > > > > >> > > > > >> > > > > 30
> > > > > > > >> > > > > >> > > > > > seconds before possibly re-connecting
> to
> > > > > broker1,
> > > > > > > >> with
> > > > > > > >> > or
> > > > > > > >> > > > > >> without
> > > > > > > >> > > > > >> > > > > KIP-291.
> > > > > > > >> > > > > >> > > > > > Did I miss something here?
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the
> > specific
> > > > > > > benefits
> > > > > > > >> of
> > > > > > > >> > > this
> > > > > > > >> > > > > >> KIP to
> > > > > > > >> > > > > >> > > > user
> > > > > > > >> > > > > >> > > > > or
> > > > > > > >> > > > > >> > > > > > system administrator, e.g. whether this
> > KIP
> > > > > > > decreases
> > > > > > > >> > > > average
> > > > > > > >> > > > > >> > > latency,
> > > > > > > >> > > > > >> > > > > 999th percentile latency, probability of
> > > > exception
> > > > > > > >> exposed
> > > > > > > >> > to
> > > > > > > >> > > > > >> client
> > > > > > > >> > > > > >> > > etc.
> > > > > > > >> > > > > >> > > > It
> > > > > > > >> > > > > >> > > > > > is probably useful to clarify this.
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
> > > experience
> > > > > > only
> > > > > > > >> when
> > > > > > > >> > > > there
> > > > > > > >> > > > > is
> > > > > > > >> > > > > >> > > issue
> > > > > > > >> > > > > >> > > > > with
> > > > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in the
> > > > request
> > > > > > > queue
> > > > > > > >> > due
> > > > > > > >> > > to
> > > > > > > >> > > > > >> slow
> > > > > > > >> > > > > >> > > disk
> > > > > > > >> > > > > >> > > > as
> > > > > > > >> > > > > >> > > > > > described in the Google doc? Or is this
> > KIP
> > > > > also
> > > > > > > >> useful
> > > > > > > >> > > when
> > > > > > > >> > > > > >> there
> > > > > > > >> > > > > >> > is
> > > > > > > >> > > > > >> > > > no
> > > > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It might
> be
> > > > > helpful
> > > > > > > to
> > > > > > > >> > > clarify
> > > > > > > >> > > > > >> this
> > > > > > > >> > > > > >> > to
> > > > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > Thanks much,
> > > > > > > >> > > > > >> > > > > > Dong
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas
> > > Wang <
> > > > > > > >> > > > > >> lucasatucla@gmail.com
> > > > > > > >> > > > > >> > >
> > > > > > > >> > > > > >> > > > > wrote:
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > > Hi Eno,
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
> > > > experiment
> > > > > > > >> results.
> > > > > > > >> > > > > >> > > > > > > Here is a link to the positive impact
> > > > > achieved
> > > > > > by
> > > > > > > >> > > > > implementing
> > > > > > > >> > > > > >> > the
> > > > > > > >> > > > > >> > > > > > proposed
> > > > > > > >> > > > > >> > > > > > > change:
> > > > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/
> > > > > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > > > > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > > > >> > > > > >> > > > > > > Please take a look when you have time
> > and
> > > > let
> > > > > > me
> > > > > > > >> know
> > > > > > > >> > > your
> > > > > > > >> > > > > >> > > feedback.
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > Regards,
> > > > > > > >> > > > > >> > > > > > > Lucas
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM,
> > Harsha <
> > > > > > > >> > > kafka@harsha.io>
> > > > > > > >> > > > > >> wrote:
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a
> > > look; it
> > > > > > might
> > > > > > > >> suit
> > > > > > > >> > > our
> > > > > > > >> > > > > >> > > > requirements
> > > > > > > >> > > > > >> > > > > > > > better.
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > Thanks,
> > > > > > > >> > > > > >> > > > > > > > Harsha
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM,
> > > Lucas
> > > > > > Wang <
> > > > > > > >> > > > > >> > > > lucasatucla@gmail.com
> > > > > > > >> > > > > >> > > > > >
> > > > > > > >> > > > > >> > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > If I understand correctly, the
> > > > > replication
> > > > > > > >> quota
> > > > > > > >> > > > > mechanism
> > > > > > > >> > > > > >> > > > proposed
> > > > > > > >> > > > > >> > > > > > in
> > > > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
> > > scenario.
> > > > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > Thanks,
> > > > > > > >> > > > > >> > > > > > > > > Lucas
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM,
> > > > Harsha <
> > > > > > > >> > > > > kafka@harsha.io
> > > > > > > >> > > > > >> >
> > > > > > > >> > > > > >> > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > > > >> > > > > >> > > > > > > > > > One more question, any thoughts
> > on
> > > > > making
> > > > > > > >> this
> > > > > > > >> > > > > >> configurable
> > > > > > > >> > > > > >> > > > > > > > > > and also allowing a subset of
> data
> > > > > requests
> > > > > > > to
> > > > > > > >> be
> > > > > > > >> > > > > >> > prioritized.
> > > > > > > >> > > > > >> > > > For
> > > > > > > >> > > > > >> > > > > > > > example
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when
> we
> > > > take
> > > > > > out
> > > > > > > a
> > > > > > > >> > > broker
> > > > > > > >> > > > > and
> > > > > > > >> > > > > >> > bring
> > > > > > > >> > > > > >> > > > new
> > > > > > > >> > > > > >> > > > > > one
> > > > > > > >> > > > > >> > > > > > > > it
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > will try to become follower and
> > > have
> > > > > a lot
> > > > > > of
> > > > > > > >> > fetch
> > > > > > > >> > > > > >> requests
> > > > > > > >> > > > > >> > to
> > > > > > > >> > > > > >> > > > > other
> > > > > > > >> > > > > >> > > > > > > > > leaders
> > > > > > > >> > > > > >> > > > > > > > > > in clusters. This will
> negatively
> > > > > effect
> > > > > > > the
> > > > > > > >> > > > > >> > > application/client
> > > > > > > >> > > > > >> > > > > > > > > requests.
> > > > > > > >> > > > > >> > > > > > > > > We are also exploring a
> similar
> > > > > > solution
> > > > > > > to
> > > > > > > >> > > > > >> de-prioritize
> > > > > > > >> > > > > >> > > if
> > > > > > > >> > > > > >> > > > a
> > > > > > > >> > > > > >> > > > > > new
> > > > > > > >> > > > > >> > > > > > > > > > replica comes in for fetch
> > > requests,
> > > > we
> > > > > > are
> > > > > > > >> ok
> > > > > > > >> > > with
> > > > > > > >> > > > > the
> > > > > > > >> > > > > >> > > replica
> > > > > > > >> > > > > >> > > > > to
> > > > > > > >> > > > > >> > > > > > be
> > > > > > > >> > > > > >> > > > > > > > > > taking time but the leaders
> > should
> > > > > > > prioritize
> > > > > > > >> > the
> > > > > > > >> > > > > client
> > > > > > > >> > > > > >> > > > > requests.
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > Thanks,
> > > > > > > >> > > > > >> > > > > > > > > > Harsha
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35
> > AM
> > > > > Lucas
> > > > > > > Wang
> > > > > > > >> > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed
> response.
> > > > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the
> > > feature
> > > > > > yet,
> > > > > > > >> so no
> > > > > > > >> > > > > >> > experimental
> > > > > > > >> > > > > >> > > > > > results
> > > > > > > >> > > > > >> > > > > > > > so
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > far.
> > > > > > > >> > > > > >> > > > > > > > > > > And I plan to test in out in
> > the
> > > > > > > following
> > > > > > > >> > days.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right
> that
> > > the
> > > > > > > >> priority
> > > > > > > >> > > queue
> > > > > > > >> > > > > >> does
> > > > > > > >> > > > > >> > not
> > > > > > > >> > > > > >> > > > > > > > completely
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > prevent
> > > > > > > >> > > > > >> > > > > > > > > > > data requests being processed
> > > ahead
> > > > > of
> > > > > > > >> > > controller
> > > > > > > >> > > > > >> > requests.
> > > > > > > >> > > > > >> > > > > > > > > > > That being said, I expect it
> to
> > > > > greatly
> > > > > > > >> > mitigate
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > effect
> > > > > > > >> > > > > >> > > > of
> > > > > > > >> > > > > >> > > > > > > stable
> > > > > > > >> > > > > >> > > > > > > > > > > metadata.
> > > > > > > >> > > > > >> > > > > > > > > > > In any case, I'll try it out
> > and
> > > > post
> > > > > > the
> > > > > > > >> > > results
> > > > > > > >> > > > > >> when I
> > > > > > > >> > > > > >> > > have
> > > > > > > >> > > > > >> > > > > it.
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > Regards,
> > > > > > > >> > > > > >> > > > > > > > > > > Lucas
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44
> > AM,
> > > > Eno
> > > > > > > >> Thereska
> > > > > > > >> > <
> > > > > > > >> > > > > >> > > > > > > > eno.thereska@gmail.com
> > > > > > > >> > > > > >> > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just
> > had a
> > > > > look
> > > > > > at
> > > > > > > >> > this.
> > > > > > > >> > > A
> > > > > > > >> > > > > >> couple
> > > > > > > >> > > > > >> > of
> > > > > > > >> > > > > >> > > > > > > > questions:
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > - did you notice any
> positive
> > > > > change
> > > > > > > >> after
> > > > > > > >> > > > > >> implementing
> > > > > > > >> > > > > >> > > > this
> > > > > > > >> > > > > >> > > > > > KIP?
> > > > > > > >> > > > > >> > > > > > > > > I'm
> > > > > > > >> > > > > >> > > > > > > > > > > > wondering if you have any
> > > > > > experimental
> > > > > > > >> > results
> > > > > > > >> > > > > that
> > > > > > > >> > > > > >> > show
> > > > > > > >> > > > > >> > > > the
> > > > > > > >> > > > > >> > > > > > > > benefit
> > > > > > > >> > > > > >> > > > > > > > > of
> > > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > > two queues.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not
> > > > > sufficient
> > > > > > in
> > > > > > > >> > > > addressing
> > > > > > > >> > > > > >> the
> > > > > > > >> > > > > >> > > > > problem
> > > > > > > >> > > > > >> > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > KIP
> > > > > > > >> > > > > >> > > > > > > > > > > > identifies. Even with
> > priority
> > > > > > queues,
> > > > > > > >> you
> > > > > > > >> > > will
> > > > > > > >> > > > > >> > sometimes
> > > > > > > >> > > > > >> > > > > > > (often?)
> > > > > > > >> > > > > >> > > > > > > > > have
> > > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > > case that data plane
> requests
> > > > will
> > > > > be
> > > > > > > >> ahead
> > > > > > > >> > of
> > > > > > > >> > > > the
> > > > > > > >> > > > > >> > > control
> > > > > > > >> > > > > >> > > > > > plane
> > > > > > > >> > > > > >> > > > > > > > > > > requests.
> > > > > > > >> > > > > >> > > > > > > > > > > > This happens because the
> > system
> > > > > might
> > > > > > > >> have
> > > > > > > >> > > > already
> > > > > > > >> > > > > >> > > started
> > > > > > > >> > > > > >> > > > > > > > > processing
> > > > > > > >> > > > > >> > > > > > > > > > > the
> > > > > > > >> > > > > >> > > > > > > > > > > > data plane requests before
> > the
> > > > > > control
> > > > > > > >> plane
> > > > > > > >> > > > ones
> > > > > > > >> > > > > >> > > arrived.
> > > > > > > >> > > > > >> > > > So
> > > > > > > >> > > > > >> > > > > > it
> > > > > > > >> > > > > >> > > > > > > > > would
> > > > > > > >> > > > > >> > > > > > > > > > > be
> > > > > > > >> > > > > >> > > > > > > > > > > > good to know what % of the
> > > > problem
> > > > > > this
> > > > > > > >> KIP
> > > > > > > >> > > > > >> addresses.
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > Thanks
> > > > > > > >> > > > > >> > > > > > > > > > > > Eno
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at
> 4:44
> > > PM,
> > > > > Ted
> > > > > > > Yu <
> > > > > > > >> > > > > >> > > > > yuzhihong@gmail.com
> > > > > > > >> > > > > >> > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > Change looks good.
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > Thanks
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at
> > 8:42
> > > > AM,
> > > > > > > Lucas
> > > > > > > >> > Wang
> > > > > > > >> > > <
> > > > > > > >> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > > Thanks for the
> > suggestion.
> > > > I've
> > > > > > > >> updated
> > > > > > > >> > > the
> > > > > > > >> > > > > KIP.
> > > > > > > >> > > > > >> > > Please
> > > > > > > >> > > > > >> > > > > > take
> > > > > > > >> > > > > >> > > > > > > > > > another
> > > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > look.
> > > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > > Lucas
> > > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
> > > 6:34
> > > > > PM,
> > > > > > > Ted
> > > > > > > >> Yu
> > > > > > > >> > <
> > > > > > > >> > > > > >> > > > > > > yuzhihong@gmail.com
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > > > Currently in
> > > > > KafkaConfig.scala
> > > > > > :
> > > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > > > val
> QueuedMaxRequests =
> > > 500
> > > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > > >> > > > > >> > > > > > > > > > > > > > > It would be good if
> you
> > > can
> > > > > > > include
> > > > > > > >> > the
> > > > > > > >> > > > > >> default
> > > > > > > >> > > > > >> > > value
> > > > > > > >> > > > > >> > > > > for
> > > > > > > >> > > > > >> > > > > > > > this
> > > > > > > >> > > > > >> > > > > > > > >
> > > > > > > >>
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
@Becket and Dong,
I think currently the ordering guarantee is achieved because
the maximum number of inflight requests from the controller to a broker
is hard-coded to be 1.

If, hypothetically, the max inflight requests were > 1, then I think Dong
is right to say that even the separate queue cannot guarantee ordered
processing.
For example, Req1 and Req2 are sent to a broker, and after a connection
reconnection,
both requests are sent again, causing the broker to have 4 requests in the
following order:
Req2 > Req1 > Req2 > Req1.

In summary, it seems using the deque should not cause problems with
out-of-order processing.
Is that right?

Lucas
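The head-insertion ordering described above can be sketched with
java.util.concurrent.LinkedBlockingDeque (the request names and the 500
capacity are illustrative only, not the actual broker code):

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

public class RequestChannelSketch {
    // Hypothetical request channel: a capacity-bounded deque where data
    // requests go to the tail and controller requests jump to the head.
    static String drainOrder() {
        BlockingDeque<String> channel = new LinkedBlockingDeque<>(500);
        channel.offerLast("Produce-1");        // data plane request, tail
        channel.offerFirst("LeaderAndIsr-R1"); // controller sends Req1
        channel.offerFirst("LeaderAndIsr-R2"); // controller sends Req2
        // Connection drops and reconnects; both requests are retransmitted
        // and again inserted at the head:
        channel.offerFirst("LeaderAndIsr-R1");
        channel.offerFirst("LeaderAndIsr-R2");
        StringBuilder order = new StringBuilder();
        String req;
        // Request handler threads always take from the head.
        while ((req = channel.pollFirst()) != null) {
            order.append(req).append(' ');
        }
        return order.toString().trim();
    }

    public static void main(String[] args) {
        // Produces the Req2 > Req1 > Req2 > Req1 interleaving described
        // above, with the data request served last.
        System.out.println(drainOrder());
    }
}
```

With the max inflight requests hard-coded to 1, at most one controller
request would sit in the deque at a time, so this interleaving could only
arise if that constraint were relaxed.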

On Wed, Jul 18, 2018 at 6:24 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Becket,
>
> It seems that the requests from the old controller will be discarded due to
> old controller epoch. It is not clear whether this is a problem.
>
> And if this out-of-order processing of controller requests is a problem, it
> seems like an existing problem which also applies to the multi-queue based
> design. So it is probably not a concern specific to the use of deque. Does
> that sound reasonable?
>
> Thanks,
> Dong
>
>
> On Wed, 18 Jul 2018 at 6:17 PM Becket Qin <be...@gmail.com> wrote:
>
> > Hi Mayuresh/Joel,
> >
> > Using the request channel as a deque was brought up some time ago when
> > we were initially thinking about prioritizing the requests. The concern
> > was that the controller requests are supposed to be processed in order.
> > If we can ensure that there is at most one controller request in the
> > request channel, the order is not a concern. But in cases where more
> > than one controller request is inserted into the queue, the controller
> > request order may change and cause problems. For example, think about
> > the following sequence:
> > 1. Controller successfully sent a request R1 to broker
> > 2. Broker receives R1 and puts the request at the head of the request
> > queue.
> > 3. Controller to broker connection failed and the controller
> > reconnected to the broker.
> > 4. Controller sends a request R2 to the broker
> > 5. Broker receives R2 and adds it to the head of the request queue.
> > Now on the broker side, R2 will be processed before R1, which may cause
> > problems.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com> wrote:
> >
> > > @Mayuresh - I like your idea. It appears to be a simpler, less invasive
> > > alternative and it should work. Jun/Becket/others, do you see any
> > pitfalls
> > > with this approach?
> > >
> > > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > >
> > > > @Mayuresh,
> > > > That's a very interesting idea that I haven't thought of before.
> > > > It seems to solve our problem at hand pretty well, and also
> > > > avoids the need to have a new size metric and capacity config
> > > > for the controller request queue. In fact, if we were to adopt
> > > > this design, there is no public interface change, and we
> > > > probably don't need a KIP.
> > > > Also implementation-wise, it seems the Java class
> > > > LinkedBlockingDeque can readily satisfy the requirement by
> > > > supporting a capacity, and also allowing inserting at both ends.
> > > >
> > > > My only concern is that this design is tied to the coincidence that
> > > > we have two request priorities and there are two ends to a deque.
> > > > Hence by using the proposed design, it seems the network layer is
> > > > more tightly coupled with upper layer logic, e.g. if we were to add
> > > > an extra priority level in the future for some reason, we would
> > probably
> > > > need to go back to the design of separate queues, one for each
> priority
> > > > level.
> > > >
> > > > In summary, I'm ok with both designs and lean toward your suggested
> > > > approach.
> > > > Let's hear what others think.
> > > >
> > > > @Becket,
> > > > In light of Mayuresh's suggested new design, I'm answering your
> > question
> > > > only in the context
> > > > of the current KIP design: I think your suggestion makes sense, and
> I'm
> > > ok
> > > > with removing the capacity config and
> > > > just relying on the default value of 20 being sufficient.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > > gharatmayuresh15@gmail.com
> > > > > wrote:
> > > >
> > > > > Hi Lucas,
> > > > >
> > > > > Seems like the main intent here is to prioritize the controller
> > request
> > > > > over any other requests.
> > > > > In that case, we can change the request queue to a deque, where
> > > > > you always insert the normal requests (produce, consume, etc.) at
> > > > > the end of the deque, but if it is a controller request, you
> > > > > insert it at the head of the deque. This ensures that controller
> > > > > requests will be given higher priority over other requests.
> > > > >
> > > > > Also since we only read one request from the socket and mute it and
> > > only
> > > > > unmute it after handling the request, this would ensure that we
> don't
> > > > > handle controller requests out of order.
> > > > >
> > > > > With this approach we can avoid the second queue and the additional
> > > > config
> > > > > for the size of the queue.
> > > > >
> > > > > What do you think ?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > >
> > > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hey Joel,
> > > > > >
> > > > > > Thanks for the detailed explanation. I agree the current design
> > > > > > makes sense.
> > > > > > My confusion is about whether the new config for the controller
> > queue
> > > > > > capacity is necessary. I cannot think of a case in which users
> > would
> > > > > change
> > > > > > it.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <
> becket.qin@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Lucas,
> > > > > > >
> > > > > > > I guess my question can be rephrased to "do we expect users to
> > > > > > > ever change the controller request queue capacity"? If we
> > > > > > > agree that 20 is already a very generous default number and we
> > > > > > > do not expect users to change it, is it still necessary to
> > > > > > > expose this as a config?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> > lucasatucla@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >> @Becket
> > > > > > >> 1. Thanks for the comment. You are right that normally there
> > > should
> > > > be
> > > > > > >> just
> > > > > > >> one controller request because of muting,
> > > > > > >> and I had NOT intended to say there would be many enqueued
> > > > controller
> > > > > > >> requests.
> > > > > > >> I went through the KIP again, and I'm not sure which part
> > conveys
> > > > that
> > > > > > >> info.
> > > > > > >> I'd be happy to revise if you point out the section.
> > > > > > >>
> > > > > > >> 2. Though it should not happen in normal conditions, the
> current
> > > > > design
> > > > > > >> does not preclude multiple controllers running
> > > > > > >> at the same time, hence if we don't have the controller queue
> > > > capacity
> > > > > > >> config and simply make its capacity to be 1,
> > > > > > >> network threads handling requests from different controllers
> > will
> > > be
> > > > > > >> blocked during those troublesome times,
> > > > > > >> which is probably not what we want. On the other hand, adding
> > the
> > > > > extra
> > > > > > >> config with a default value, say 20, guards us from issues in
> > > those
> > > > > > >> troublesome times, and IMO there isn't much downside of adding
> > the
> > > > > extra
> > > > > > >> config.
> > > > > > >>
> > > > > > >> @Mayuresh
> > > > > > >> Good catch, this sentence is an obsolete statement based on a
> > > > previous
> > > > > > >> design. I've revised the wording in the KIP.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Lucas
> > > > > > >>
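For reference, the two-queue design being debated here could be sketched
as follows (the class and method names are hypothetical; the capacities
20 and 500 are the defaults mentioned in this thread, and a real handler
would block on an empty channel rather than poll):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TwoQueueChannelSketch {
    // Small dedicated queue for controller requests, plus the existing
    // data request queue bounded by queued.max.requests.
    final BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(20);
    final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);

    // Handler threads drain the controller queue before touching data
    // requests (simplified, non-blocking version).
    String nextRequest() {
        String req = controllerQueue.poll();
        return (req != null) ? req : dataQueue.poll();
    }

    public static void main(String[] args) {
        TwoQueueChannelSketch channel = new TwoQueueChannelSketch();
        channel.dataQueue.offer("Produce-1");
        channel.controllerQueue.offer("LeaderAndIsr-1");
        // The controller request is served first even though it arrived later.
        System.out.println(channel.nextRequest());
    }
}
```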
> > > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Hi Lucas,
> > > > > > >> >
> > > > > > >> > Thanks for the KIP.
> > > > > > >> > I am trying to understand why you think "The memory
> > consumption
> > > > can
> > > > > > rise
> > > > > > >> > given the total number of queued requests can go up to 2x"
> in
> > > the
> > > > > > impact
> > > > > > >> > section. Normally the requests from controller to a Broker
> are
> > > not
> > > > > > high
> > > > > > >> > volume, right ?
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> >
> > > > > > >> > Mayuresh
> > > > > > >> >
> > > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > > becket.qin@gmail.com>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane
> from
> > > the
> > > > > > data
> > > > > > >> > plane
> > > > > > >> > > makes a lot of sense.
> > > > > > >> > >
> > > > > > >> > > In the KIP you mentioned that the controller request queue
> > may
> > > > > have
> > > > > > >> many
> > > > > > >> > > requests in it. Will this be a common case? The
> > > > > > >> > > controller requests still go through the SocketServer.
> > > > > > >> > > The SocketServer will mute the channel once
> > > > > > >> > > a request is read and put into the request channel. So
> > > assuming
> > > > > > there
> > > > > > >> is
> > > > > > >> > > only one connection between controller and each broker, on
> > the
> > > > > > broker
> > > > > > >> > side,
> > > > > > >> > > there should be only one controller request in the
> > controller
> > > > > > request
> > > > > > >> > queue
> > > > > > >> > > at any given time. If that is the case, do we need a
> > separate
> > > > > > >> controller
> > > > > > >> > > request queue capacity config? The default value 20 means
> > > > > > >> > > that we expect 20 controller switches to happen in a short
> > > > > > >> > > period of time. I am
> > > > > > >> > > not sure whether someone should increase the controller
> > > request
> > > > > > queue
> > > > > > >> > > capacity to handle such a case, as that seems to indicate
> > > > > > >> > > something very wrong has happened.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > >
> > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > > lindong28@gmail.com>
> > > > > > >> wrote:
> > > > > > >> > >
> > > > > > >> > > > Thanks for the update Lucas.
> > > > > > >> > > >
> > > > > > >> > > > I think the motivation section is intuitive. It will be
> > good
> > > > to
> > > > > > >> learn
> > > > > > >> > > more
> > > > > > >> > > > about the comments from other reviewers.
> > > > > > >> > > >
> > > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com>
> > > > > > >> > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hi Dong,
> > > > > > >> > > > >
> > > > > > >> > > > > I've updated the motivation section of the KIP by
> > > explaining
> > > > > the
> > > > > > >> > cases
> > > > > > >> > > > that
> > > > > > >> > > > > would have user impacts.
> > > > > > >> > > > > Please take a look at let me know your comments.
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks,
> > > > > > >> > > > > Lucas
> > > > > > >> > > > >
> > > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com
> > > > > > >> >
> > > > > > >> > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Hi Dong,
> > > > > > >> > > > > >
> > > > > > >> > > > > > The simulation of disk being slow is merely for me
> to
> > > > easily
> > > > > > >> > > construct
> > > > > > >> > > > a
> > > > > > >> > > > > > testing scenario
> > > > > > >> > > > > > with a backlog of produce requests. In production,
> > other
> > > > > than
> > > > > > >> the
> > > > > > >> > > disk
> > > > > > >> > > > > > being slow, a backlog of
> > > > > > >> > > > > > produce requests may also be caused by high produce
> > QPS.
> > > > > > >> > > > > > In that case, we may not want to kill the broker and
> > > > that's
> > > > > > when
> > > > > > >> > this
> > > > > > >> > > > KIP
> > > > > > >> > > > > > can be useful, both for JBOD
> > > > > > >> > > > > > and non-JBOD setup.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Going back to your previous question about each
> > > > > ProduceRequest
> > > > > > >> > > covering
> > > > > > >> > > > > 20
> > > > > > >> > > > > > partitions that are randomly
> > > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is
> > > enqueued
> > > > > that
> > > > > > >> > tries
> > > > > > >> > > to
> > > > > > >> > > > > > switch the current broker, say broker0, from leader
> to
> > > > > > follower
> > > > > > >> > > > > > *for one of the partitions*, say *test-0*. For the
> > sake
> > > of
> > > > > > >> > argument,
> > > > > > >> > > > > > let's also assume the other brokers, say broker1,
> have
> > > > > > *stopped*
> > > > > > >> > > > fetching
> > > > > > >> > > > > > from
> > > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > > >> > > > > > 1. If the enqueued produce requests have acks =  -1
> > > (ALL)
> > > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests ahead of
> > > > > > >> LeaderAndISR
> > > > > > >> > > will
> > > > > > >> > > > be
> > > > > > >> > > > > > put into the purgatory,
> > > > > > >> > > > > >         and since they'll never be replicated to
> other
> > > > > brokers
> > > > > > >> > > (because
> > > > > > >> > > > > of
> > > > > > >> > > > > > the assumption made above), they will
> > > > > > >> > > > > >         be completed either when the LeaderAndISR
> > > request
> > > > is
> > > > > > >> > > processed
> > > > > > >> > > > or
> > > > > > >> > > > > > when the timeout happens.
> > > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> > transition
> > > > the
> > > > > > >> > > partition
> > > > > > >> > > > > > test-0 to become a follower,
> > > > > > >> > > > > >         after the current broker sees the
> replication
> > of
> > > > the
> > > > > > >> > > remaining
> > > > > > >> > > > 19
> > > > > > >> > > > > > partitions, it can send a response indicating that
> > > > > > >> > > > > >         it's no longer the leader for the "test-0".
> > > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2,
> > > let's
> > > > > say
> > > > > > >> > there
> > > > > > >> > > > are
> > > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR, and
> > > there
> > > > > are
> > > > > > 8
> > > > > > >> io
> > > > > > >> > > > > threads,
> > > > > > >> > > > > >   so each io thread will process approximately 3000
> > > > produce
> > > > > > >> > requests.
> > > > > > >> > > > Now
> > > > > > >> > > > > > let's investigate the io thread that finally
> processed
> > > the
> > > > > > >> > > > LeaderAndISR.
> > > > > > >> > > > > >   For the 3000 produce requests, if we model the
> time
> > > when
> > > > > > their
> > > > > > >> > > > > remaining
> > > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and the
> > > > > > LeaderAndISR
> > > > > > >> > > > request
> > > > > > >> > > > > is
> > > > > > >> > > > > > processed at time t3000.
> > > > > > >> > > > > >   Without this KIP, the 1st produce request would
> have
> > > > > waited
> > > > > > an
> > > > > > >> > > extra
> > > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra
> > time
> > > of
> > > > > > >> t3000 -
> > > > > > >> > > t1,
> > > > > > >> > > > > etc.
> > > > > > >> > > > > >   Roughly speaking, the latency difference is bigger
> > for
> > > > the
> > > > > > >> > earlier
> > > > > > >> > > > > > produce requests than for the later ones. For the
> same
> > > > > reason,
> > > > > > >> the
> > > > > > >> > > more
> > > > > > >> > > > > > ProduceRequests queued
> > > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we get
> > > > (capped
> > > > > > by
> > > > > > >> the
> > > > > > >> > > > > > produce timeout).
> > > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or
> > > acks=1
> > > > > > >> > > > > >   There will be no latency differences in this case,
> > but
> > > > > > >> > > > > >   2.1 without this KIP, the records of partition
> > test-0
> > > in
> > > > > the
> > > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be
> > > appended
> > > > > to
> > > > > > >> the
> > > > > > >> > > local
> > > > > > >> > > > > log,
> > > > > > >> > > > > >         and eventually be truncated after processing
> > the
> > > > > > >> > > LeaderAndISR.
> > > > > > >> > > > > > This is what's referred to as
> > > > > > >> > > > > >         "some unofficial definition of data loss in
> > > terms
> > > > of
> > > > > > >> > messages
> > > > > > >> > > > > > beyond the high watermark".
> > > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect
> since
> > if
> > > > the
> > > > > > >> > > > LeaderAndISR
> > > > > > >> > > > > > is immediately processed, the response to producers
> > will
> > > > > have
> > > > > > >> > > > > >         the NotLeaderForPartition error, causing
> > > producers
> > > > > to
> > > > > > >> retry
> > > > > > >> > > > > >
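The purgatory-wait arithmetic in point 1 can be checked with a small
sketch (the 24K queued requests and 8 io threads are the numbers used
above; the 1 ms of processing per request is an added, purely
illustrative assumption):

```java
public class PurgatoryWaitSketch {
    // Without the KIP, produce request i completes replication at t_i but
    // sits in purgatory until the LeaderAndIsr is processed at t3000.
    static double averageExtraWaitMs(int requestsPerThread, double processingMs) {
        double tLeaderAndIsr = requestsPerThread * processingMs;
        double totalExtra = 0;
        for (int i = 0; i < requestsPerThread; i++) {
            double ti = i * processingMs; // completion time of request i
            totalExtra += tLeaderAndIsr - ti;
        }
        return totalExtra / requestsPerThread;
    }

    public static void main(String[] args) {
        // 24K queued ProduceRequests across 8 io threads => ~3000 per
        // thread; assume 1 ms of processing per request.
        System.out.printf("average extra purgatory wait: %.1f ms%n",
                averageExtraWaitMs(3000, 1.0));
    }
}
```

Under these assumptions the average extra wait is about half the total
backlog drain time, with the earliest requests (as noted above) waiting
the longest, capped in practice by the produce timeout.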
> > > > > > >> > > > > > This explanation above is the benefit for reducing
> the
> > > > > latency
> > > > > > >> of a
> > > > > > >> > > > > broker
> > > > > > >> > > > > > becoming the follower,
> > > > > > >> > > > > > closely related is reducing the latency of a broker
> > > > becoming
> > > > > > the
> > > > > > >> > > > leader.
> > > > > > >> > > > > > In this case, the benefit is even more obvious, if
> > other
> > > > > > brokers
> > > > > > >> > have
> > > > > > >> > > > > > resigned leadership, and the
> > > > > > >> > > > > > current broker should take leadership. Any delay in
> > > > > processing
> > > > > > >> the
> > > > > > >> > > > > > LeaderAndISR will be perceived
> > > > > > >> > > > > > by clients as unavailability. In extreme cases, this
> > can
> > > > > cause
> > > > > > >> > failed
> > > > > > >> > > > > > produce requests if the retries are
> > > > > > >> > > > > > exhausted.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Another two types of controller requests are
> > > > UpdateMetadata
> > > > > > and
> > > > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > > >> > > > > > For UpdateMetadata requests, delayed processing
> means
> > > > > clients
> > > > > > >> > > receiving
> > > > > > >> > > > > > stale metadata, e.g. with the wrong leadership info
> > > > > > >> > > > > > for certain partitions, and the effect is more
> retries
> > > or
> > > > > even
> > > > > > >> > fatal
> > > > > > >> > > > > > failure if the retries are exhausted.
> > > > > > >> > > > > >
> > > > > > >> > > > > > For StopReplica requests, a long queuing time may
> > > degrade
> > > > > the
> > > > > > >> > > > performance
> > > > > > >> > > > > > of topic deletion.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Regarding your last question of the delay for
> > > > > > >> > DescribeLogDirsRequest,
> > > > > > >> > > > you
> > > > > > >> > > > > > are right
> > > > > > >> > > > > > that this KIP cannot help with the latency in
> getting
> > > the
> > > > > log
> > > > > > >> dirs
> > > > > > >> > > > info,
> > > > > > >> > > > > > and it's only relevant
> > > > > > >> > > > > > when controller requests are involved.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Regards,
> > > > > > >> > > > > > Lucas
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > > > lindong28@gmail.com
> > > > > > >
> > > > > > >> > > wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > >> Hey Jun,
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Thanks much for the comments. It is a good point.
> > > > > > >> > > > > >> So the feature may be useful for the JBOD
> > > > > > >> > > > > >> use-case. I have one question below.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Hey Lucas,
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Do you think this feature is also useful for a
> > > > > > >> > > > > >> non-JBOD setup, or is it only useful for the JBOD
> > > > > > >> > > > > >> setup? It may be useful to understand this.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> When the broker is setup using JBOD, in order to
> move
> > > > > leaders
> > > > > > >> on
> > > > > > >> > the
> > > > > > >> > > > > >> failed
> > > > > > >> > > > > >> disk to other disks, the system operator first
> needs
> > to
> > > > get
> > > > > > the
> > > > > > >> > list
> > > > > > >> > > > of
> > > > > > >> > > > > >> partitions on the failed disk. This is currently
> > > achieved
> > > > > > using
> > > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > > > >> DescribeLogDirsRequest
> > > > > > >> > to
> > > > > > >> > > > the
> > > > > > >> > > > > >> broker. If we only prioritize the controller
> > requests,
> > > > then
> > > > > > the
> > > > > > >> > > > > >> DescribeLogDirsRequest
> > > > > > >> > > > > >> may still take a long time to be processed by the
> > > broker.
> > > > > So
> > > > > > >> the
> > > > > > >> > > > overall
> > > > > > >> > > > > >> time to move leaders away from the failed disk may
> > > still
> > > > be
> > > > > > >> long
> > > > > > >> > > even
> > > > > > >> > > > > with
> > > > > > >> > > > > >> this KIP. What do you think?
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Thanks,
> > > > > > >> > > > > >> Dong
> > > > > > >> > > > > >>
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > > > >> lucasatucla@gmail.com
> > > > > > >> > >
> > > > > > >> > > > > wrote:
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > @Dong,
> > > > > > >> > > > > >> > Since both of the two comments in your previous
> > email
> > > > are
> > > > > > >> about
> > > > > > >> > > the
> > > > > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > > > > >> > > > > >> > in light of Jun's last comment, do you agree that
> > > this
> > > > > KIP
> > > > > > >> can
> > > > > > >> > be
> > > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > > > >> > > > > >> > Please let me know, thanks!
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > Regards,
> > > > > > >> > > > > >> > Lucas
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > > > jun@confluent.io>
> > > > > > >> > wrote:
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > If all disks on a broker are slow, one probably
> > > > should
> > > > > > just
> > > > > > >> > kill
> > > > > > >> > > > the
> > > > > > >> > > > > >> > > broker. In that case, this KIP may not help. If
> > > only
> > > > > one
> > > > > > of
> > > > > > >> > the
> > > > > > >> > > > > disks
> > > > > > >> > > > > >> on
> > > > > > >> > > > > >> > a
> > > > > > >> > > > > >> > > broker is slow, one may want to fail that disk
> > and
> > > > move
> > > > > > the
> > > > > > >> > > > leaders
> > > > > > >> > > > > on
> > > > > > >> > > > > >> > that
> > > > > > >> > > > > >> > > disk to other brokers. In that case, being able
> > to
> > > > > > process
> > > > > > >> the
> > > > > > >> > > > > >> > LeaderAndIsr
> > > > > > >> > > > > >> > > requests faster will potentially help the
> > producers
> > > > > > recover
> > > > > > >> > > > quicker.
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Thanks,
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Jun
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > > > > >> lindong28@gmail.com
> > > > > > >> > >
> > > > > > >> > > > > wrote:
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > > Hey Lucas,
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Thanks for the reply. Some follow up questions below.
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > > > > > >> > > > > >> > > > randomly distributed across all partitions, then each ProduceRequest
> > > > > > >> > > > > >> > > > will likely cover some partitions for which the broker is still leader
> > > > > > >> > > > > >> > > > after it quickly processes the LeaderAndIsrRequest. Then the broker
> > > > > > >> > > > > >> > > > will still be slow in processing these ProduceRequests, and the request
> > > > > > >> > > > > >> > > > latency will still be very high with this KIP. It seems that most
> > > > > > >> > > > > >> > > > ProduceRequests will still timeout after 30 seconds. Is this
> > > > > > >> > > > > >> > > > understanding correct?
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Regarding 2, if most ProduceRequests will still timeout after 30
> > > > > > >> > > > > >> > > > seconds, then it is less clear how this KIP reduces average produce
> > > > > > >> > > > > >> > > > latency. Can you clarify what metrics can be improved by this KIP?
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Not sure why a system operator directly cares about the number of
> > > > > > >> > > > > >> > > > truncated messages. Do you mean this KIP can improve average throughput
> > > > > > >> > > > > >> > > > or reduce message duplication? It will be good to understand this.
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > Thanks,
> > > > > > >> > > > > >> > > > Dong
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > >> > > > > >> > > >
> > > > > > >> > > > > >> > > > > Hi Dong,
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see my reply below.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's consider a more
> > > > > > >> > > > > >> > > > > common scenario where broker0 is the leader of many partitions, and
> > > > > > >> > > > > >> > > > > let's say for some reason its IO becomes slow. The number of leader
> > > > > > >> > > > > >> > > > > partitions on broker0 is so large, say 10K, that the cluster is skewed,
> > > > > > >> > > > > >> > > > > and the operator would like to shift the leadership for a lot of
> > > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers, either manually or through some
> > > > > > >> > > > > >> > > > > service like cruise control. With this KIP, not only will the
> > > > > > >> > > > > >> > > > > leadership transitions finish more quickly, helping the cluster itself
> > > > > > >> > > > > >> > > > > become more balanced, but all existing producers corresponding to the
> > > > > > >> > > > > >> > > > > 9K partitions will get the errors relatively quickly rather than
> > > > > > >> > > > > >> > > > > relying on their timeout, thanks to the batched async ZK operations.
> > > > > > >> > > > > >> > > > > To me it's a useful feature to have during such troublesome times.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown that with this KIP many
> > > > > > >> > > > > >> > > > > producers receive an explicit NotLeaderForPartition error, based on
> > > > > > >> > > > > >> > > > > which they retry immediately. Therefore the latency (~14 seconds +
> > > > > > >> > > > > >> > > > > quick retry) for their single message is much smaller compared with the
> > > > > > >> > > > > >> > > > > case of timing out without the KIP (30 seconds for timing out + quick
> > > > > > >> > > > > >> > > > > retry). One might argue that reducing the timeout on the producer side
> > > > > > >> > > > > >> > > > > can achieve the same result, yet reducing the timeout has its own
> > > > > > >> > > > > >> > > > > drawbacks[1].
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the number of truncated messages
> > > > > > >> > > > > >> > > > > on brokers, with the experiments done in the Google Doc, it should be
> > > > > > >> > > > > >> > > > > easy to see that a lot fewer messages need to be truncated on broker0,
> > > > > > >> > > > > >> > > > > since the up-to-date metadata avoids appending of messages in
> > > > > > >> > > > > >> > > > > subsequent PRODUCE requests. If we talk to a system operator and ask
> > > > > > >> > > > > >> > > > > whether they prefer fewer wasteful IOs, I bet most likely the answer
> > > > > > >> > > > > >> > > > > is yes.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > 3. To answer your question, I think it might be helpful to construct
> > > > > > >> > > > > >> > > > > some formulas. To simplify the modeling, I'm going back to the case
> > > > > > >> > > > > >> > > > > where there is only ONE partition involved. Following the experiments
> > > > > > >> > > > > >> > > > > in the Google Doc, let's say broker0 becomes the follower at time t0,
> > > > > > >> > > > > >> > > > > and after t0 there were still N produce requests in its request queue.
> > > > > > >> > > > > >> > > > > With the up-to-date metadata brought by this KIP, broker0 can reply
> > > > > > >> > > > > >> > > > > with a NotLeaderForPartition exception; let's use M1 to denote the
> > > > > > >> > > > > >> > > > > average processing time of replying with such an error message.
> > > > > > >> > > > > >> > > > > Without this KIP, the broker will need to append messages to segments,
> > > > > > >> > > > > >> > > > > which may trigger a flush to disk; let's use M2 to denote the average
> > > > > > >> > > > > >> > > > > processing time for such logic. Then the average extra latency
> > > > > > >> > > > > >> > > > > incurred without this KIP is N * (M2 - M1) / 2.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > In practice, M2 should always be larger than M1, which means as long as
> > > > > > >> > > > > >> > > > > N is positive, we would see improvements on the average latency. There
> > > > > > >> > > > > >> > > > > does not need to be a significant backlog of requests in the request
> > > > > > >> > > > > >> > > > > queue, or a severe degradation of disk performance, to have the
> > > > > > >> > > > > >> > > > > improvement.
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > Regards,
> > > > > > >> > > > > >> > > > > Lucas
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the producer side can trigger
> > > > > > >> > > > > >> > > > > unnecessary duplicate requests when the corresponding leader broker is
> > > > > > >> > > > > >> > > > > overloaded, exacerbating the situation.
> > > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > > >> > > > > >> > > > >
> > > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > Thanks much for the detailed documentation of the experiment.
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > Initially I also think having a separate queue for controller requests
> > > > > > >> > > > > >> > > > > > is useful because, as you mentioned in the summary section of the
> > > > > > >> > > > > >> > > > > > Google doc, controller requests are generally more important than data
> > > > > > >> > > > > >> > > > > > requests and we probably want controller requests to be processed
> > > > > > >> > > > > >> > > > > > sooner. But then Eno has two very good questions which I am not sure
> > > > > > >> > > > > >> > > > > > the Google doc has answered explicitly. Could you help with the
> > > > > > >> > > > > >> > > > > > following questions?
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > 1) It is not very clear what is the actual benefit of KIP-291 to users.
> > > > > > >> > > > > >> > > > > > The experiment setup in the Google doc simulates the scenario that the
> > > > > > >> > > > > >> > > > > > broker is very slow handling ProduceRequest due to e.g. slow disk. It
> > > > > > >> > > > > >> > > > > > currently assumes that there is only 1 partition. But in the common
> > > > > > >> > > > > >> > > > > > scenario, it is probably reasonable to assume that there are many
> > > > > > >> > > > > >> > > > > > other partitions that are also actively produced to, and
> > > > > > >> > > > > >> > > > > > ProduceRequests to these partitions also take e.g. 2 seconds to be
> > > > > > >> > > > > >> > > > > > processed. So even if broker0 can become follower for partition 0
> > > > > > >> > > > > >> > > > > > soon, it probably still needs to slowly process the ProduceRequests in
> > > > > > >> > > > > >> > > > > > the queue because these ProduceRequests cover other partitions. Thus
> > > > > > >> > > > > >> > > > > > most ProduceRequests will still timeout after 30 seconds and most
> > > > > > >> > > > > >> > > > > > clients will still likely timeout after 30 seconds. Then it is not
> > > > > > >> > > > > >> > > > > > obvious what the benefit to the client is, since the client will
> > > > > > >> > > > > >> > > > > > timeout after 30 seconds before possibly re-connecting to broker1,
> > > > > > >> > > > > >> > > > > > with or without KIP-291. Did I miss something here?
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > 2) I guess Eno is asking for the specific benefits of this KIP to the
> > > > > > >> > > > > >> > > > > > user or system administrator, e.g. whether this KIP decreases average
> > > > > > >> > > > > >> > > > > > latency, 999th percentile latency, the probability of exceptions
> > > > > > >> > > > > >> > > > > > exposed to the client, etc. It is probably useful to clarify this.
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user experience only when there is an
> > > > > > >> > > > > >> > > > > > issue with a broker, e.g. a significant backlog in the request queue
> > > > > > >> > > > > >> > > > > > due to a slow disk as described in the Google doc? Or is this KIP also
> > > > > > >> > > > > >> > > > > > useful when there is no ongoing issue in the cluster? It might be
> > > > > > >> > > > > >> > > > > > helpful to clarify this to understand the benefit of this KIP.
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > Thanks much,
> > > > > > >> > > > > >> > > > > > Dong
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > >> > > > > >> > > > > >
> > > > > > >> > > > > >> > > > > > > Hi Eno,
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the experiment results.
> > > > > > >> > > > > >> > > > > > > Here is a link to the positive impact achieved by implementing the
> > > > > > >> > > > > >> > > > > > > proposed change:
> > > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > > >> > > > > >> > > > > > > Please take a look when you have time and let me know your feedback.
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > Regards,
> > > > > > >> > > > > >> > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <kafka@harsha.io> wrote:
> > > > > > >> > > > > >> > > > > > >
> > > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a look; it might suit our
> > > > > > >> > > > > >> > > > > > > > requirements better.
> > > > > > >> > > > > >> > > > > > > >
> > > > > > >> > > > > >> > > > > > > > Thanks,
> > > > > > >> > > > > >> > > > > > > > Harsha
> > > > > > >> > > > > >> > > > > > > >
> > > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > >> > > > > >> > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > If I understand correctly, the replication quota mechanism proposed
> > > > > > >> > > > > >> > > > > > > > > in KIP-73 can be helpful in that scenario.
> > > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > Thanks,
> > > > > > >> > > > > >> > > > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <kafka@harsha.io> wrote:
> > > > > > >> > > > > >> > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > > >> > > > > >> > > > > > > > > > One more question: any thoughts on making this configurable, and
> > > > > > >> > > > > >> > > > > > > > > > also allowing a subset of data requests to be prioritized? For
> > > > > > >> > > > > >> > > > > > > > > > example, we notice in our cluster that when we take out a broker
> > > > > > >> > > > > >> > > > > > > > > > and bring in a new one, it will try to become a follower and have
> > > > > > >> > > > > >> > > > > > > > > > a lot of fetch requests to other leaders in the cluster. This
> > > > > > >> > > > > >> > > > > > > > > > will negatively affect the application/client requests. We are
> > > > > > >> > > > > >> > > > > > > > > > also exploring a similar solution to de-prioritize fetch requests
> > > > > > >> > > > > >> > > > > > > > > > when a new replica comes in; we are OK with the replica taking
> > > > > > >> > > > > >> > > > > > > > > > time, but the leaders should prioritize the client requests.
> > > > > > >> > > > > >> > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > Thanks,
> > > > > > >> > > > > >> > > > > > > > > > Harsha
> > > > > > >> > > > > >> > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > > > >> > > > > >> > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> > > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the feature yet, so no experimental
> > > > > > >> > > > > >> > > > > > > > > > > results so far. And I plan to test it out in the following days.
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right that the priority queue does not
> > > > > > >> > > > > >> > > > > > > > > > > completely prevent data requests being processed ahead of
> > > > > > >> > > > > >> > > > > > > > > > > controller requests. That being said, I expect it to greatly
> > > > > > >> > > > > >> > > > > > > > > > > mitigate the effect of stale metadata.
> > > > > > >> > > > > >> > > > > > > > > > > In any case, I'll try it out and post the results when I have it.
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > Regards,
> > > > > > >> > > > > >> > > > > > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <eno.thereska@gmail.com> wrote:
> > > > > > >> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at this. A couple of
> > > > > > >> > > > > >> > > > > > > > > > > > questions:
> > > > > > >> > > > > >> > > > > > > > > > > > - did you notice any positive change after implementing this
> > > > > > >> > > > > >> > > > > > > > > > > > KIP? I'm wondering if you have any experimental results that
> > > > > > >> > > > > >> > > > > > > > > > > > show the benefit of the two queues.
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not sufficient in addressing the problem
> > > > > > >> > > > > >> > > > > > > > > > > > the KIP identifies. Even with priority queues, you will
> > > > > > >> > > > > >> > > > > > > > > > > > sometimes (often?) have the case that data plane requests will
> > > > > > >> > > > > >> > > > > > > > > > > > be ahead of the control plane requests. This happens because
> > > > > > >> > > > > >> > > > > > > > > > > > the system might have already started processing the data
> > > > > > >> > > > > >> > > > > > > > > > > > plane requests before the control plane ones arrived. So it
> > > > > > >> > > > > >> > > > > > > > > > > > would be good to know what % of the problem this KIP addresses.
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > Thanks
> > > > > > >> > > > > >> > > > > > > > > > > > Eno
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >> > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > Change looks good.
> > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > Thanks
> > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated the KIP. Please take
> > > > > > >> > > > > >> > > > > > > > > > > > > > another look.
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > Lucas
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > > >> > > > > >> > > > > > > > > > > > > > > It would be good if you can include the default value for this

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Becket,

It seems that the requests from the old controller will be discarded due to
the old controller epoch. It is not clear whether this is a problem.

And if this out-of-order processing of controller requests is a problem, it
seems like an existing problem which also applies to the multi-queue based
design. So it is probably not a concern specific to the use of deque. Does
that sound reasonable?
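
A minimal sketch of the epoch check being referred to, with hypothetical names (an illustration only, not Kafka's actual broker code):

```java
// Illustrative sketch of discarding requests from an old controller by
// comparing controller epochs; names are hypothetical, not Kafka's code.
public class ControllerEpochCheck {
    private int latestControllerEpoch = 0;

    // Returns true if the request should be processed, false if it carries
    // a stale (older) controller epoch and should be discarded.
    public synchronized boolean shouldProcess(int requestControllerEpoch) {
        if (requestControllerEpoch < latestControllerEpoch) {
            return false; // stale request from an old controller
        }
        latestControllerEpoch = requestControllerEpoch;
        return true;
    }

    public static void main(String[] args) {
        ControllerEpochCheck check = new ControllerEpochCheck();
        System.out.println(check.shouldProcess(1)); // first controller, epoch 1
        System.out.println(check.shouldProcess(2)); // failover bumps the epoch to 2
        System.out.println(check.shouldProcess(1)); // late request from the old controller
    }
}
```

Running main prints true, true, false: the late request from the old controller is discarded regardless of where it sat in the queue.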

Thanks,
Dong


On Wed, 18 Jul 2018 at 6:17 PM Becket Qin <be...@gmail.com> wrote:

> Hi Mayuresh/Joel,
>
> Using the request channel as a deque was brought up some time ago when we
> were initially thinking about prioritizing requests. The concern was that
> controller requests are supposed to be processed in order. If we can ensure
> that there is at most one controller request in the request channel, the
> order is not a concern. But if more than one controller request is inserted
> into the queue, the controller request order may change and cause problems.
> For example, think about the following sequence:
> 1. The controller successfully sends a request R1 to the broker.
> 2. The broker receives R1 and puts it at the head of the request queue.
> 3. The controller-to-broker connection fails and the controller reconnects
> to the broker.
> 4. The controller sends a request R2 to the broker.
> 5. The broker receives R2 and adds it to the head of the request queue.
> Now on the broker side, R2 will be processed before R1, which may cause
> problems.
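
The five-step scenario above can be reproduced with a small sketch, assuming a deque-based channel that inserts controller requests at the head (a simplified model, not the actual broker code):

```java
import java.util.concurrent.LinkedBlockingDeque;

public class ControllerReorderDemo {
    public static void main(String[] args) {
        // Simplified model of the request channel as a bounded deque.
        LinkedBlockingDeque<String> requestQueue = new LinkedBlockingDeque<>(500);

        // Step 2: the broker receives controller request R1 and inserts it
        // at the head of the deque.
        requestQueue.offerFirst("R1");
        // Steps 3-5: the controller reconnects and sends R2, which is also
        // inserted at the head, jumping ahead of R1.
        requestQueue.offerFirst("R2");

        // The request handler takes from the head: R2 comes out before R1,
        // violating the intended R1-then-R2 order.
        System.out.println(requestQueue.pollFirst()); // R2
        System.out.println(requestQueue.pollFirst()); // R1
    }
}
```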
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com> wrote:
>
> > @Mayuresh - I like your idea. It appears to be a simpler, less invasive
> > alternative and it should work. Jun/Becket/others, do you see any
> pitfalls
> > with this approach?
> >
> > On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> >
> > > @Mayuresh,
> > > That's a very interesting idea that I hadn't thought of before.
> > > It seems to solve our problem at hand pretty well, and also
> > > avoids the need to have a new size metric and capacity config
> > > for the controller request queue. In fact, if we were to adopt
> > > this design, there is no public interface change, and we
> > > probably don't need a KIP.
> > > Also, implementation wise, it seems
> > > the java class LinkedBlockingDeque can readily satisfy the requirement
> > > by supporting a capacity, and also allowing inserting at both ends.
> > >
> > > My only concern is that this design is tied to the coincidence that
> > > we have two request priorities and there are two ends to a deque.
> > > Hence by using the proposed design, it seems the network layer is
> > > more tightly coupled with upper layer logic, e.g. if we were to add
> > > an extra priority level in the future for some reason, we would
> probably
> > > need to go back to the design of separate queues, one for each priority
> > > level.
> > >
> > > In summary, I'm ok with both designs and lean toward your suggested
> > > approach.
> > > Let's hear what others think.
> > >
> > > @Becket,
> > > In light of Mayuresh's suggested new design, I'm answering your
> question
> > > only in the context
> > > of the current KIP design: I think your suggestion makes sense, and I'm
> > ok
> > > with removing the capacity config and
> > > just relying on the default value of 20 being sufficient.
> > >
> > > Thanks,
> > > Lucas
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> > > gharatmayuresh15@gmail.com
> > > > wrote:
> > >
> > > > Hi Lucas,
> > > >
> > > > Seems like the main intent here is to prioritize the controller
> > > > request over any other requests.
> > > > In that case, we can change the request queue to a deque, where you
> > > > always insert the normal requests (produce, consume, etc.) at the end
> > > > of the deque, but if it's a controller request, you insert it at the
> > > > head of the queue. This ensures that the controller request will be
> > > > given higher priority over other requests.
> > > >
> > > > Also since we only read one request from the socket and mute it and
> > only
> > > > unmute it after handling the request, this would ensure that we don't
> > > > handle controller requests out of order.
> > > >
> > > > With this approach we can avoid the second queue and the additional
> > > config
> > > > for the size of the queue.
> > > >
> > > > What do you think ?
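
A sketch of the single-deque approach described above, assuming java.util.concurrent.LinkedBlockingDeque; the Request class here is an illustrative placeholder, not Kafka's actual request type:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Sketch of a single bounded deque that gives controller requests priority
// by inserting them at the head; data requests go to the tail. All names
// here are illustrative, not Kafka's actual request channel classes.
public class PrioritizedRequestChannel {
    static final class Request {
        final String name;
        final boolean isControllerRequest;
        Request(String name, boolean isControllerRequest) {
            this.name = name;
            this.isControllerRequest = isControllerRequest;
        }
    }

    private final LinkedBlockingDeque<Request> queue = new LinkedBlockingDeque<>(500);

    // offerFirst/offerLast return false when the deque is full; a real
    // channel would block instead (putFirst/putLast).
    public boolean send(Request r) {
        return r.isControllerRequest ? queue.offerFirst(r) : queue.offerLast(r);
    }

    public Request receive() {
        return queue.pollFirst();
    }

    public static void main(String[] args) {
        PrioritizedRequestChannel channel = new PrioritizedRequestChannel();
        channel.send(new Request("produce-1", false));
        channel.send(new Request("produce-2", false));
        channel.send(new Request("leaderAndIsr", true));
        System.out.println(channel.receive().name); // leaderAndIsr
        System.out.println(channel.receive().name); // produce-1
    }
}
```

Note how a single capacity bound (500 here, matching the queued.max.requests default) is preserved, which is part of why this approach avoids a second queue and a new config.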
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
> > > >
> > > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > > > Hey Joel,
> > > > >
> > > > > Thanks for the detailed explanation. I agree the current design makes
> > > sense.
> > > > > My confusion is about whether the new config for the controller
> queue
> > > > > capacity is necessary. I cannot think of a case in which users
> would
> > > > change
> > > > > it.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <be...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Lucas,
> > > > > >
> > > > > > I guess my question can be rephrased to "do we expect users to
> > > > > > ever change the controller request queue capacity"? If we agree
> > > > > > that 20 is already a very generous default number and we do not
> > > > > > expect users to change it, is it still necessary to expose this
> > > > > > as a config?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <
> lucasatucla@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > >> @Becket
> > > > > >> 1. Thanks for the comment. You are right that normally there
> > should
> > > be
> > > > > >> just
> > > > > >> one controller request because of muting,
> > > > > >> and I had NOT intended to say there would be many enqueued
> > > controller
> > > > > >> requests.
> > > > > >> I went through the KIP again, and I'm not sure which part
> conveys
> > > that
> > > > > >> info.
> > > > > >> I'd be happy to revise if you point out the section.
> > > > > >>
> > > > > >> 2. Though it should not happen in normal conditions, the current
> > > > design
> > > > > >> does not preclude multiple controllers running
> > > > > >> at the same time, hence if we don't have the controller queue
> > > capacity
> > > > > >> config and simply set its capacity to 1,
> > > > > >> network threads handling requests from different controllers
> will
> > be
> > > > > >> blocked during those troublesome times,
> > > > > >> which is probably not what we want. On the other hand, adding
> the
> > > > extra
> > > > > >> config with a default value, say 20, guards us from issues in
> > those
> > > > > >> troublesome times, and IMO there isn't much downside of adding
> the
> > > > extra
> > > > > >> config.
> > > > > >>
> > > > > >> @Mayuresh
> > > > > >> Good catch, this sentence is an obsolete statement based on a
> > > previous
> > > > > >> design. I've revised the wording in the KIP.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Lucas
> > > > > >>
> > > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > > >>
> > > > > >> > Hi Lucas,
> > > > > >> >
> > > > > >> > Thanks for the KIP.
> > > > > >> > I am trying to understand why you think "The memory
> consumption
> > > can
> > > > > rise
> > > > > >> > given the total number of queued requests can go up to 2x" in
> > the
> > > > > impact
> > > > > >> > section. Normally the requests from controller to a Broker are
> > not
> > > > > high
> > > > > >> > volume, right ?
> > > > > >> >
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > Mayuresh
> > > > > >> >
> > > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <
> > becket.qin@gmail.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Thanks for the KIP, Lucas. Separating the control plane from
> > the
> > > > > data
> > > > > >> > plane
> > > > > >> > > makes a lot of sense.
> > > > > >> > >
> > > > > >> > > In the KIP you mentioned that the controller request queue
> may
> > > > have
> > > > > >> many
> > > > > >> > > requests in it. Will this be a common case? The controller
> > > > requests
> > > > > >> still
> > > > > >> > > go through the SocketServer. The SocketServer will mute
> the
> > > > > channel
> > > > > >> > once
> > > > > >> > > a request is read and put into the request channel. So
> > assuming
> > > > > there
> > > > > >> is
> > > > > >> > > only one connection between controller and each broker, on
> the
> > > > > broker
> > > > > >> > side,
> > > > > >> > > there should be only one controller request in the
> controller
> > > > > request
> > > > > >> > queue
> > > > > >> > > at any given time. If that is the case, do we need a
> separate
> > > > > >> controller
> > > > > >> > > request queue capacity config? The default value 20 means
> that
> > > we
> > > > > >> expect
> > > > > >> > > there are 20 controller switches to happen in a short period
> > of
> > > > > time.
> > > > > >> I
> > > > > >> > am
> > > > > >> > > not sure whether someone should increase the controller
> > request
> > > > > queue
> > > > > >> > > capacity to handle such case, as it seems indicating
> something
> > > > very
> > > > > >> wrong
> > > > > >> > > has happened.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Jiangjie (Becket) Qin
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <
> > lindong28@gmail.com>
> > > > > >> wrote:
> > > > > >> > >
> > > > > >> > > > Thanks for the update Lucas.
> > > > > >> > > >
> > > > > >> > > > I think the motivation section is intuitive. It will be
> good
> > > to
> > > > > >> learn
> > > > > >> > > more
> > > > > >> > > > about the comments from other reviewers.
> > > > > >> > > >
> > > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi Dong,
> > > > > >> > > > >
> > > > > >> > > > > I've updated the motivation section of the KIP by
> > explaining
> > > > the
> > > > > >> > cases
> > > > > >> > > > that
> > > > > >> > > > > would have user impacts.
> > > > > >> > > > > Please take a look at let me know your comments.
> > > > > >> > > > >
> > > > > >> > > > > Thanks,
> > > > > >> > > > > Lucas
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com
> > > > > >> >
> > > > > >> > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi Dong,
> > > > > >> > > > > >
> > > > > >> > > > > > The simulation of disk being slow is merely for me to
> > > easily
> > > > > >> > > construct
> > > > > >> > > > a
> > > > > >> > > > > > testing scenario
> > > > > >> > > > > > with a backlog of produce requests. In production,
> other
> > > > than
> > > > > >> the
> > > > > >> > > disk
> > > > > >> > > > > > being slow, a backlog of
> > > > > >> > > > > > produce requests may also be caused by high produce
> QPS.
> > > > > >> > > > > > In that case, we may not want to kill the broker and
> > > that's
> > > > > when
> > > > > >> > this
> > > > > >> > > > KIP
> > > > > >> > > > > > can be useful, both for JBOD
> > > > > >> > > > > > and non-JBOD setup.
> > > > > >> > > > > >
> > > > > >> > > > > > Going back to your previous question about each
> > > > ProduceRequest
> > > > > >> > > covering
> > > > > >> > > > > 20
> > > > > >> > > > > > partitions that are randomly
> > > > > >> > > > > > distributed, let's say a LeaderAndIsr request is
> > enqueued
> > > > that
> > > > > >> > tries
> > > > > >> > > to
> > > > > >> > > > > > switch the current broker, say broker0, from leader to
> > > > > follower
> > > > > >> > > > > > *for one of the partitions*, say *test-0*. For the
> sake
> > of
> > > > > >> > argument,
> > > > > >> > > > > > let's also assume the other brokers, say broker1, have
> > > > > *stopped*
> > > > > >> > > > fetching
> > > > > >> > > > > > from
> > > > > >> > > > > > the current broker, i.e. broker0.
> > > > > >> > > > > > 1. If the enqueued produce requests have acks =  -1
> > (ALL)
> > > > > >> > > > > >   1.1 without this KIP, the ProduceRequests ahead of
> > > > > >> LeaderAndISR
> > > > > >> > > will
> > > > > >> > > > be
> > > > > >> > > > > > put into the purgatory,
> > > > > >> > > > > >         and since they'll never be replicated to other
> > > > brokers
> > > > > >> > > (because
> > > > > >> > > > > of
> > > > > >> > > > > > the assumption made above), they will
> > > > > >> > > > > >         be completed either when the LeaderAndISR
> > request
> > > is
> > > > > >> > > processed
> > > > > >> > > > or
> > > > > >> > > > > > when the timeout happens.
> > > > > >> > > > > >   1.2 With this KIP, broker0 will immediately
> transition
> > > the
> > > > > >> > > partition
> > > > > >> > > > > > test-0 to become a follower,
> > > > > >> > > > > >         after the current broker sees the replication
> of
> > > the
> > > > > >> > > remaining
> > > > > >> > > > 19
> > > > > >> > > > > > partitions, it can send a response indicating that
> > > > > >> > > > > >         it's no longer the leader for the "test-0".
> > > > > >> > > > > >   To see the latency difference between 1.1 and 1.2,
> > let's
> > > > say
> > > > > >> > there
> > > > > >> > > > are
> > > > > >> > > > > > 24K produce requests ahead of the LeaderAndISR, and
> > there
> > > > are
> > > > > 8
> > > > > >> io
> > > > > >> > > > > threads,
> > > > > >> > > > > >   so each io thread will process approximately 3000
> > > produce
> > > > > >> > requests.
> > > > > >> > > > Now
> > > > > >> > > > > > let's investigate the io thread that finally processed
> > the
> > > > > >> > > > LeaderAndISR.
> > > > > >> > > > > >   For the 3000 produce requests, if we model the time
> > when
> > > > > their
> > > > > >> > > > > remaining
> > > > > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and the
> > > > > LeaderAndISR
> > > > > >> > > > request
> > > > > >> > > > > is
> > > > > >> > > > > > processed at time t3000.
> > > > > >> > > > > >   Without this KIP, the 1st produce request would have
> > > > waited
> > > > > an
> > > > > >> > > extra
> > > > > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra
> time
> > of
> > > > > >> t3000 -
> > > > > >> > > t1,
> > > > > >> > > > > etc.
> > > > > >> > > > > >   Roughly speaking, the latency difference is bigger
> for
> > > the
> > > > > >> > earlier
> > > > > >> > > > > > produce requests than for the later ones. For the same
> > > > reason,
> > > > > >> the
> > > > > >> > > more
> > > > > >> > > > > > ProduceRequests queued
> > > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we get
> > > (capped
> > > > > by
> > > > > >> the
> > > > > >> > > > > > produce timeout).
> > > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or
> > acks=1
> > > > > >> > > > > >   There will be no latency differences in this case,
> but
> > > > > >> > > > > >   2.1 without this KIP, the records of partition
> test-0
> > in
> > > > the
> > > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be
> > appended
> > > > to
> > > > > >> the
> > > > > >> > > local
> > > > > >> > > > > log,
> > > > > >> > > > > >         and eventually be truncated after processing
> the
> > > > > >> > > LeaderAndISR.
> > > > > >> > > > > > This is what's referred to as
> > > > > >> > > > > >         "some unofficial definition of data loss in
> > terms
> > > of
> > > > > >> > messages
> > > > > >> > > > > > beyond the high watermark".
> > > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect since
> if
> > > the
> > > > > >> > > > LeaderAndISR
> > > > > >> > > > > > is immediately processed, the response to producers
> will
> > > > have
> > > > > >> > > > > >         the NotLeaderForPartition error, causing
> > producers
> > > > to
> > > > > >> retry
> > > > > >> > > > > >
> > > > > >> > > > > > This explanation above is the benefit for reducing the
> > > > latency
> > > > > >> of a
> > > > > >> > > > > broker
> > > > > >> > > > > > becoming the follower,
> > > > > >> > > > > > closely related is reducing the latency of a broker
> > > becoming
> > > > > the
> > > > > >> > > > leader.
> > > > > >> > > > > > In this case, the benefit is even more obvious, if
> other
> > > > > brokers
> > > > > >> > have
> > > > > >> > > > > > resigned leadership, and the
> > > > > >> > > > > > current broker should take leadership. Any delay in
> > > > processing
> > > > > >> the
> > > > > >> > > > > > LeaderAndISR will be perceived
> > > > > >> > > > > > by clients as unavailability. In extreme cases, this
> can
> > > > cause
> > > > > >> > failed
> > > > > >> > > > > > produce requests if the retries are
> > > > > >> > > > > > exhausted.
> > > > > >> > > > > >
> > > > > >> > > > > > Another two types of controller requests are
> > > UpdateMetadata
> > > > > and
> > > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > >> > > > > > For UpdateMetadata requests, delayed processing means
> > > > clients
> > > > > >> > > receiving
> > > > > >> > > > > > stale metadata, e.g. with the wrong leadership info
> > > > > >> > > > > > for certain partitions, and the effect is more retries
> > or
> > > > even
> > > > > >> > fatal
> > > > > >> > > > > > failure if the retries are exhausted.
> > > > > >> > > > > >
> > > > > >> > > > > > For StopReplica requests, a long queuing time may
> > degrade
> > > > the
> > > > > >> > > > performance
> > > > > >> > > > > > of topic deletion.
> > > > > >> > > > > >
> > > > > >> > > > > > Regarding your last question of the delay for
> > > > > >> > DescribeLogDirsRequest,
> > > > > >> > > > you
> > > > > >> > > > > > are right
> > > > > >> > > > > > that this KIP cannot help with the latency in getting
> > the
> > > > log
> > > > > >> dirs
> > > > > >> > > > info,
> > > > > >> > > > > > and it's only relevant
> > > > > >> > > > > > when controller requests are involved.
> > > > > >> > > > > >
> > > > > >> > > > > > Regards,
> > > > > >> > > > > > Lucas
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > > lindong28@gmail.com
> > > > > >
> > > > > >> > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > >> Hey Jun,
> > > > > >> > > > > >>
> > > > > >> Thanks much for the comments. It is a good point. So the feature
> > > > > >> may be useful for the JBOD use case. I have one question below.
> > > > > >> > > > > >>
> > > > > >> > > > > >> Hey Lucas,
> > > > > >> > > > > >>
> > > > > >> > > > > >> Do you think this feature is also useful for non-JBOD
> > > setup
> > > > > or
> > > > > >> it
> > > > > >> > is
> > > > > >> > > > > only
> > > > > >> > > > > >> useful for the JBOD setup? It may be useful to
> > understand
> > > > > this.
> > > > > >> > > > > >>
> > > > > >> > > > > >> When the broker is setup using JBOD, in order to move
> > > > leaders
> > > > > >> on
> > > > > >> > the
> > > > > >> > > > > >> failed
> > > > > >> > > > > >> disk to other disks, the system operator first needs
> to
> > > get
> > > > > the
> > > > > >> > list
> > > > > >> > > > of
> > > > > >> > > > > >> partitions on the failed disk. This is currently
> > achieved
> > > > > using
> > > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > > >> DescribeLogDirsRequest
> > > > > >> > to
> > > > > >> > > > the
> > > > > >> > > > > >> broker. If we only prioritize the controller
> requests,
> > > then
> > > > > the
> > > > > >> > > > > >> DescribeLogDirsRequest
> > > > > >> > > > > >> may still take a long time to be processed by the
> > broker.
> > > > So
> > > > > >> the
> > > > > >> > > > overall
> > > > > >> > > > > >> time to move leaders away from the failed disk may
> > still
> > > be
> > > > > >> long
> > > > > >> > > even
> > > > > >> > > > > with
> > > > > >> > > > > >> this KIP. What do you think?
> > > > > >> > > > > >>
> > > > > >> > > > > >> Thanks,
> > > > > >> > > > > >> Dong
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > > >> lucasatucla@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > >>
> > > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > @Dong,
> > > > > >> > > > > >> > Since both of the two comments in your previous
> email
> > > are
> > > > > >> about
> > > > > >> > > the
> > > > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > > > >> > > > > >> > in light of Jun's last comment, do you agree that
> > this
> > > > KIP
> > > > > >> can
> > > > > >> > be
> > > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > > >> > > > > >> > Please let me know, thanks!
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > Regards,
> > > > > >> > > > > >> > Lucas
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > > jun@confluent.io>
> > > > > >> > wrote:
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > If all disks on a broker are slow, one probably
> > > should
> > > > > just
> > > > > >> > kill
> > > > > >> > > > the
> > > > > >> > > > > >> > > broker. In that case, this KIP may not help. If
> > only
> > > > one
> > > > > of
> > > > > >> > the
> > > > > >> > > > > disks
> > > > > >> > > > > >> on
> > > > > >> > > > > >> > a
> > > > > >> > > > > >> > > broker is slow, one may want to fail that disk
> and
> > > move
> > > > > the
> > > > > >> > > > leaders
> > > > > >> > > > > on
> > > > > >> > > > > >> > that
> > > > > >> > > > > >> > > disk to other brokers. In that case, being able
> to
> > > > > process
> > > > > >> the
> > > > > >> > > > > >> > LeaderAndIsr
> > > > > >> > > > > >> > > requests faster will potentially help the
> producers
> > > > > recover
> > > > > >> > > > quicker.
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Thanks,
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Jun
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > > > >> lindong28@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > > Hey Lucas,
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > Thanks for the reply. Some follow up questions
> > > below.
> > > > > >> > > > > >> > > >
> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions
> > > > > >> > > > that are randomly distributed across all partitions, then
> > > > > >> > > > each ProduceRequest will likely cover some partitions for
> > > > > >> > > > which the broker is still leader after it quickly processes
> > > > > >> > > > the LeaderAndIsrRequest. Then the broker will still be slow
> > > > > >> > > > in processing these ProduceRequests, and the request
> > > > > >> > > > latency will still be very high with this KIP. It seems
> > > > > >> > > > that most ProduceRequests will still time out after 30
> > > > > >> > > > seconds. Is this understanding correct?
> > > > > >> > > > > >> > > >
> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out
> > > > > >> > > > after 30 seconds, then it is less clear how this KIP
> > > > > >> > > > reduces average produce latency. Can you clarify what
> > > > > >> > > > metrics can be improved by this KIP?
> > > > > >> > > >
> > > > > >> > > > Not sure why a system operator directly cares about the
> > > > > >> > > > number of truncated messages. Do you mean this KIP can
> > > > > >> > > > improve average throughput or reduce message duplication?
> > > > > >> > > > It will be good to understand this.
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > Thanks,
> > > > > >> > > > > >> > > > Dong
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > > > >> > > lucasatucla@gmail.com
> > > > > >> > > > >
> > > > > >> > > > > >> > wrote:
> > > > > >> > > > > >> > > >
> > > > > >> > > > > >> > > > > Hi Dong,
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see
> > my
> > > > > reply
> > > > > >> > > below.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition.
> Now
> > > > let's
> > > > > >> > > consider
> > > > > >> > > > a
> > > > > >> > > > > >> more
> > > > > >> > > > > >> > > > common
> > > > > >> > > > > >> > > > > scenario
> > > > > >> > > > > >> > > > > where broker0 is the leader of many
> partitions.
> > > And
> > > > > >> let's
> > > > > >> > > say
> > > > > >> > > > > for
> > > > > >> > > > > >> > some
> > > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > > > >> > > > > >> > > > > The number of leader partitions on broker0 is
> > so
> > > > > large,
> > > > > >> > say
> > > > > >> > > > 10K,
> > > > > >> > > > > >> that
> > > > > >> > > > > >> > > the
> > > > > >> > > > > >> > > > > cluster is skewed,
> > > > > >> > > > > >> > > > > and the operator would like to shift the
> > > leadership
> > > > > >> for a
> > > > > >> > > lot
> > > > > >> > > > of
> > > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > > >> > > > > >> > > > > either manually or through some service like
> > > cruise
> > > > > >> > control.
> > > > > >> > > > > >> > > > > With this KIP, not only will the leadership
> > > > > transitions
> > > > > >> > > finish
> > > > > >> > > > > >> more
> > > > > >> > > > > >> > > > > quickly, helping the cluster itself becoming
> > more
> > > > > >> > balanced,
> > > > > >> > > > > >> > > > > but all existing producers corresponding to
> the
> > > 9K
> > > > > >> > > partitions
> > > > > >> > > > > will
> > > > > >> > > > > >> > get
> > > > > >> > > > > >> > > > the
> > > > > >> > > > > >> > > > > errors relatively quickly
> > > > > >> > > > > >> > > > > rather than relying on their timeout, thanks
> to
> > > the
> > > > > >> > batched
> > > > > >> > > > > async
> > > > > >> > > > > >> ZK
> > > > > >> > > > > >> > > > > operations.
> > > > > >> > > > > >> > > > > To me it's a useful feature to have during
> such
> > > > > >> > troublesome
> > > > > >> > > > > times.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have
> shown
> > > > that
> > > > > >> with
> > > > > >> > > this
> > > > > >> > > > > KIP
> > > > > >> > > > > >> > many
> > > > > >> > > > > >> > > > > producers
> > > > > >> > > > > >> > > > > receive an explicit error
> > NotLeaderForPartition,
> > > > > based
> > > > > >> on
> > > > > >> > > > which
> > > > > >> > > > > >> they
> > > > > >> > > > > >> > > > retry
> > > > > >> > > > > >> > > > > immediately.
> > > > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick
> retry)
> > > for
> > > > > >> their
> > > > > >> > > > single
> > > > > >> > > > > >> > > message
> > > > > >> > > > > >> > > > is
> > > > > >> > > > > >> > > > > much smaller
> > > > > >> > > > > >> > > > > compared with the case of timing out without
> > the
> > > > KIP
> > > > > >> (30
> > > > > >> > > > seconds
> > > > > >> > > > > >> for
> > > > > >> > > > > >> > > > timing
> > > > > >> > > > > >> > > > > out + quick retry).
> > > > > >> > > > > >> > > > > One might argue that reducing the timing out
> on
> > > the
> > > > > >> > producer
> > > > > >> > > > > side
> > > > > >> > > > > >> can
> > > > > >> > > > > >> > > > > achieve the same result,
> > > > > >> > > > > >> > > > > yet reducing the timeout has its own
> > > drawbacks[1].
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > Also *IF* there were a metric to show the
> > number
> > > of
> > > > > >> > > truncated
> > > > > >> > > > > >> > messages
> > > > > >> > > > > >> > > on
> > > > > >> > > > > >> > > > > brokers,
> > > > > >> > > > > >> > > > > with the experiments done in the Google Doc,
> it
> > > > > should
> > > > > >> be
> > > > > >> > > easy
> > > > > >> > > > > to
> > > > > >> > > > > >> see
> > > > > >> > > > > >> > > > that
> > > > > >> > > > > >> > > > > a lot fewer messages need
> > > > > >> > > > > >> > > > > to be truncated on broker0 since the
> up-to-date
> > > > > >> metadata
> > > > > >> > > > avoids
> > > > > >> > > > > >> > > appending
> > > > > >> > > > > >> > > > > of messages
> > > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk
> to a
> > > > > system
> > > > > >> > > > operator
> > > > > >> > > > > >> and
> > > > > >> > > > > >> > ask
> > > > > >> > > > > >> > > > > whether
> > > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most
> > likely
> > > > the
> > > > > >> > answer
> > > > > >> > > > is
> > > > > >> > > > > >> yes.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > 3. To answer your question, I think it might be helpful
> > > > > >> > > > > to construct some formulas.
> > > > > >> > > > > To simplify the modeling, I'm going back to the case
> > > > > >> > > > > where there is only ONE partition involved.
> > > > > >> > > > > Following the experiments in the Google Doc, let's say
> > > > > >> > > > > broker0 becomes the follower at time t0,
> > > > > >> > > > > and after t0 there were still N produce requests in its
> > > > > >> > > > > request queue.
> > > > > >> > > > > With the up-to-date metadata brought by this KIP, broker0
> > > > > >> > > > > can reply with a NotLeaderForPartition exception,
> > > > > >> > > > > let's use M1 to denote the average processing time of
> > > > > >> > > > > replying with such an error message.
> > > > > >> > > > > Without this KIP, the broker will need to append messages
> > > > > >> > > > > to segments, which may trigger a flush to disk,
> > > > > >> > > > > let's use M2 to denote the average processing time for
> > > > > >> > > > > such logic.
> > > > > >> > > > > Then the average extra latency incurred without this KIP
> > > > > >> > > > > is N * (M2 - M1) / 2.
> > > > > >> > > > >
> > > > > >> > > > > In practice, M2 should always be larger than M1, which
> > > > > >> > > > > means as long as N is positive,
> > > > > >> > > > > we would see improvements on the average latency.
> > > > > >> > > > > There does not need to be a significant backlog of
> > > > > >> > > > > requests in the request queue,
> > > > > >> > > > > or severe degradation of disk performance, to have the
> > > > > >> > > > > improvement.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > Regards,
> > > > > >> > > > > >> > > > > Lucas
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the
> > > > > producer
> > > > > >> > side
> > > > > >> > > > can
> > > > > >> > > > > >> > trigger
> > > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > > > >> > > > > >> > > > > when the corresponding leader broker is
> > > overloaded,
> > > > > >> > > > exacerbating
> > > > > >> > > > > >> the
> > > > > >> > > > > >> > > > > situation.
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> > > > > >> > > lindong28@gmail.com
> > > > > >> > > > >
> > > > > >> > > > > >> > wrote:
> > > > > >> > > > > >> > > > >
> > > > > >> > > > > >> > > > > > Hey Lucas,
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > Thanks much for the detailed documentation
> of
> > > the
> > > > > >> > > > experiment.
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > Initially I also think having a separate
> > queue
> > > > for
> > > > > >> > > > controller
> > > > > >> > > > > >> > > requests
> > > > > >> > > > > >> > > > is
> > > > > >> > > > > >> > > > > > useful because, as you mentioned in the
> > summary
> > > > > >> section
> > > > > >> > of
> > > > > >> > > > the
> > > > > >> > > > > >> > Google
> > > > > >> > > > > >> > > > > doc,
> > > > > >> > > > > >> > > > > > controller requests are generally more
> > > important
> > > > > than
> > > > > >> > data
> > > > > >> > > > > >> requests
> > > > > >> > > > > >> > > and
> > > > > >> > > > > >> > > > > we
> > > > > >> > > > > >> > > > > > probably want controller requests to be
> > > processed
> > > > > >> > sooner.
> > > > > >> > > > But
> > > > > >> > > > > >> then
> > > > > >> > > > > >> > > Eno
> > > > > >> > > > > >> > > > > has
> > > > > >> > > > > >> > > > > > two very good questions which I am not sure
> > the
> > > > > >> Google
> > > > > >> > doc
> > > > > >> > > > has
> > > > > >> > > > > >> > > answered
> > > > > >> > > > > >> > > > > > explicitly. Could you help with the
> following
> > > > > >> questions?
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > 1) It is not very clear what is the actual
> > > > benefit
> > > > > of
> > > > > >> > > > KIP-291
> > > > > >> > > > > to
> > > > > >> > > > > >> > > users.
> > > > > >> > > > > >> > > > > The
> > > > > >> > > > > >> > > > > > experiment setup in the Google doc
> simulates
> > > the
> > > > > >> > scenario
> > > > > >> > > > that
> > > > > >> > > > > >> > broker
> > > > > >> > > > > >> > > > is
> > > > > >> > > > > >> > > > > > very slow handling ProduceRequest due to
> e.g.
> > > > slow
> > > > > >> disk.
> > > > > >> > > It
> > > > > >> > > > > >> > currently
> > > > > >> > > > > >> > > > > > assumes that there is only 1 partition. But
> > in
> > > > the
> > > > > >> > common
> > > > > >> > > > > >> scenario,
> > > > > >> > > > > >> > > it
> > > > > >> > > > > >> > > > is
> > > > > >> > > > > >> > > > > > probably reasonable to assume that there
> are
> > > many
> > > > > >> other
> > > > > >> > > > > >> partitions
> > > > > >> > > > > >> > > that
> > > > > >> > > > > >> > > > > are
> > > > > >> > > > > >> > > > > > also actively produced to and
> ProduceRequest
> > to
> > > > > these
> > > > > >> > > > > partition
> > > > > >> > > > > >> > also
> > > > > >> > > > > >> > > > > takes
> > > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if
> > > > broker0
> > > > > >> can
> > > > > >> > > > become
> > > > > >> > > > > >> > > follower
> > > > > >> > > > > >> > > > > for
> > > > > >> > > > > >> > > > > > the partition 0 soon, it probably still
> needs
> > > to
> > > > > >> process
> > > > > >> > > the
> > > > > >> > > > > >> > > > > ProduceRequest
> > > > > >> > > > > >> > > > > > slowly t in the queue because these
> > > > ProduceRequests
> > > > > >> > cover
> > > > > >> > > > > other
> > > > > >> > > > > >> > > > > partitions.
> > > > > >> > > > > >> > > > > > Thus most ProduceRequest will still timeout
> > > after
> > > > > 30
> > > > > >> > > seconds
> > > > > >> > > > > and
> > > > > >> > > > > >> > most
> > > > > >> > > > > >> > > > > > clients will still likely timeout after 30
> > > > seconds.
> > > > > >> Then
> > > > > >> > > it
> > > > > >> > > > is
> > > > > >> > > > > >> not
> > > > > >> > > > > >> > > > > > obviously what is the benefit to client
> since
> > > > > client
> > > > > >> > will
> > > > > >> > > > > >> timeout
> > > > > >> > > > > >> > > after
> > > > > >> > > > > >> > > > > 30
> > > > > >> > > > > >> > > > > > seconds before possibly re-connecting to
> > > broker1,
> > > > > >> with
> > > > > >> > or
> > > > > >> > > > > >> without
> > > > > >> > > > > >> > > > > KIP-291.
> > > > > >> > > > > >> > > > > > Did I miss something here?
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the specific
> > > > > benefits
> > > > > >> of
> > > > > >> > > this
> > > > > >> > > > > >> KIP to
> > > > > >> > > > > >> > > > user
> > > > > >> > > > > >> > > > > or
> > > > > >> > > > > >> > > > > > system administrator, e.g. whether this KIP
> > > > > decreases
> > > > > >> > > > average
> > > > > >> > > > > >> > > latency,
> > > > > >> > > > > >> > > > > > 999th percentile latency, probably of
> > exception
> > > > > >> exposed
> > > > > >> > to
> > > > > >> > > > > >> client
> > > > > >> > > > > >> > > etc.
> > > > > >> > > > > >> > > > It
> > > > > >> > > > > >> > > > > > is probably useful to clarify this.
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > 3) Does this KIP help improve user
> experience
> > > > only
> > > > > >> when
> > > > > >> > > > there
> > > > > >> > > > > is
> > > > > >> > > > > >> > > issue
> > > > > >> > > > > >> > > > > with
> > > > > >> > > > > >> > > > > > broker, e.g. significant backlog in the
> > request
> > > > > queue
> > > > > >> > due
> > > > > >> > > to
> > > > > >> > > > > >> slow
> > > > > >> > > > > >> > > disk
> > > > > >> > > > > >> > > > as
> > > > > >> > > > > >> > > > > > described in the Google doc? Or is this KIP
> > > also
> > > > > >> useful
> > > > > >> > > when
> > > > > >> > > > > >> there
> > > > > >> > > > > >> > is
> > > > > >> > > > > >> > > > no
> > > > > >> > > > > >> > > > > > ongoing issue in the cluster? It might be
> > > helpful
> > > > > to
> > > > > >> > > clarify
> > > > > >> > > > > >> this
> > > > > >> > > > > >> > to
> > > > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > Thanks much,
> > > > > >> > > > > >> > > > > > Dong
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas
> Wang <
> > > > > >> > > > > >> lucasatucla@gmail.com
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > > > wrote:
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > > Hi Eno,
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
> > experiment
> > > > > >> results.
> > > > > >> > > > > >> > > > > > > Here is a link to the positive impact
> > > achieved
> > > > by
> > > > > >> > > > > implementing
> > > > > >> > > > > >> > the
> > > > > >> > > > > >> > > > > > proposed
> > > > > >> > > > > >> > > > > > > change:
> > > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/
> > > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > >> > > > > >> > > > > > > Please take a look when you have time and
> > let
> > > > me
> > > > > >> know
> > > > > >> > > your
> > > > > >> > > > > >> > > feedback.
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > Regards,
> > > > > >> > > > > >> > > > > > > Lucas
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> > > > > >> > > kafka@harsha.io>
> > > > > >> > > > > >> wrote:
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a
> look
> > > > might
> > > > > >> suit
> > > > > >> > > our
> > > > > >> > > > > >> > > > requirements
> > > > > >> > > > > >> > > > > > > > better.
> > > > > >> > > > > >> > > > > > > >
> > > > > >> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > >> > > > > > > > Harsha
> > > > > >> > > > > >> > > > > > > >
> > > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM,
> Lucas
> > > > Wang <
> > > > > >> > > > > >> > > > lucasatucla@gmail.com
> > > > > >> > > > > >> > > > > >
> > > > > >> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > If I understand correctly, the
> > > replication
> > > > > >> quota
> > > > > >> > > > > mechanism
> > > > > >> > > > > >> > > > proposed
> > > > > >> > > > > >> > > > > > in
> > > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that
> scenario.
> > > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > >> > > > > > > > > Lucas
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM,
> > Harsha <
> > > > > >> > > > > kafka@harsha.io
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > > > wrote:
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > >> > > > > >> > > > > > > > > > One more question, any thoughts on
> > > making
> > > > > >> this
> > > > > >> > > > > >> configurable
> > > > > >> > > > > >> > > > > > > > > > and also allowing subset of data
> > > requests
> > > > > to
> > > > > >> be
> > > > > >> > > > > >> > prioritized.
> > > > > >> > > > > >> > > > For
> > > > > >> > > > > >> > > > > > > > example
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when we
> > take
> > > > out
> > > > > a
> > > > > >> > > broker
> > > > > >> > > > > and
> > > > > >> > > > > >> > bring
> > > > > >> > > > > >> > > > new
> > > > > >> > > > > >> > > > > > one
> > > > > >> > > > > >> > > > > > > > it
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > will try to become follower and
> have
> > > lot
> > > > of
> > > > > >> > fetch
> > > > > >> > > > > >> requests
> > > > > >> > > > > >> > to
> > > > > >> > > > > >> > > > > other
> > > > > >> > > > > >> > > > > > > > > leaders
> > > > > >> > > > > >> > > > > > > > > > in clusters. This will negatively
> > > effect
> > > > > the
> > > > > >> > > > > >> > > application/client
> > > > > >> > > > > >> > > > > > > > > requests.
> > > > > >> > > > > >> > > > > > > > > > We are also exploring the similar
> > > > solution
> > > > > to
> > > > > >> > > > > >> de-prioritize
> > > > > >> > > > > >> > > if
> > > > > >> > > > > >> > > > a
> > > > > >> > > > > >> > > > > > new
> > > > > >> > > > > >> > > > > > > > > > replica comes in for fetch
> requests,
> > we
> > > > are
> > > > > >> ok
> > > > > >> > > with
> > > > > >> > > > > the
> > > > > >> > > > > >> > > replica
> > > > > >> > > > > >> > > > > to
> > > > > >> > > > > >> > > > > > be
> > > > > >> > > > > >> > > > > > > > > > taking time but the leaders should
> > > > > prioritize
> > > > > >> > the
> > > > > >> > > > > client
> > > > > >> > > > > >> > > > > requests.
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > Thanks,
> > > > > >> > > > > >> > > > > > > > > > Harsha
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM
> > > Lucas
> > > > > Wang
> > > > > >> > > wrote:
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> > > > > >> > > > > >> > > > > > > > > > > - I haven't implemented the
> feature
> > > > yet,
> > > > > >> so no
> > > > > >> > > > > >> > experimental
> > > > > >> > > > > >> > > > > > results
> > > > > >> > > > > >> > > > > > > > so
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > far.
> > > > > >> > > > > >> > > > > > > > > > > And I plan to test in out in the
> > > > > following
> > > > > >> > days.
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > - You are absolutely right that
> the
> > > > > >> priority
> > > > > >> > > queue
> > > > > >> > > > > >> does
> > > > > >> > > > > >> > not
> > > > > >> > > > > >> > > > > > > > completely
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > prevent
> > > > > >> > > > > >> > > > > > > > > > > data requests being processed
> ahead
> > > of
> > > > > >> > > controller
> > > > > >> > > > > >> > requests.
> > > > > >> > > > > >> > > > > > > > > > > That being said, I expect it to
> > > greatly
> > > > > >> > mitigate
> > > > > >> > > > the
> > > > > >> > > > > >> > effect
> > > > > >> > > > > >> > > > of
> > > > > >> > > > > >> > > > > > > stable
> > > > > >> > > > > >> > > > > > > > > > > metadata.
> > > > > >> > > > > >> > > > > > > > > > > In any case, I'll try it out and
> > post
> > > > the
> > > > > >> > > results
> > > > > >> > > > > >> when I
> > > > > >> > > > > >> > > have
> > > > > >> > > > > >> > > > > it.
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > Regards,
> > > > > >> > > > > >> > > > > > > > > > > Lucas
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM,
> > Eno
> > > > > >> Thereska
> > > > > >> > <
> > > > > >> > > > > >> > > > > > > > eno.thereska@gmail.com
> > > > > >> > > > > >> > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a
> > > look
> > > > at
> > > > > >> > this.
> > > > > >> > > A
> > > > > >> > > > > >> couple
> > > > > >> > > > > >> > of
> > > > > >> > > > > >> > > > > > > > questions:
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > - did you notice any positive
> > > change
> > > > > >> after
> > > > > >> > > > > >> implementing
> > > > > >> > > > > >> > > > this
> > > > > >> > > > > >> > > > > > KIP?
> > > > > >> > > > > >> > > > > > > > > I'm
> > > > > >> > > > > >> > > > > > > > > > > > wondering if you have any
> > > > experimental
> > > > > >> > results
> > > > > >> > > > > that
> > > > > >> > > > > >> > show
> > > > > >> > > > > >> > > > the
> > > > > >> > > > > >> > > > > > > > benefit
> > > > > >> > > > > >> > > > > > > > > of
> > > > > >> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > two queues.
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > - priority is usually not
> > > sufficient
> > > > in
> > > > > >> > > > addressing
> > > > > >> > > > > >> the
> > > > > >> > > > > >> > > > > problem
> > > > > >> > > > > >> > > > > > > the
> > > > > >> > > > > >> > > > > > > > > KIP
> > > > > >> > > > > >> > > > > > > > > > > > identifies. Even with priority
> > > > queues,
> > > > > >> you
> > > > > >> > > will
> > > > > >> > > > > >> > sometimes
> > > > > >> > > > > >> > > > > > > (often?)
> > > > > >> > > > > >> > > > > > > > > have
> > > > > >> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > case that data plane requests
> > will
> > > be
> > > > > >> ahead
> > > > > >> > of
> > > > > >> > > > the
> > > > > >> > > > > >> > > control
> > > > > >> > > > > >> > > > > > plane
> > > > > >> > > > > >> > > > > > > > > > > requests.
> > > > > >> > > > > >> > > > > > > > > > > > This happens because the system
> > > might
> > > > > >> have
> > > > > >> > > > already
> > > > > >> > > > > >> > > started
> > > > > >> > > > > >> > > > > > > > > processing
> > > > > >> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > >> > > > > > > > > > > > data plane requests before the
> > > > control
> > > > > >> plane
> > > > > >> > > > ones
> > > > > >> > > > > >> > > arrived.
> > > > > >> > > > > >> > > > So
> > > > > >> > > > > >> > > > > > it
> > > > > >> > > > > >> > > > > > > > > would
> > > > > >> > > > > >> > > > > > > > > > > be
> > > > > >> > > > > >> > > > > > > > > > > > good to know what % of the
> > problem
> > > > this
> > > > > >> KIP
> > > > > >> > > > > >> addresses.
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > Thanks
> > > > > >> > > > > >> > > > > > > > > > > > Eno
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44
> PM,
> > > Ted
> > > > > Yu <
> > > > > >> > > > > >> > > > > yuzhihong@gmail.com
> > > > > >> > > > > >> > > > > > >
> > > > > >> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > Change looks good.
> > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > Thanks
> > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42
> > AM,
> > > > > Lucas
> > > > > >> > Wang
> > > > > >> > > <
> > > > > >> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion.
> > I've
> > > > > >> updated
> > > > > >> > > the
> > > > > >> > > > > KIP.
> > > > > >> > > > > >> > > Please
> > > > > >> > > > > >> > > > > > take
> > > > > >> > > > > >> > > > > > > > > > another
> > > > > >> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > look.
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > Lucas
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
> 6:34
> > > PM,
> > > > > Ted
> > > > > >> Yu
> > > > > >> > <
> > > > > >> > > > > >> > > > > > > yuzhihong@gmail.com
> > > > > >> > > > > >> > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > Currently in
> > > KafkaConfig.scala
> > > > :
> > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests =
> 500
> > > > > >> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > >> > > > > > > > > > > > > > > It would be good if you
> can
> > > > > include
> > > > > >> > the
> > > > > >> > > > > >> default
> > > > > >> > > > > >> > > value
> > > > > >> > > > > >> > > > > for
> > > > > >> > > > > >> > > > > > > > this
> > > > > >> > > > > >> > > > > > > > >
> > > > > >>
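
As a quick sanity check of the N * (M2 - M1) / 2 formula quoted above, here is a tiny calculation with purely illustrative numbers (the values of N, M1, and M2 are assumptions for the example, not measurements):

```java
public class ExtraLatency {
    public static void main(String[] args) {
        int n = 1000;      // assumed: produce requests queued behind the leadership change
        double m1 = 0.1;   // assumed: ms to reply with a NotLeaderForPartition error
        double m2 = 2.0;   // assumed: ms to append to log segments (with occasional flush)
        // Average extra latency per request without the KIP, per the formula above
        double extraAvgMs = n * (m2 - m1) / 2;
        System.out.println(extraAvgMs + " ms"); // 950.0 ms
    }
}
```

Even with a modest queue depth and a small per-request gap between M1 and M2, the average extra latency is close to a second, which matches the claim that a severe backlog is not required to see an improvement.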

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Mayuresh/Joel,

Using the request channel as a deque was brought up some time ago when we were
initially thinking of prioritizing requests. The concern was that controller
requests are supposed to be processed in order. If we can ensure that there is
at most one controller request in the request channel, the order is not a
concern. But in cases where more than one controller request is inserted into
the queue, the controller request order may change and cause problems. For
example, think about the following sequence:
1. Controller successfully sent a request R1 to the broker.
2. Broker receives R1 and puts the request at the head of the request queue.
3. Controller-to-broker connection failed and the controller reconnected to
the broker.
4. Controller sends a request R2 to the broker.
5. Broker receives R2 and adds it to the head of the request queue.
Now on the broker side, R2 will be processed before R1, which may cause
problems.
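
A minimal sketch of this hazard (hypothetical code, not Kafka's actual RequestChannel): naive head-insertion of controller requests into a deque reverses R1 and R2 once a reconnect allows two controller requests into the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ControllerOrderHazard {
    public static void main(String[] args) {
        Deque<String> requestQueue = new ArrayDeque<>();

        // A data-plane request is already queued.
        requestQueue.addLast("produce-A");

        // Step 2: broker receives R1 and puts it at the head of the queue.
        requestQueue.addFirst("R1");

        // Steps 3-4: the controller connection drops and the controller
        // reconnects and sends R2 (muting does not protect across connections).
        // Step 5: broker receives R2 and also puts it at the head.
        requestQueue.addFirst("R2");

        // R2 is now dequeued before R1: controller request order is violated.
        System.out.println(requestQueue.pollFirst()); // R2
        System.out.println(requestQueue.pollFirst()); // R1
    }
}
```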

Thanks,

Jiangjie (Becket) Qin



On Thu, Jul 19, 2018 at 3:23 AM, Joel Koshy <jj...@gmail.com> wrote:

> @Mayuresh - I like your idea. It appears to be a simpler, less invasive
> alternative and it should work. Jun/Becket/others, do you see any pitfalls
> with this approach?
>
> On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com>
> wrote:
>
> > @Mayuresh,
> > That's a very interesting idea that I hadn't thought of before.
> > It seems to solve our problem at hand pretty well, and also
> > avoids the need to have a new size metric and capacity config
> > for the controller request queue. In fact, if we were to adopt
> > this design, there is no public interface change, and we
> > probably don't need a KIP.
> > Also, implementation-wise, it seems the Java class LinkedBlockingDeque can
> > readily satisfy the requirement by supporting a capacity and allowing
> > inserts at both ends.
> >
> > My only concern is that this design is tied to the coincidence that
> > we have two request priorities and there are two ends to a deque.
> > Hence by using the proposed design, it seems the network layer is
> > more tightly coupled with upper layer logic, e.g. if we were to add
> > an extra priority level in the future for some reason, we would probably
> > need to go back to the design of separate queues, one for each priority
> > level.
> >
> > In summary, I'm ok with both designs and lean toward your suggested
> > approach.
> > Let's hear what others think.
> >
> > @Becket,
> > In light of Mayuresh's suggested new design, I'm answering your question
> > only in the context of the current KIP design: I think your suggestion
> > makes sense, and I'm ok with removing the capacity config and just relying
> > on the default value of 20 being sufficient.
> >
> > Thanks,
> > Lucas
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> >
> > > Hi Lucas,
> > >
> > > Seems like the main intent here is to prioritize the controller request
> > > over any other requests.
> > > In that case, we can change the request queue to a deque, where you
> > > always insert the normal requests (produce, consume, etc.) at the end
> > > of the deque, but if it's a controller request, you insert it at the
> > > head of the queue. This ensures that the controller request will be
> > > given higher priority over other requests.
> > >
> > > Also, since we only read one request from the socket, mute the channel,
> > > and only unmute it after handling the request, this would ensure that
> > > we don't handle controller requests out of order.
> > >
> > > With this approach we can avoid the second queue and the additional
> > > config for the size of the queue.
> > >
> > > What do you think ?
> > >
> > > Thanks,
> > >
> > > Mayuresh
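
Mayuresh's proposal above could be sketched roughly as follows using java.util.concurrent.LinkedBlockingDeque, which is bounded and supports inserts at both ends. This is a hypothetical sketch: the class name, the String stand-in for requests, and the flag are invented for illustration, not the actual SocketServer/RequestChannel code.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

public class PrioritizingRequestChannel {
    // A single bounded deque replaces the two-queue design:
    // controller requests jump to the head, data requests go to the tail.
    private final BlockingDeque<String> deque;

    public PrioritizingRequestChannel(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity); // e.g. queued.max.requests
    }

    public void send(String request, boolean isControllerRequest) throws InterruptedException {
        if (isControllerRequest) {
            deque.putFirst(request); // processed before any queued data request
        } else {
            deque.putLast(request);
        }
    }

    public String receive() throws InterruptedException {
        return deque.takeFirst(); // request handler threads poll from the head
    }

    public static void main(String[] args) throws InterruptedException {
        PrioritizingRequestChannel channel = new PrioritizingRequestChannel(500);
        channel.send("produce-A", false);
        channel.send("fetch-B", false);
        channel.send("LeaderAndIsr", true);
        System.out.println(channel.receive()); // LeaderAndIsr
        System.out.println(channel.receive()); // produce-A
    }
}
```

Note that because both insertion and removal are bounded blocking operations, the capacity check applies to the whole deque, so controller and data requests share one `queued.max.requests` budget under this sketch.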
> > >
> > >
> > > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com> wrote:
> > >
> > > > Hey Joel,
> > > >
> > > > Thanks for the detailed explanation. I agree the current design makes
> > > > sense. My confusion is about whether the new config for the controller
> > > > queue capacity is necessary. I cannot think of a case in which users
> > > > would change it.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <be...@gmail.com> wrote:
> > > >
> > > > > Hi Lucas,
> > > > >
> > > > > I guess my question can be rephrased to "do we expect users to ever
> > > > > change the controller request queue capacity"? If we agree that 20 is
> > > > > already a very generous default number and we do not expect users to
> > > > > change it, is it still necessary to expose this as a config?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >
> > > > >> @Becket
> > > > >> 1. Thanks for the comment. You are right that normally there should
> > > > >> be just one controller request because of muting, and I had NOT
> > > > >> intended to say there would be many enqueued controller requests.
> > > > >> I went through the KIP again, and I'm not sure which part conveys
> > > > >> that info. I'd be happy to revise if you point out the section.
> > > > >>
> > > > >> 2. Though it should not happen in normal conditions, the current
> > > > >> design does not preclude multiple controllers running at the same
> > > > >> time. Hence if we don't have the controller queue capacity config
> > > > >> and simply make its capacity 1, network threads handling requests
> > > > >> from different controllers will be blocked during those troublesome
> > > > >> times, which is probably not what we want. On the other hand, adding
> > > > >> the extra config with a default value, say 20, guards us from issues
> > > > >> in those troublesome times, and IMO there isn't much downside to
> > > > >> adding the extra config.
> > > > >>
> > > > >> @Mayuresh
> > > > >> Good catch, this sentence is an obsolete statement based on a
> > > > >> previous design. I've revised the wording in the KIP.
> > > > >>
> > > > >> Thanks,
> > > > >> Lucas
> > > > >>
> > > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > > >> gharatmayuresh15@gmail.com> wrote:
> > > > >>
> > > > >> > Hi Lucas,
> > > > >> >
> > > > >> > Thanks for the KIP.
> > > > >> > I am trying to understand why you think "The memory consumption
> > > > >> > can rise given the total number of queued requests can go up to
> > > > >> > 2x" in the impact section. Normally the requests from the
> > > > >> > controller to a broker are not high volume, right?
> > > > >> >
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Mayuresh
> > > > >> >
> > > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > >> >
> > > > >> > > Thanks for the KIP, Lucas. Separating the control plane from the
> > > > >> > > data plane makes a lot of sense.
> > > > >> > >
> > > > >> > > In the KIP you mentioned that the controller request queue may
> > > > >> > > have many requests in it. Will this be a common case? The
> > > > >> > > controller requests still go through the SocketServer. The
> > > > >> > > SocketServer will mute the channel once a request is read and
> > > > >> > > put into the request channel. So assuming there is only one
> > > > >> > > connection between the controller and each broker, on the broker
> > > > >> > > side there should be only one controller request in the
> > > > >> > > controller request queue at any given time. If that is the case,
> > > > >> > > do we need a separate controller request queue capacity config?
> > > > >> > > The default value 20 means that we expect 20 controller switches
> > > > >> > > to happen in a short period of time. I am not sure whether
> > > > >> > > someone should increase the controller request queue capacity to
> > > > >> > > handle such a case, as it seems to indicate something very wrong
> > > > >> > > has happened.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > >
> > > > >> > > Jiangjie (Becket) Qin
> > > > >> > >
> > > > >> > >
> > > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Thanks for the update Lucas.
> > > > >> > > >
> > > > >> > > > I think the motivation section is intuitive. It will be good
> > > > >> > > > to learn more about the comments from other reviewers.
> > > > >> > > >
> > > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > Hi Dong,
> > > > >> > > > >
> > > > >> > > > > I've updated the motivation section of the KIP by explaining
> > > > >> > > > > the cases that would have user impact.
> > > > >> > > > > Please take a look and let me know your comments.
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > Lucas
> > > > >> > > > >
> > > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi Dong,
> > > > >> > > > > >
> > > > >> > > > > > The simulation of the disk being slow is merely for me to
> > > > >> > > > > > easily construct a testing scenario with a backlog of
> > > > >> > > > > > produce requests. In production, other than the disk being
> > > > >> > > > > > slow, a backlog of produce requests may also be caused by
> > > > >> > > > > > high produce QPS. In that case, we may not want to kill the
> > > > >> > > > > > broker, and that's when this KIP can be useful, both for
> > > > >> > > > > > JBOD and non-JBOD setups.
> > > > >> > > > > >
> > > > >> > > > > > Going back to your previous question about each
> > > > >> > > > > > ProduceRequest covering 20 partitions that are randomly
> > > > >> > > > > > distributed, let's say a LeaderAndIsr request is enqueued
> > > > >> > > > > > that tries to switch the current broker, say broker0, from
> > > > >> > > > > > leader to follower *for one of the partitions*, say
> > > > >> > > > > > *test-0*. For the sake of argument, let's also assume the
> > > > >> > > > > > other brokers, say broker1, have *stopped* fetching from
> > > > >> > > > > > the current broker, i.e. broker0.
> > > > >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL):
> > > > >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> > > > >> > > > > > LeaderAndISR will be put into the purgatory, and since
> > > > >> > > > > > they'll never be replicated to other brokers (because of
> > > > >> > > > > > the assumption made above), they will be completed either
> > > > >> > > > > > when the LeaderAndISR request is processed or when the
> > > > >> > > > > > timeout happens.
> > > > >> > > > > >   1.2 With this KIP, broker0 will immediately transition
> > > > >> > > > > > the partition test-0 to become a follower; after the
> > > > >> > > > > > current broker sees the replication of the remaining 19
> > > > >> > > > > > partitions, it can send a response indicating that it's no
> > > > >> > > > > > longer the leader for "test-0".
> > > > >> > > > > >   To see the latency difference between 1.1 and 1.2, let's
> > > > >> > > > > > say there are 24K produce requests ahead of the
> > > > >> > > > > > LeaderAndISR, and there are 8 io threads, so each io
> > > > >> > > > > > thread will process approximately 3000 produce requests.
> > > > >> > > > > > Now let's investigate the io thread that finally processed
> > > > >> > > > > > the LeaderAndISR.
> > > > >> > > > > >   For the 3000 produce requests, let's model the times
> > > > >> > > > > > when their remaining 19 partitions catch up as t0, t1,
> > > > >> > > > > > ... t2999, and say the LeaderAndISR request is processed
> > > > >> > > > > > at time t3000.
> > > > >> > > > > >   Without this KIP, the 1st produce request would have
> > > > >> > > > > > waited an extra t3000 - t0 time in the purgatory, the 2nd
> > > > >> > > > > > an extra time of t3000 - t1, etc.
> > > > >> > > > > >   Roughly speaking, the latency difference is bigger for
> > the
> > > > >> > earlier
> > > > >> > > > > > produce requests than for the later ones. For the same
> > > reason,
> > > > >> the
> > > > >> > > more
> > > > >> > > > > > ProduceRequests queued
> > > > >> > > > > >   before the LeaderAndISR, the bigger benefit we get
> > (capped
> > > > by
> > > > >> the
> > > > >> > > > > > produce timeout).
> > > > >> > > > > > 2. If the enqueued produce requests have acks=0 or
> acks=1
> > > > >> > > > > >   There will be no latency differences in this case, but
> > > > >> > > > > >   2.1 without this KIP, the records of partition test-0
> in
> > > the
> > > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be
> appended
> > > to
> > > > >> the
> > > > >> > > local
> > > > >> > > > > log,
> > > > >> > > > > >         and eventually be truncated after processing the
> > > > >> > > LeaderAndISR.
> > > > >> > > > > > This is what's referred to as
> > > > >> > > > > >         "some unofficial definition of data loss in
> terms
> > of
> > > > >> > messages
> > > > >> > > > > > beyond the high watermark".
> > > > >> > > > > >   2.2 with this KIP, we can mitigate the effect since if
> > the
> > > > >> > > > LeaderAndISR
> > > > >> > > > > > is immediately processed, the response to producers will
> > > have
> > > > >> > > > > >         the NotLeaderForPartition error, causing
> producers
> > > to
> > > > >> retry
> > > > >> > > > > >
> > > > >> > > > > > This explanation above is the benefit for reducing the
> > > latency
> > > > >> of a
> > > > >> > > > > broker
> > > > >> > > > > > becoming the follower,
> > > > >> > > > > > closely related is reducing the latency of a broker
> > becoming
> > > > the
> > > > >> > > > leader.
> > > > >> > > > > > In this case, the benefit is even more obvious, if other
> > > > brokers
> > > > >> > have
> > > > >> > > > > > resigned leadership, and the
> > > > >> > > > > > current broker should take leadership. Any delay in
> > > processing
> > > > >> the
> > > > >> > > > > > LeaderAndISR will be perceived
> > > > >> > > > > > by clients as unavailability. In extreme cases, this can
> > > cause
> > > > >> > failed
> > > > >> > > > > > produce requests if the retries are
> > > > >> > > > > > exhausted.
> > > > >> > > > > >
> > > > >> > > > > > Another two types of controller requests are
> > UpdateMetadata
> > > > and
> > > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > >> > > > > > For UpdateMetadata requests, delayed processing means
> > > clients
> > > > >> > > receiving
> > > > >> > > > > > stale metadata, e.g. with the wrong leadership info
> > > > >> > > > > > for certain partitions, and the effect is more retries
> or
> > > even
> > > > >> > fatal
> > > > >> > > > > > failure if the retries are exhausted.
> > > > >> > > > > >
> > > > >> > > > > > For StopReplica requests, a long queuing time may
> degrade
> > > the
> > > > >> > > > performance
> > > > >> > > > > > of topic deletion.
> > > > >> > > > > >
> > > > >> > > > > > Regarding your last question of the delay for
> > > > >> > DescribeLogDirsRequest,
> > > > >> > > > you
> > > > >> > > > > > are right
> > > > >> > > > > > that this KIP cannot help with the latency in getting
> the
> > > log
> > > > >> dirs
> > > > >> > > > info,
> > > > >> > > > > > and it's only relevant
> > > > >> > > > > > when controller requests are involved.
> > > > >> > > > > >
> > > > >> > > > > > Regards,
> > > > >> > > > > > Lucas
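
The acks=-1 purgatory analysis in the message above can be sketched numerically. In this hypothetical model (the times below are made-up numbers, not from the experiment), the i-th queued ProduceRequest's remaining partitions catch up at time t_i, but without the KIP its response is held in purgatory until the LeaderAndIsr request is finally processed:

```python
def extra_purgatory_waits(catch_up_times, leader_and_isr_time):
    # Without the KIP, a ProduceRequest whose followers catch up at t_i
    # still sits in purgatory until the LeaderAndIsr request is processed;
    # its extra wait is therefore (leader_and_isr_time - t_i), never negative.
    return [max(leader_and_isr_time - t, 0.0) for t in catch_up_times]

# Three requests catching up at t=0s, 1s, 2s; LeaderAndIsr processed at t=3s.
print(extra_purgatory_waits([0.0, 1.0, 2.0], 3.0))  # [3.0, 2.0, 1.0]
```

As the email argues, the earlier a request entered purgatory, the larger its extra wait, capped in practice by the produce timeout.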
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > > lindong28@gmail.com
> > > > >
> > > > >> > > wrote:
> > > > >> > > > > >
> > > > >> > > > > >> Hey Jun,
> > > > >> > > > > >>
> > > > >> > > > > >> Thanks much for the comments. It is good point. So the
> > > > feature
> > > > >> may
> > > > >> > > be
> > > > >> > > > > >> useful for JBOD use-case. I have one question below.
> > > > >> > > > > >>
> > > > >> > > > > >> Hey Lucas,
> > > > >> > > > > >>
> > > > >> > > > > >> Do you think this feature is also useful for non-JBOD
> > setup
> > > > or
> > > > >> it
> > > > >> > is
> > > > >> > > > > only
> > > > >> > > > > >> useful for the JBOD setup? It may be useful to
> understand
> > > > this.
> > > > >> > > > > >>
> > > > >> > > > > >> When the broker is setup using JBOD, in order to move
> > > leaders
> > > > >> on
> > > > >> > the
> > > > >> > > > > >> failed
> > > > >> > > > > >> disk to other disks, the system operator first needs to
> > get
> > > > the
> > > > >> > list
> > > > >> > > > of
> > > > >> > > > > >> partitions on the failed disk. This is currently
> achieved
> > > > using
> > > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > > >> DescribeLogDirsRequest
> > > > >> > to
> > > > >> > > > the
> > > > >> > > > > >> broker. If we only prioritize the controller requests,
> > then
> > > > the
> > > > >> > > > > >> DescribeLogDirsRequest
> > > > >> > > > > >> may still take a long time to be processed by the
> broker.
> > > So
> > > > >> the
> > > > >> > > > overall
> > > > >> > > > > >> time to move leaders away from the failed disk may
> still
> > be
> > > > >> long
> > > > >> > > even
> > > > >> > > > > with
> > > > >> > > > > >> this KIP. What do you think?
> > > > >> > > > > >>
> > > > >> > > > > >> Thanks,
> > > > >> > > > > >> Dong
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > > >> lucasatucla@gmail.com
> > > > >> > >
> > > > >> > > > > wrote:
> > > > >> > > > > >>
> > > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > > >> > > > > >> >
> > > > >> > > > > >> > @Dong,
> > > > >> > > > > >> > Since both of the two comments in your previous email
> > are
> > > > >> about
> > > > >> > > the
> > > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > > >> > > > > >> > in light of Jun's last comment, do you agree that
> this
> > > KIP
> > > > >> can
> > > > >> > be
> > > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > > >> > > > > >> > Please let me know, thanks!
> > > > >> > > > > >> >
> > > > >> > > > > >> > Regards,
> > > > >> > > > > >> > Lucas
> > > > >> > > > > >> >
> > > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > > jun@confluent.io>
> > > > >> > wrote:
> > > > >> > > > > >> >
> > > > >> > > > > >> > > Hi, Lucas, Dong,
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > If all disks on a broker are slow, one probably
> > should
> > > > just
> > > > >> > kill
> > > > >> > > > the
> > > > >> > > > > >> > > broker. In that case, this KIP may not help. If
> only
> > > one
> > > > of
> > > > >> > the
> > > > >> > > > > disks
> > > > >> > > > > >> on
> > > > >> > > > > >> > a
> > > > >> > > > > >> > > broker is slow, one may want to fail that disk and
> > move
> > > > the
> > > > >> > > > leaders
> > > > >> > > > > on
> > > > >> > > > > >> > that
> > > > >> > > > > >> > > disk to other brokers. In that case, being able to
> > > > process
> > > > >> the
> > > > >> > > > > >> > LeaderAndIsr
> > > > >> > > > > >> > > requests faster will potentially help the producers
> > > > recover
> > > > >> > > > quicker.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > Thanks,
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > Jun
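
The faster LeaderAndIsr processing Jun describes is what the proposed separate queue provides. A minimal sketch of the idea follows; the class and method names are hypothetical illustrations, not the actual KIP-291 implementation:

```python
import queue

class TwoQueueRequestChannel:
    """Sketch: controller requests (LeaderAndIsr, UpdateMetadata, StopReplica)
    go to a dedicated queue that request handler threads drain first."""

    def __init__(self, queued_max_requests):
        self.controller_queue = queue.Queue()               # not bounded by queued.max.requests
        self.data_queue = queue.Queue(queued_max_requests)  # bounded as before

    def send(self, request, from_controller):
        target = self.controller_queue if from_controller else self.data_queue
        target.put(request)

    def receive(self):
        # Prefer a waiting controller request; otherwise take a data request.
        try:
            return self.controller_queue.get_nowait()
        except queue.Empty:
            return self.data_queue.get()

channel = TwoQueueRequestChannel(queued_max_requests=500)
channel.send("ProduceRequest", from_controller=False)
channel.send("LeaderAndIsrRequest", from_controller=True)
print(channel.receive())  # LeaderAndIsrRequest jumps ahead of the queued produce
```

Even with the separate queue, a handler thread already busy with a slow ProduceRequest finishes it first, so prioritization shortens queuing delay rather than preempting in-flight work.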
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > > >> lindong28@gmail.com
> > > > >> > >
> > > > >> > > > > wrote:
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > > Hey Lucas,
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > > Thanks for the reply. Some follow up questions
> > below.
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20
> > > > partitions
> > > > >> > that
> > > > >> > > > are
> > > > >> > > > > >> > > randomly
> > > > >> > > > > >> > > > distributed across all partitions, then each
> > > > >> ProduceRequest
> > > > >> > > will
> > > > >> > > > > >> likely
> > > > >> > > > > >> > > > cover some partitions for which the broker is
> still
> > > > >> leader
> > > > >> > > after
> > > > >> > > > > it
> > > > >> > > > > >> > > quickly
> > > > >> > > > > >> > > > processes the
> > > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will still be
> slow
> > > in
> > > > >> > > > processing
> > > > >> > > > > >> these
> > > > >> > > > > >> > > > ProduceRequests, and request latency will still be very
> high
> > > with
> > > > >> this
> > > > >> > > > KIP.
> > > > >> > > > > It
> > > > >> > > > > >> > > seems
> > > > >> > > > > >> > > > that most ProduceRequest will still timeout after
> > 30
> > > > >> > seconds.
> > > > >> > > Is
> > > > >> > > > > >> this
> > > > >> > > > > >> > > > understanding correct?
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will still
> > > timeout
> > > > >> after
> > > > >> > > 30
> > > > >> > > > > >> > seconds,
> > > > >> > > > > >> > > > then it is less clear how this KIP reduces
> average
> > > > >> produce
> > > > >> > > > > latency.
> > > > >> > > > > >> Can
> > > > >> > > > > >> > > you
> > > > >> > > > > >> > > > clarify what metrics can be improved by this KIP?
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > > Not sure why the system operator directly cares about the
> number
> > of
> > > > >> > > truncated
> > > > >> > > > > >> > messages.
> > > > >> > > > > >> > > > Do you mean this KIP can improve average
> throughput
> > > or
> > > > >> > reduce
> > > > >> > > > > >> message
> > > > >> > > > > >> > > > duplication? It will be good to understand this.
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > > Thanks,
> > > > >> > > > > >> > > > Dong
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > > >> > > lucasatucla@gmail.com
> > > > >> > > > >
> > > > >> > > > > >> > wrote:
> > > > >> > > > > >> > > >
> > > > >> > > > > >> > > > > Hi Dong,
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > Thanks for your valuable comments. Please see
> my
> > > > reply
> > > > >> > > below.
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now
> > > let's
> > > > >> > > consider
> > > > >> > > > a
> > > > >> > > > > >> more
> > > > >> > > > > >> > > > common
> > > > >> > > > > >> > > > > scenario
> > > > >> > > > > >> > > > > where broker0 is the leader of many partitions.
> > And
> > > > >> let's
> > > > >> > > say
> > > > >> > > > > for
> > > > >> > > > > >> > some
> > > > >> > > > > >> > > > > reason its IO becomes slow.
> > > > >> > > > > >> > > > > The number of leader partitions on broker0 is
> so
> > > > large,
> > > > >> > say
> > > > >> > > > 10K,
> > > > >> > > > > >> that
> > > > >> > > > > >> > > the
> > > > >> > > > > >> > > > > cluster is skewed,
> > > > >> > > > > >> > > > > and the operator would like to shift the
> > leadership
> > > > >> for a
> > > > >> > > lot
> > > > >> > > > of
> > > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > >> > > > > >> > > > > either manually or through some service like
> > cruise
> > > > >> > control.
> > > > >> > > > > >> > > > > With this KIP, not only will the leadership
> > > > transitions
> > > > >> > > finish
> > > > >> > > > > >> more
> > > > >> > > > > >> > > > > quickly, helping the cluster itself becoming
> more
> > > > >> > balanced,
> > > > >> > > > > >> > > > > but all existing producers corresponding to the
> > 9K
> > > > >> > > partitions
> > > > >> > > > > will
> > > > >> > > > > >> > get
> > > > >> > > > > >> > > > the
> > > > >> > > > > >> > > > > errors relatively quickly
> > > > >> > > > > >> > > > > rather than relying on their timeout, thanks to
> > the
> > > > >> > batched
> > > > >> > > > > async
> > > > >> > > > > >> ZK
> > > > >> > > > > >> > > > > operations.
> > > > >> > > > > >> > > > > To me it's a useful feature to have during such
> > > > >> > troublesome
> > > > >> > > > > times.
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown
> > > that
> > > > >> with
> > > > >> > > this
> > > > >> > > > > KIP
> > > > >> > > > > >> > many
> > > > >> > > > > >> > > > > producers
> > > > >> > > > > >> > > > > receive an explicit error
> NotLeaderForPartition,
> > > > based
> > > > >> on
> > > > >> > > > which
> > > > >> > > > > >> they
> > > > >> > > > > >> > > > retry
> > > > >> > > > > >> > > > > immediately.
> > > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick retry)
> > for
> > > > >> their
> > > > >> > > > single
> > > > >> > > > > >> > > message
> > > > >> > > > > >> > > > is
> > > > >> > > > > >> > > > > much smaller
> > > > >> > > > > >> > > > > compared with the case of timing out without
> the
> > > KIP
> > > > >> (30
> > > > >> > > > seconds
> > > > >> > > > > >> for
> > > > >> > > > > >> > > > timing
> > > > >> > > > > >> > > > > out + quick retry).
> > > > >> > > > > >> > > > > One might argue that reducing the timing out on
> > the
> > > > >> > producer
> > > > >> > > > > side
> > > > >> > > > > >> can
> > > > >> > > > > >> > > > > achieve the same result,
> > > > >> > > > > >> > > > > yet reducing the timeout has its own
> > drawbacks[1].
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > Also *IF* there were a metric to show the
> number
> > of
> > > > >> > > truncated
> > > > >> > > > > >> > messages
> > > > >> > > > > >> > > on
> > > > >> > > > > >> > > > > brokers,
> > > > >> > > > > >> > > > > with the experiments done in the Google Doc, it
> > > > should
> > > > >> be
> > > > >> > > easy
> > > > >> > > > > to
> > > > >> > > > > >> see
> > > > >> > > > > >> > > > that
> > > > >> > > > > >> > > > > a lot fewer messages need
> > > > >> > > > > >> > > > > to be truncated on broker0 since the up-to-date
> > > > >> metadata
> > > > >> > > > avoids
> > > > >> > > > > >> > > appending
> > > > >> > > > > >> > > > > of messages
> > > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk to a
> > > > system
> > > > >> > > > operator
> > > > >> > > > > >> and
> > > > >> > > > > >> > ask
> > > > >> > > > > >> > > > > whether
> > > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most
> likely
> > > the
> > > > >> > answer
> > > > >> > > > is
> > > > >> > > > > >> yes.
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > 3. To answer your question, I think it might be
> > > > >> helpful to
> > > > >> > > > > >> construct
> > > > >> > > > > >> > > some
> > > > >> > > > > >> > > > > formulas.
> > > > >> > > > > >> > > > > To simplify the modeling, I'm going back to the
> > > case
> > > > >> where
> > > > >> > > > there
> > > > >> > > > > >> is
> > > > >> > > > > >> > > only
> > > > >> > > > > >> > > > > ONE partition involved.
> > > > >> > > > > >> > > > > Following the experiments in the Google Doc,
> > let's
> > > > say
> > > > >> > > broker0
> > > > >> > > > > >> > becomes
> > > > >> > > > > >> > > > the
> > > > >> > > > > >> > > > > follower at time t0,
> > > > >> > > > > >> > > > > and after t0 there were still N produce
> requests
> > in
> > > > its
> > > > >> > > > request
> > > > >> > > > > >> > queue.
> > > > >> > > > > >> > > > > With the up-to-date metadata brought by this
> KIP,
> > > > >> broker0
> > > > >> > > can
> > > > >> > > > > >> reply
> > > > >> > > > > >> > > with
> > > > >> > > > > >> > > > an
> > > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > > >> > > > > >> > > > > let's use M1 to denote the average processing
> > time
> > > of
> > > > >> > > replying
> > > > >> > > > > >> with
> > > > >> > > > > >> > > such
> > > > >> > > > > >> > > > an
> > > > >> > > > > >> > > > > error message.
> > > > >> > > > > >> > > > > Without this KIP, the broker will need to
> append
> > > > >> messages
> > > > >> > to
> > > > >> > > > > >> > segments,
> > > > >> > > > > >> > > > > which may trigger a flush to disk,
> > > > >> > > > > >> > > > > let's use M2 to denote the average processing
> > time
> > > > for
> > > > >> > such
> > > > >> > > > > logic.
> > > > >> > > > > >> > > > > Then the average extra latency incurred without
> > > this
> > > > >> KIP
> > > > >> > is
> > > > >> > > N
> > > > >> > > > *
> > > > >> > > > > >> (M2 -
> > > > >> > > > > >> > > > M1) /
> > > > >> > > > > >> > > > > 2.
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > In practice, M2 should always be larger than
> M1,
> > > > which
> > > > >> > means
> > > > >> > > > as
> > > > >> > > > > >> long
> > > > >> > > > > >> > > as N
> > > > >> > > > > >> > > > > is positive,
> > > > >> > > > > >> > > > > we would see improvements on the average
> latency.
> > > > >> > > > > >> > > > > There does not need to be significant backlog
> of
> > > > >> requests
> > > > >> > in
> > > > >> > > > the
> > > > >> > > > > >> > > request
> > > > >> > > > > >> > > > > queue,
> > > > >> > > > > >> > > > > or severe degradation of disk performance to
> have
> > > the
> > > > >> > > > > improvement.
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > Regards,
> > > > >> > > > > >> > > > > Lucas
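
The averaging argument above can be checked with a short sketch; the N, M1, and M2 values below are made-up numbers for illustration:

```python
def avg_extra_latency(n, m1, m2):
    # Request i queues behind i earlier requests; each of those takes M2
    # (append plus a possible flush) without the KIP versus M1 (a quick
    # NotLeaderForPartition reply) with it, so request i's extra wait is
    # i * (M2 - M1). Averaged over all N requests this is ~N * (M2 - M1) / 2.
    return sum(i * (m2 - m1) for i in range(n)) / n

print(avg_extra_latency(3000, 0.001, 0.003))  # 2.999, i.e. ~N * (M2 - M1) / 2 seconds
```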
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the
> > > > producer
> > > > >> > side
> > > > >> > > > can
> > > > >> > > > > >> > trigger
> > > > >> > > > > >> > > > > unnecessary duplicate requests
> > > > >> > > > > >> > > > > when the corresponding leader broker is
> > overloaded,
> > > > >> > > > exacerbating
> > > > >> > > > > >> the
> > > > >> > > > > >> > > > > situation.
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> > > > >> > > lindong28@gmail.com
> > > > >> > > > >
> > > > >> > > > > >> > wrote:
> > > > >> > > > > >> > > > >
> > > > >> > > > > >> > > > > > Hey Lucas,
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > Thanks much for the detailed documentation of
> > the
> > > > >> > > > experiment.
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > Initially I also think having a separate
> queue
> > > for
> > > > >> > > > controller
> > > > >> > > > > >> > > requests
> > > > >> > > > > >> > > > is
> > > > >> > > > > >> > > > > > useful because, as you mentioned in the
> summary
> > > > >> section
> > > > >> > of
> > > > >> > > > the
> > > > >> > > > > >> > Google
> > > > >> > > > > >> > > > > doc,
> > > > >> > > > > >> > > > > > controller requests are generally more
> > important
> > > > than
> > > > >> > data
> > > > >> > > > > >> requests
> > > > >> > > > > >> > > and
> > > > >> > > > > >> > > > > we
> > > > >> > > > > >> > > > > > probably want controller requests to be
> > processed
> > > > >> > sooner.
> > > > >> > > > But
> > > > >> > > > > >> then
> > > > >> > > > > >> > > Eno
> > > > >> > > > > >> > > > > has
> > > > >> > > > > >> > > > > > two very good questions which I am not sure
> the
> > > > >> Google
> > > > >> > doc
> > > > >> > > > has
> > > > >> > > > > >> > > answered
> > > > >> > > > > >> > > > > > explicitly. Could you help with the following
> > > > >> questions?
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > 1) It is not very clear what is the actual
> > > benefit
> > > > of
> > > > >> > > > KIP-291
> > > > >> > > > > to
> > > > >> > > > > >> > > users.
> > > > >> > > > > >> > > > > The
> > > > >> > > > > >> > > > > > experiment setup in the Google doc simulates
> > the
> > > > >> > scenario
> > > > >> > > > that
> > > > >> > > > > >> > broker
> > > > >> > > > > >> > > > is
> > > > >> > > > > >> > > > > > very slow handling ProduceRequest due to e.g.
> > > slow
> > > > >> disk.
> > > > >> > > It
> > > > >> > > > > >> > currently
> > > > >> > > > > >> > > > > > assumes that there is only 1 partition. But
> in
> > > the
> > > > >> > common
> > > > >> > > > > >> scenario,
> > > > >> > > > > >> > > it
> > > > >> > > > > >> > > > is
> > > > >> > > > > >> > > > > > probably reasonable to assume that there are
> > many
> > > > >> other
> > > > >> > > > > >> partitions
> > > > >> > > > > >> > > that
> > > > >> > > > > >> > > > > are
> > > > >> > > > > >> > > > > > also actively produced to and ProduceRequest
> to
> > > > these
> > > > >> > > > > partition
> > > > >> > > > > >> > also
> > > > >> > > > > >> > > > > takes
> > > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if
> > > broker0
> > > > >> can
> > > > >> > > > become
> > > > >> > > > > >> > > follower
> > > > >> > > > > >> > > > > for
> > > > >> > > > > >> > > > > > the partition 0 soon, it probably still needs
> > to
> > > > >> process
> > > > >> > > the
> > > > >> > > > > >> > > > > ProduceRequest
> > > > >> > > > > >> > > > > > slowly in the queue because these
> > > ProduceRequests
> > > > >> > cover
> > > > >> > > > > other
> > > > >> > > > > >> > > > > partitions.
> > > > >> > > > > >> > > > > > Thus most ProduceRequest will still timeout
> > after
> > > > 30
> > > > >> > > seconds
> > > > >> > > > > and
> > > > >> > > > > >> > most
> > > > >> > > > > >> > > > > > clients will still likely timeout after 30
> > > seconds.
> > > > >> Then
> > > > >> > > it
> > > > >> > > > is
> > > > >> > > > > >> not
> > > > >> > > > > >> > > > > > obviously what is the benefit to client since
> > > > client
> > > > >> > will
> > > > >> > > > > >> timeout
> > > > >> > > > > >> > > after
> > > > >> > > > > >> > > > > 30
> > > > >> > > > > >> > > > > > seconds before possibly re-connecting to
> > broker1,
> > > > >> with
> > > > >> > or
> > > > >> > > > > >> without
> > > > >> > > > > >> > > > > KIP-291.
> > > > >> > > > > >> > > > > > Did I miss something here?
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the specific
> > > > benefits
> > > > >> of
> > > > >> > > this
> > > > >> > > > > >> KIP to
> > > > >> > > > > >> > > > user
> > > > >> > > > > >> > > > > or
> > > > >> > > > > >> > > > > > system administrator, e.g. whether this KIP
> > > > decreases
> > > > >> > > > average
> > > > >> > > > > >> > > latency,
> > > > >> > > > > >> > > > > > 999th percentile latency, probably of
> exception
> > > > >> exposed
> > > > >> > to
> > > > >> > > > > >> client
> > > > >> > > > > >> > > etc.
> > > > >> > > > > >> > > > It
> > > > >> > > > > >> > > > > > is probably useful to clarify this.
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > 3) Does this KIP help improve user experience
> > > only
> > > > >> when
> > > > >> > > > there
> > > > >> > > > > is
> > > > >> > > > > >> > > issue
> > > > >> > > > > >> > > > > with
> > > > >> > > > > >> > > > > > broker, e.g. significant backlog in the
> request
> > > > queue
> > > > >> > due
> > > > >> > > to
> > > > >> > > > > >> slow
> > > > >> > > > > >> > > disk
> > > > >> > > > > >> > > > as
> > > > >> > > > > >> > > > > > described in the Google doc? Or is this KIP
> > also
> > > > >> useful
> > > > >> > > when
> > > > >> > > > > >> there
> > > > >> > > > > >> > is
> > > > >> > > > > >> > > > no
> > > > >> > > > > >> > > > > > ongoing issue in the cluster? It might be
> > helpful
> > > > to
> > > > >> > > clarify
> > > > >> > > > > >> this
> > > > >> > > > > >> > to
> > > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > Thanks much,
> > > > >> > > > > >> > > > > > Dong
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > > > >> > > > > >> lucasatucla@gmail.com
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > > > wrote:
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > > Hi Eno,
> > > > >> > > > > >> > > > > > >
> > > > >> > > > > >> > > > > > > Sorry for the delay in getting the
> experiment
> > > > >> results.
> > > > >> > > > > >> > > > > > > Here is a link to the positive impact
> > achieved
> > > by
> > > > >> > > > > implementing
> > > > >> > > > > >> > the
> > > > >> > > > > >> > > > > > proposed
> > > > >> > > > > >> > > > > > > change:
> > > > >> > > > > >> > > > > > > https://docs.google.com/document/d/
> > > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > >> > > > > >> > > > > > > Please take a look when you have time and
> let
> > > me
> > > > >> know
> > > > >> > > your
> > > > >> > > > > >> > > feedback.
> > > > >> > > > > >> > > > > > >
> > > > >> > > > > >> > > > > > > Regards,
> > > > >> > > > > >> > > > > > > Lucas
> > > > >> > > > > >> > > > > > >
> > > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> > > > >> > > kafka@harsha.io>
> > > > >> > > > > >> wrote:
> > > > >> > > > > >> > > > > > >
> > > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a look
> > > might
> > > > >> suit
> > > > >> > > our
> > > > >> > > > > >> > > > requirements
> > > > >> > > > > >> > > > > > > > better.
> > > > >> > > > > >> > > > > > > >
> > > > >> > > > > >> > > > > > > > Thanks,
> > > > >> > > > > >> > > > > > > > Harsha
> > > > >> > > > > >> > > > > > > >
> > > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas
> > > Wang <
> > > > >> > > > > >> > > > lucasatucla@gmail.com
> > > > >> > > > > >> > > > > >
> > > > >> > > > > >> > > > > > > > wrote:
> > > > >> > > > > >> > > > > > > >
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > > If I understand correctly, the
> > replication
> > > > >> quota
> > > > >> > > > > mechanism
> > > > >> > > > > >> > > > proposed
> > > > >> > > > > >> > > > > > in
> > > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that scenario.
> > > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > > Thanks,
> > > > >> > > > > >> > > > > > > > > Lucas
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM,
> Harsha <
> > > > >> > > > > kafka@harsha.io
> > > > >> > > > > >> >
> > > > >> > > > > >> > > > wrote:
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > > >> > > > > >> > > > > > > > > > One more question, any thoughts on
> > making
> > > > >> this
> > > > >> > > > > >> configurable
> > > > >> > > > > >> > > > > > > > > > and also allowing subset of data
> > requests
> > > > to
> > > > >> be
> > > > >> > > > > >> > prioritized.
> > > > >> > > > > >> > > > For
> > > > >> > > > > >> > > > > > > > example
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when we
> take
> > > out
> > > > a
> > > > >> > > broker
> > > > >> > > > > and
> > > > >> > > > > >> > bring
> > > > >> > > > > >> > > > new
> > > > >> > > > > >> > > > > > one
> > > > >> > > > > >> > > > > > > > it
> > > > >> > > > > >> > > > > > > > >
> > > > >> > > > > >> > > > > > > > > > will try to become follower and have
> > lot
> > > of
> > > > >> > fetch
> > > > >> > > > > >> requests
> > > > >> > > > > >> > to
> > > > >> > > > > >> > > > > other
> > > > >> > > > > >> > > > > > > > > leaders
> > > > >> > > > > >> > > > > > > > > > in clusters. This will negatively
> > effect
> > > > the
> > > > >> > > > > >> > > application/client
> > > > >> > > > > >> > > > > > > > > requests.
> > > > >> > > > > >> > > > > > > > > > We are also exploring the similar
> > > solution
> > > > to
> > > > >> > > > > >> de-prioritize
> > > > >> > > > > >> > > if
> > > > >> > > > > >> > > > a
> ...new replica comes in for fetch requests, we are OK with the replica
> taking time, but the leaders should prioritize the client requests.
>
> Thanks,
> Harsha
>
> On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
>
> Hi Eno,
>
> Sorry for the delayed response.
> - I haven't implemented the feature yet, so no experimental results so far.
> And I plan to test it out in the following days.
>
> - You are absolutely right that the priority queue does not completely
> prevent data requests being processed ahead of controller requests.
> That being said, I expect it to greatly mitigate the effect of stale
> metadata.
> In any case, I'll try it out and post the results when I have them.
>
> Regards,
> Lucas
>
> On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <eno.thereska@gmail.com> wrote:
>
> Hi Lucas,
>
> Sorry for the delay, just had a look at this. A couple of questions:
> - did you notice any positive change after implementing this KIP? I'm
> wondering if you have any experimental results that show the benefit of
> the two queues.
>
> - priority is usually not sufficient in addressing the problem the KIP
> identifies. Even with priority queues, you will sometimes (often?) have
> the case that data plane requests will be ahead of the control plane
> requests. This happens because the system might have already started
> processing the data plane requests before the control plane ones arrived.
> So it would be good to know what % of the problem this KIP addresses.
>
> Thanks
> Eno
>
> On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> Change looks good.
>
> Thanks
>
> On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
>
> Hi Ted,
>
> Thanks for the suggestion. I've updated the KIP. Please take another look.
>
> Lucas
>
> On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> Currently in KafkaConfig.scala:
>
> val QueuedMaxRequests = 500
>
> It would be good if you can include the default value for this new config
> in the KIP.
>
> Thanks
>
> On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
>
> Hi Ted, Dong
>
> I've updated the KIP by adding a new config, instead of reusing the
> existing one.
> Please take another look when you have time. Thanks a lot!
>
> Lucas
>
> On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> bq. that's a waste of resource if control request rate is low
>
> I don't know if control request rate can get to 100,000, likely not. Then
> using the same bound as that for data requests seems high.
>
> On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
>
> Hi Ted,
>
> Thanks for taking a look at this KIP.
> Let's say today the setting of "queued.max.requests" in cluster A is
> 1000, while the setting in cluster B is 100,000. The 100 times difference
> might have indicated that machines in cluster B have larger memory.
>
> By reusing "queued.max.requests", the controlRequestQueue in cluster B
> automatically gets a 100x capacity without explicitly bothering the
> operators. I understand the counter argument can be that maybe that's a
> waste of resource if the control request rate is low and operators may
> want to fine-tune the capacity of the controlRequestQueue.
>
> I'm ok with either approach, and can change it if you or anyone else
> feels strongly about adding the extra config.
>
> Thanks,
> Lucas
>
> On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> Lucas:
> Under Rejected Alternatives, #2, can you elaborate a bit more on why the
> separate config has bigger impact?
>
> Thanks
>
> On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <lindong28@gmail.com> wrote:
>
> Hey Lucas,
>
> Thanks for the KIP. Looks good overall. Some comments below:
>
> - We usually specify the full mbean for the new metrics in the KIP. Can
> you specify it in the Public Interface section similar to KIP-237
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>
> ?
>
> - Maybe we could follow the same pattern as KIP-153
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> where we keep the existing sensor name "BytesInPerSec" and add a new
> sensor "ReplicationBytesInPerSec", rather than replacing the sensor name
> "BytesInPerSec" with e.g. "ClientBytesInPerSec".
>
> - It seems that the KIP changes the semantics of the broker config
> "queued.max.requests" because the number of total requests queued in the
> broker will no longer be bounded by "queued.max.requests". This probably
> needs to be specified in the Public Interfaces section for discussion.
>
> Thanks,
> Dong
>
> On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
>
> Hi Kafka experts,
>
> I created KIP-291 to add a separate queue for controller requests:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
>
> Can you please take a look and let me know your feedback?
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Joel Koshy <jj...@gmail.com>.
@Mayuresh - I like your idea. It appears to be a simpler, less invasive
alternative, and it should work. Jun/Becket/others, do you see any pitfalls
with this approach?
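For concreteness, the single-deque idea under discussion could be sketched roughly as follows. This is a hypothetical illustration, not Kafka's actual RequestChannel API; the `RequestDeque` and `Request` names and the `fromController` flag are made up for the example:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of the single-deque idea: data-plane requests go to
// the tail, controller requests jump to the head, and one bounded deque
// replaces the two separate queues proposed in the KIP.
public class RequestDeque {
    public static class Request {
        public final String name;
        public final boolean fromController;
        public Request(String name, boolean fromController) {
            this.name = name;
            this.fromController = fromController;
        }
    }

    private final LinkedBlockingDeque<Request> deque;

    public RequestDeque(int capacity) {
        // A single capacity bound covers both request types, so no config
        // beyond the existing queued.max.requests would be needed.
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    // Returns false if the deque is full (the caller would block or back off).
    public boolean send(Request r) {
        return r.fromController ? deque.offerFirst(r) : deque.offerLast(r);
    }

    // Handler threads always take from the head, so a queued controller
    // request is picked up before any queued data-plane request.
    public Request receive() {
        return deque.pollFirst();
    }
}
```

Note this works precisely because there are exactly two priority levels, matching the two ends of a deque — the coupling concern Lucas raises in the quoted message below.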

On Wed, Jul 18, 2018 at 12:03 PM, Lucas Wang <lu...@gmail.com> wrote:

> @Mayuresh,
> That's a very interesting idea that I haven't thought before.
> It seems to solve our problem at hand pretty well, and also
> avoids the need to have a new size metric and capacity config
> for the controller request queue. In fact, if we were to adopt
> this design, there is no public interface change, and we
> probably don't need a KIP.
> Also, implementation-wise, it seems
> the Java class LinkedBlockingDeque can readily satisfy the requirement
> by supporting a capacity bound and allowing insertion at both ends.
>
> My only concern is that this design is tied to the coincidence that
> we have two request priorities and there are two ends to a deque.
> Hence by using the proposed design, it seems the network layer is
> more tightly coupled with upper layer logic, e.g. if we were to add
> an extra priority level in the future for some reason, we would probably
> need to go back to the design of separate queues, one for each priority
> level.
>
> In summary, I'm ok with both designs and lean toward your suggested
> approach.
> Let's hear what others think.
>
> @Becket,
> In light of Mayuresh's suggested new design, I'm answering your question
> only in the context
> of the current KIP design: I think your suggestion makes sense, and I'm ok
> with removing the capacity config and
> just relying on the default value of 20 being sufficient.
>
> Thanks,
> Lucas
>
> On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com
> > wrote:
>
> > Hi Lucas,
> >
> > Seems like the main intent here is to prioritize the controller request
> > over any other requests.
> > In that case, we can change the request queue to a deque, where you
> > always insert the normal requests (produce, consume, etc.) at the end of
> > the deque, but if it is a controller request, you insert it at the head of
> > the deque. This ensures that the controller request will be given higher
> > priority over other requests.
> >
> > Also, since we read only one request from the socket, mute the channel,
> > and unmute it only after handling the request, this ensures that we don't
> > process controller requests out of order.
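The muting behavior described above can be modeled with a small sketch. This is a toy model, not Kafka's actual SocketServer code; the `MutedChannel` name and its methods are invented for illustration. The point it demonstrates: a muted channel is not read from again until its response is sent, so each connection contributes at most one request to the shared queue at a time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of per-connection channel muting: after one request is read
// off the socket and enqueued, the channel is muted and no further
// request from that connection enters the queue until the response goes out.
public class MutedChannel {
    private boolean muted = false;
    private final Deque<String> requestQueue;

    public MutedChannel(Deque<String> sharedQueue) {
        this.requestQueue = sharedQueue;
    }

    // Reads one request and enqueues it, then mutes the channel.
    // Returns false (reads nothing) while the channel is muted.
    public boolean tryRead(String request) {
        if (muted) {
            return false;
        }
        requestQueue.addLast(request);
        muted = true;
        return true;
    }

    // Sending the response unmutes the channel, allowing the next read.
    public void sendResponse() {
        muted = false;
    }
}
```

With a single controller connection per broker, this is why at most one controller request sits in the queue at any given time — the same observation Becket makes further down the thread.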
> >
> > With this approach we can avoid the second queue and the additional
> config
> > for the size of the queue.
> >
> > What do you think ?
> >
> > Thanks,
> >
> > Mayuresh
> >
> >
> > On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com> wrote:
> >
> > > Hey Joel,
> > >
> > > Thanks for the detailed explanation. I agree the current design makes
> > > sense.
> > > My confusion is about whether the new config for the controller queue
> > > capacity is necessary. I cannot think of a case in which users would
> > change
> > > it.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <be...@gmail.com>
> > wrote:
> > >
> > > > Hi Lucas,
> > > >
> > > > I guess my question can be rephrased to "do we expect users to ever
> > > > change the controller request queue capacity"? If we agree that 20 is
> > > > already a very generous default and we do not expect users to change
> > > > it, is it still necessary to expose this as a config?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > >> @Becket
> > > >> 1. Thanks for the comment. You are right that normally there should
> > > >> be just one controller request because of muting, and I had NOT
> > > >> intended to say there would be many enqueued controller requests.
> > > >> I went through the KIP again, and I'm not sure which part conveys
> > > >> that info. I'd be happy to revise if you point out the section.
> > > >>
> > > >> 2. Though it should not happen in normal conditions, the current
> > > >> design does not preclude multiple controllers running at the same
> > > >> time. Hence, if we don't have the controller queue capacity config
> > > >> and simply make its capacity 1, network threads handling requests
> > > >> from different controllers will be blocked during those troublesome
> > > >> times, which is probably not what we want. On the other hand, adding
> > > >> the extra config with a default value, say 20, guards us from issues
> > > >> in those troublesome times, and IMO there isn't much downside to
> > > >> adding it.
> > > >>
> > > >> @Mayuresh
> > > >> Good catch, this sentence is an obsolete statement based on a
> previous
> > > >> design. I've revised the wording in the KIP.
> > > >>
> > > >> Thanks,
> > > >> Lucas
> > > >>
> > > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > > >> gharatmayuresh15@gmail.com> wrote:
> > > >>
> > > >> > Hi Lucas,
> > > >> >
> > > >> > Thanks for the KIP.
> > > >> > I am trying to understand why you think "The memory consumption
> > > >> > can rise given the total number of queued requests can go up to 2x"
> > > >> > in the impact section. Normally the requests from the controller to
> > > >> > a broker are not high volume, right?
> > > >> > volume, right ?
> > > >> >
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Mayuresh
> > > >> >
> > > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <be...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > > Thanks for the KIP, Lucas. Separating the control plane from the
> > > data
> > > >> > plane
> > > >> > > makes a lot of sense.
> > > >> > >
> > > >> > > In the KIP you mentioned that the controller request queue may
> > > >> > > have many requests in it. Will this be a common case? The
> > > >> > > controller requests still go through the SocketServer. The
> > > >> > > SocketServer will mute the channel once a request is read and put
> > > >> > > into the request channel. So assuming there is only one connection
> > > >> > > between the controller and each broker, on the broker side there
> > > >> > > should be only one controller request in the controller request
> > > >> > > queue at any given time. If that is the case, do we need a
> > > >> > > separate controller request queue capacity config? The default
> > > >> > > value of 20 means that we expect 20 controller switches to happen
> > > >> > > in a short period of time. I am not sure whether someone should
> > > >> > > increase the controller request queue capacity to handle such a
> > > >> > > case, as it seems to indicate something very wrong has happened.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Jiangjie (Becket) Qin
> > > >> > >
> > > >> > >
> > > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com>
> > > >> wrote:
> > > >> > >
> > > >> > > > Thanks for the update Lucas.
> > > >> > > >
> > > >> > > > I think the motivation section is intuitive. It will be good
> to
> > > >> learn
> > > >> > > more
> > > >> > > > about the comments from other reviewers.
> > > >> > > >
> > > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > Hi Dong,
> > > >> > > > >
> > > >> > > > > I've updated the motivation section of the KIP by explaining
> > the
> > > >> > cases
> > > >> > > > that
> > > >> > > > > would have user impacts.
> > > >> > > > > Please take a look at let me know your comments.
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Lucas
> > > >> > > > >
> > > >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <
> > > lucasatucla@gmail.com
> > > >> >
> > > >> > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi Dong,
> > > >> > > > > >
> > > >> > > > > > The simulation of the disk being slow is merely for me to
> > > >> > > > > > easily construct a testing scenario with a backlog of
> > > >> > > > > > produce requests. In production, other than the disk being
> > > >> > > > > > slow, a backlog of produce requests may also be caused by
> > > >> > > > > > high produce QPS. In that case, we may not want to kill the
> > > >> > > > > > broker, and that's when this KIP can be useful, both for
> > > >> > > > > > JBOD and non-JBOD setups.
> > > >> > > > > >
> > > >> > > > > > Going back to your previous question about each
> > > >> > > > > > ProduceRequest covering 20 partitions that are randomly
> > > >> > > > > > distributed: let's say a LeaderAndIsr request is enqueued
> > > >> > > > > > that tries to switch the current broker, say broker0, from
> > > >> > > > > > leader to follower *for one of the partitions*, say
> > > >> > > > > > *test-0*. For the sake of argument, let's also assume the
> > > >> > > > > > other brokers, say broker1, have *stopped* fetching from
> > > >> > > > > > the current broker, i.e. broker0.
> > > >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> > > >> > > > > > LeaderAndIsr request will be put into the purgatory, and
> > > >> > > > > > since they'll never be replicated to other brokers (because
> > > >> > > > > > of the assumption made above), they will be completed
> > > >> > > > > > either when the LeaderAndIsr request is processed or when
> > > >> > > > > > the timeout happens.
> > > >> > > > > >   1.2 With this KIP, broker0 will immediately transition
> > > >> > > > > > the partition test-0 to become a follower; after the
> > > >> > > > > > current broker sees the replication of the remaining 19
> > > >> > > > > > partitions, it can send a response indicating that it's no
> > > >> > > > > > longer the leader for "test-0".
> > > >> > > > > >   To see the latency difference between 1.1 and 1.2, let's
> > > >> > > > > > say there are 24K produce requests ahead of the
> > > >> > > > > > LeaderAndIsr, and there are 8 io threads, so each io thread
> > > >> > > > > > will process approximately 3000 produce requests. Now let's
> > > >> > > > > > investigate the io thread that finally processed the
> > > >> > > > > > LeaderAndIsr.
> > > >> > > > > >   For the 3000 produce requests, if we model the times when
> > > >> > > > > > their remaining 19 partitions catch up as t0, t1, ...
> > > >> > > > > > t2999, and the LeaderAndIsr request is processed at time
> > > >> > > > > > t3000:
> > > >> > > > > >   Without this KIP, the 1st produce request would have
> > > >> > > > > > waited an extra t3000 - t0 time in the purgatory, the 2nd
> > > >> > > > > > an extra t3000 - t1, etc.
> > > >> > > > > >   Roughly speaking, the latency difference is bigger for
> > > >> > > > > > the earlier produce requests than for the later ones. For
> > > >> > > > > > the same
> > > >> > > > > > produce requests than for the later ones. For the same
> > reason,
> > > >> the
> > > >> > > more
> > > >> > > > > > ProduceRequests queued
> > > >> > > > > >   before the LeaderAndISR, the bigger benefit we get
> (capped
> > > by
> > > >> the
> > > >> > > > > > produce timeout).
> > > >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > >> > > > > >   There will be no latency differences in this case, but
> > > >> > > > > >   2.1 without this KIP, the records of partition test-0 in
> > the
> > > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be appended
> > to
> > > >> the
> > > >> > > local
> > > >> > > > > log,
> > > >> > > > > >         and eventually be truncated after processing the
> > > >> > > LeaderAndISR.
> > > >> > > > > > This is what's referred to as
> > > >> > > > > >         "some unofficial definition of data loss in terms
> of
> > > >> > messages
> > > >> > > > > > beyond the high watermark".
> > > >> > > > > >   2.2 with this KIP, we can mitigate the effect since if
> the
> > > >> > > > LeaderAndISR
> > > >> > > > > > is immediately processed, the response to producers will
> > have
> > > >> > > > > >         the NotLeaderForPartition error, causing producers
> > to
> > > >> retry
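The acks=-1 timeline in case 1 above can be written out in a few lines of Python. The numbers are toy values under the same assumptions as the example (3000 queued requests on one io thread, evenly spaced catch-up times):

```python
# Toy model of case 1.1: ProduceRequests with acks=-1 are queued ahead of a
# LeaderAndIsr on one io thread. t[i] is when request i's remaining 19
# partitions catch up; the LeaderAndIsr itself is processed at t_laisr.
# Without the KIP, request i sits an extra (t_laisr - t[i]) in the purgatory.

def extra_purgatory_wait(t, t_laisr):
    return [t_laisr - ti for ti in t]

# Evenly spaced toy timeline: t_i = i, LeaderAndIsr processed at t = 3000.
waits = extra_purgatory_wait(list(range(3000)), 3000)
assert waits[0] == 3000                      # earliest request waits longest
assert waits[-1] == 1                        # later requests benefit less
assert waits == sorted(waits, reverse=True)  # benefit shrinks monotonically
```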
> > > >> > > > > >
> > > >> > > > > > This explanation above is the benefit for reducing the
> > latency
> > > >> of a
> > > >> > > > > broker
> > > >> > > > > > becoming the follower,
> > > >> > > > > > closely related is reducing the latency of a broker
> becoming
> > > the
> > > >> > > > leader.
> > > >> > > > > > In this case, the benefit is even more obvious, if other
> > > brokers
> > > >> > have
> > > >> > > > > > resigned leadership, and the
> > > >> > > > > > current broker should take leadership. Any delay in
> > processing
> > > >> the
> > > >> > > > > > LeaderAndISR will be perceived
> > > >> > > > > > by clients as unavailability. In extreme cases, this can
> > cause
> > > >> > failed
> > > >> > > > > > produce requests if the retries are
> > > >> > > > > > exhausted.
> > > >> > > > > >
> > > >> > > > > > Another two types of controller requests are
> UpdateMetadata
> > > and
> > > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > >> > > > > > For UpdateMetadata requests, delayed processing means
> > clients
> > > >> > > receiving
> > > >> > > > > > stale metadata, e.g. with the wrong leadership info
> > > >> > > > > > for certain partitions, and the effect is more retries or
> > even
> > > >> > fatal
> > > >> > > > > > failure if the retries are exhausted.
> > > >> > > > > >
> > > >> > > > > > For StopReplica requests, a long queuing time may degrade
> > the
> > > >> > > > performance
> > > >> > > > > > of topic deletion.
> > > >> > > > > >
> > > >> > > > > > Regarding your last question of the delay for
> > > >> > DescribeLogDirsRequest,
> > > >> > > > you
> > > >> > > > > > are right
> > > >> > > > > > that this KIP cannot help with the latency in getting the
> > log
> > > >> dirs
> > > >> > > > info,
> > > >> > > > > > and it's only relevant
> > > >> > > > > > when controller requests are involved.
> > > >> > > > > >
> > > >> > > > > > Regards,
> > > >> > > > > > Lucas
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> > lindong28@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > > > >
> > > >> > > > > >> Hey Jun,
> > > >> > > > > >>
> > > >> > > > > >> Thanks much for the comments. It is good point. So the
> > > feature
> > > >> may
> > > >> > > be
> > > >> > > > > >> useful for JBOD use-case. I have one question below.
> > > >> > > > > >>
> > > >> > > > > >> Hey Lucas,
> > > >> > > > > >>
> > > >> > > > > >> Do you think this feature is also useful for non-JBOD
> setup
> > > or
> > > >> it
> > > >> > is
> > > >> > > > > only
> > > >> > > > > >> useful for the JBOD setup? It may be useful to understand
> > > this.
> > > >> > > > > >>
> > > >> > > > > >> When the broker is setup using JBOD, in order to move
> > leaders
> > > >> on
> > > >> > the
> > > >> > > > > >> failed
> > > >> > > > > >> disk to other disks, the system operator first needs to
> get
> > > the
> > > >> > list
> > > >> > > > of
> > > >> > > > > >> partitions on the failed disk. This is currently achieved
> > > using
> > > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > > >> DescribeLogDirsRequest
> > > >> > to
> > > >> > > > the
> > > >> > > > > >> broker. If we only prioritize the controller requests,
> then
> > > the
> > > >> > > > > >> DescribeLogDirsRequest
> > > >> > > > > >> may still take a long time to be processed by the broker.
> > So
> > > >> the
> > > >> > > > overall
> > > >> > > > > >> time to move leaders away from the failed disk may still
> be
> > > >> long
> > > >> > > even
> > > >> > > > > with
> > > >> > > > > >> this KIP. What do you think?
> > > >> > > > > >>
> > > >> > > > > >> Thanks,
> > > >> > > > > >> Dong
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > > >> lucasatucla@gmail.com
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > > >>
> > > >> > > > > >> > Thanks for the insightful comment, Jun.
> > > >> > > > > >> >
> > > >> > > > > >> > @Dong,
> > > >> > > > > >> > Since both of the two comments in your previous email
> are
> > > >> about
> > > >> > > the
> > > >> > > > > >> > benefits of this KIP and whether it's useful,
> > > >> > > > > >> > in light of Jun's last comment, do you agree that this
> > KIP
> > > >> can
> > > >> > be
> > > >> > > > > >> > beneficial in the case mentioned by Jun?
> > > >> > > > > >> > Please let me know, thanks!
> > > >> > > > > >> >
> > > >> > > > > >> > Regards,
> > > >> > > > > >> > Lucas
> > > >> > > > > >> >
> > > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> > jun@confluent.io>
> > > >> > wrote:
> > > >> > > > > >> >
> > > >> > > > > >> > > Hi, Lucas, Dong,
> > > >> > > > > >> > >
> > > >> > > > > >> > > If all disks on a broker are slow, one probably
> should
> > > just
> > > >> > kill
> > > >> > > > the
> > > >> > > > > >> > > broker. In that case, this KIP may not help. If only
> > one
> > > of
> > > >> > the
> > > >> > > > > disks
> > > >> > > > > >> on
> > > >> > > > > >> > a
> > > >> > > > > >> > > broker is slow, one may want to fail that disk and
> move
> > > the
> > > >> > > > leaders
> > > >> > > > > on
> > > >> > > > > >> > that
> > > >> > > > > >> > > disk to other brokers. In that case, being able to
> > > process
> > > >> the
> > > >> > > > > >> > LeaderAndIsr
> > > >> > > > > >> > > requests faster will potentially help the producers
> > > recover
> > > >> > > > quicker.
> > > >> > > > > >> > >
> > > >> > > > > >> > > Thanks,
> > > >> > > > > >> > >
> > > >> > > > > >> > > Jun
> > > >> > > > > >> > >
> > > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > > >> lindong28@gmail.com
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > > >> > >
> > > >> > > > > >> > > > Hey Lucas,
> > > >> > > > > >> > > >
> > > >> > > > > >> > > > Thanks for the reply. Some follow up questions
> below.
> > > >> > > > > >> > > >
> > > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20
> > > partitions
> > > >> > that
> > > >> > > > are
> > > >> > > > > >> > > randomly
> > > >> > > > > >> > > > distributed across all partitions, then each
> > > >> ProduceRequest
> > > >> > > will
> > > >> > > > > >> likely
> > > >> > > > > >> > > > cover some partitions for which the broker is still
> > > >> leader
> > > >> > > after
> > > >> > > > > it
> > > >> > > > > >> > > quickly
> > > >> > > > > >> > > > processes the
> > > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will still be slow
> > in
> > > >> > > > processing
> > > >> > > > > >> these
> > > >> > > > > >> > > > ProduceRequest and request latency will still be very high
> > with
> > > >> this
> > > >> > > > KIP.
> > > >> > > > > It
> > > >> > > > > >> > > seems
> > > >> > > > > >> > > > that most ProduceRequest will still timeout after
> 30
> > > >> > seconds.
> > > >> > > Is
> > > >> > > > > >> this
> > > >> > > > > >> > > > understanding correct?
> > > >> > > > > >> > > >
> > > >> > > > > >> > > > Regarding 2, if most ProduceRequest will still
> > timeout
> > > >> after
> > > >> > > 30
> > > >> > > > > >> > seconds,
> > > >> > > > > >> > > > then it is less clear how this KIP reduces average
> > > >> produce
> > > >> > > > > latency.
> > > >> > > > > >> Can
> > > >> > > > > >> > > you
> > > >> > > > > >> > > > clarify what metrics can be improved by this KIP?
> > > >> > > > > >> > > >
> > > >> > > > > >> > > > Not sure why the system operator directly cares about the number
> of
> > > >> > > truncated
> > > >> > > > > >> > messages.
> > > >> > > > > >> > > > Do you mean this KIP can improve average throughput
> > or
> > > >> > reduce
> > > >> > > > > >> message
> > > >> > > > > >> > > > duplication? It will be good to understand this.
> > > >> > > > > >> > > >
> > > >> > > > > >> > > > Thanks,
> > > >> > > > > >> > > > Dong
> > > >> > > > > >> > > >
> > > >> > > > > >> > > >
> > > >> > > > > >> > > >
> > > >> > > > > >> > > >
> > > >> > > > > >> > > >
> > > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > >> > > lucasatucla@gmail.com
> > > >> > > > >
> > > >> > > > > >> > wrote:
> > > >> > > > > >> > > >
> > > >> > > > > >> > > > > Hi Dong,
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > Thanks for your valuable comments. Please see my
> > > reply
> > > >> > > below.
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now
> > let's
> > > >> > > consider
> > > >> > > > a
> > > >> > > > > >> more
> > > >> > > > > >> > > > common
> > > >> > > > > >> > > > > scenario
> > > >> > > > > >> > > > > where broker0 is the leader of many partitions.
> And
> > > >> let's
> > > >> > > say
> > > >> > > > > for
> > > >> > > > > >> > some
> > > >> > > > > >> > > > > reason its IO becomes slow.
> > > >> > > > > >> > > > > The number of leader partitions on broker0 is so
> > > large,
> > > >> > say
> > > >> > > > 10K,
> > > >> > > > > >> that
> > > >> > > > > >> > > the
> > > >> > > > > >> > > > > cluster is skewed,
> > > >> > > > > >> > > > > and the operator would like to shift the
> leadership
> > > >> for a
> > > >> > > lot
> > > >> > > > of
> > > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > >> > > > > >> > > > > either manually or through some service like
> cruise
> > > >> > control.
> > > >> > > > > >> > > > > With this KIP, not only will the leadership
> > > transitions
> > > >> > > finish
> > > >> > > > > >> more
> > > >> > > > > >> > > > > quickly, helping the cluster itself becoming more
> > > >> > balanced,
> > > >> > > > > >> > > > > but all existing producers corresponding to the
> 9K
> > > >> > > partitions
> > > >> > > > > will
> > > >> > > > > >> > get
> > > >> > > > > >> > > > the
> > > >> > > > > >> > > > > errors relatively quickly
> > > >> > > > > >> > > > > rather than relying on their timeout, thanks to
> the
> > > >> > batched
> > > >> > > > > async
> > > >> > > > > >> ZK
> > > >> > > > > >> > > > > operations.
> > > >> > > > > >> > > > > To me it's a useful feature to have during such
> > > >> > troublesome
> > > >> > > > > times.
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown
> > that
> > > >> with
> > > >> > > this
> > > >> > > > > KIP
> > > >> > > > > >> > many
> > > >> > > > > >> > > > > producers
> > > >> > > > > >> > > > > receive an explicit error NotLeaderForPartition,
> > > based
> > > >> on
> > > >> > > > which
> > > >> > > > > >> they
> > > >> > > > > >> > > > retry
> > > >> > > > > >> > > > > immediately.
> > > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick retry)
> for
> > > >> their
> > > >> > > > single
> > > >> > > > > >> > > message
> > > >> > > > > >> > > > is
> > > >> > > > > >> > > > > much smaller
> > > >> > > > > >> > > > > compared with the case of timing out without the
> > KIP
> > > >> (30
> > > >> > > > seconds
> > > >> > > > > >> for
> > > >> > > > > >> > > > timing
> > > >> > > > > >> > > > > out + quick retry).
> > > >> > > > > >> > > > > One might argue that reducing the timing out on
> the
> > > >> > producer
> > > >> > > > > side
> > > >> > > > > >> can
> > > >> > > > > >> > > > > achieve the same result,
> > > >> > > > > >> > > > > yet reducing the timeout has its own
> drawbacks[1].
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > Also *IF* there were a metric to show the number
> of
> > > >> > > truncated
> > > >> > > > > >> > messages
> > > >> > > > > >> > > on
> > > >> > > > > >> > > > > brokers,
> > > >> > > > > >> > > > > with the experiments done in the Google Doc, it
> > > should
> > > >> be
> > > >> > > easy
> > > >> > > > > to
> > > >> > > > > >> see
> > > >> > > > > >> > > > that
> > > >> > > > > >> > > > > a lot fewer messages need
> > > >> > > > > >> > > > > to be truncated on broker0 since the up-to-date
> > > >> metadata
> > > >> > > > avoids
> > > >> > > > > >> > > appending
> > > >> > > > > >> > > > > of messages
> > > >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk to a
> > > system
> > > >> > > > operator
> > > >> > > > > >> and
> > > >> > > > > >> > ask
> > > >> > > > > >> > > > > whether
> > > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most likely
> > the
> > > >> > answer
> > > >> > > > is
> > > >> > > > > >> yes.
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > 3. To answer your question, I think it might be
> > > >> helpful to
> > > >> > > > > >> construct
> > > >> > > > > >> > > some
> > > >> > > > > >> > > > > formulas.
> > > >> > > > > >> > > > > To simplify the modeling, I'm going back to the
> > case
> > > >> where
> > > >> > > > there
> > > >> > > > > >> is
> > > >> > > > > >> > > only
> > > >> > > > > >> > > > > ONE partition involved.
> > > >> > > > > >> > > > > Following the experiments in the Google Doc,
> let's
> > > say
> > > >> > > broker0
> > > >> > > > > >> > becomes
> > > >> > > > > >> > > > the
> > > >> > > > > >> > > > > follower at time t0,
> > > >> > > > > >> > > > > and after t0 there were still N produce requests
> in
> > > its
> > > >> > > > request
> > > >> > > > > >> > queue.
> > > >> > > > > >> > > > > With the up-to-date metadata brought by this KIP,
> > > >> broker0
> > > >> > > can
> > > >> > > > > >> reply
> > > >> > > > > >> > > with
> > > >> > > > > >> > > > an
> > > >> > > > > >> > > > > NotLeaderForPartition exception,
> > > >> > > > > >> > > > > let's use M1 to denote the average processing
> time
> > of
> > > >> > > replying
> > > >> > > > > >> with
> > > >> > > > > >> > > such
> > > >> > > > > >> > > > an
> > > >> > > > > >> > > > > error message.
> > > >> > > > > >> > > > > Without this KIP, the broker will need to append
> > > >> messages
> > > >> > to
> > > >> > > > > >> > segments,
> > > >> > > > > >> > > > > which may trigger a flush to disk,
> > > >> > > > > >> > > > > let's use M2 to denote the average processing
> time
> > > for
> > > >> > such
> > > >> > > > > logic.
> > > >> > > > > >> > > > > Then the average extra latency incurred without
> > this
> > > >> KIP
> > > >> > is
> > > >> > > N
> > > >> > > > *
> > > >> > > > > >> (M2 -
> > > >> > > > > >> > > > M1) /
> > > >> > > > > >> > > > > 2.
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > In practice, M2 should always be larger than M1,
> > > which
> > > >> > means
> > > >> > > > as
> > > >> > > > > >> long
> > > >> > > > > >> > > as N
> > > >> > > > > >> > > > > is positive,
> > > >> > > > > >> > > > > we would see improvements on the average latency.
> > > >> > > > > >> > > > > There does not need to be significant backlog of
> > > >> requests
> > > >> > in
> > > >> > > > the
> > > >> > > > > >> > > request
> > > >> > > > > >> > > > > queue,
> > > >> > > > > >> > > > > or severe degradation of disk performance to have
> > the
> > > >> > > > > improvement.
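The N * (M2 - M1) / 2 estimate above can be checked with a few lines of Python (M1 and M2 below are toy values, not measurements):

```python
# Request k (1-based) in the queue finishes at k*M2 without the KIP and at
# k*M1 with it, so its extra latency is k*(M2 - M1). Averaging over
# k = 1..N gives (N + 1)/2 * (M2 - M1), i.e. ~ N*(M2 - M1)/2 for large N.

def avg_extra_latency(n, m1, m2):
    return sum(k * (m2 - m1) for k in range(1, n + 1)) / n

approx = 3000 * (3 - 1) / 2            # the N*(M2 - M1)/2 estimate
exact = avg_extra_latency(3000, 1, 3)  # the exact average
assert exact - approx == (3 - 1) / 2   # they differ only by (M2 - M1)/2
```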
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > Regards,
> > > >> > > > > >> > > > > Lucas
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > [1] For instance, reducing the timeout on the
> > > producer
> > > >> > side
> > > >> > > > can
> > > >> > > > > >> > trigger
> > > >> > > > > >> > > > > unnecessary duplicate requests
> > > >> > > > > >> > > > > when the corresponding leader broker is
> overloaded,
> > > >> > > > exacerbating
> > > >> > > > > >> the
> > > >> > > > > >> > > > > situation.
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> > > >> > > lindong28@gmail.com
> > > >> > > > >
> > > >> > > > > >> > wrote:
> > > >> > > > > >> > > > >
> > > >> > > > > >> > > > > > Hey Lucas,
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > Thanks much for the detailed documentation of
> the
> > > >> > > > experiment.
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > Initially I also think having a separate queue
> > for
> > > >> > > > controller
> > > >> > > > > >> > > requests
> > > >> > > > > >> > > > is
> > > >> > > > > >> > > > > > useful because, as you mentioned in the summary
> > > >> section
> > > >> > of
> > > >> > > > the
> > > >> > > > > >> > Google
> > > >> > > > > >> > > > > doc,
> > > >> > > > > >> > > > > > controller requests are generally more
> important
> > > than
> > > >> > data
> > > >> > > > > >> requests
> > > >> > > > > >> > > and
> > > >> > > > > >> > > > > we
> > > >> > > > > >> > > > > > probably want controller requests to be
> processed
> > > >> > sooner.
> > > >> > > > But
> > > >> > > > > >> then
> > > >> > > > > >> > > Eno
> > > >> > > > > >> > > > > has
> > > >> > > > > >> > > > > > two very good questions which I am not sure the
> > > >> Google
> > > >> > doc
> > > >> > > > has
> > > >> > > > > >> > > answered
> > > >> > > > > >> > > > > > explicitly. Could you help with the following
> > > >> questions?
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > 1) It is not very clear what is the actual
> > benefit
> > > of
> > > >> > > > KIP-291
> > > >> > > > > to
> > > >> > > > > >> > > users.
> > > >> > > > > >> > > > > The
> > > >> > > > > >> > > > > > experiment setup in the Google doc simulates
> the
> > > >> > scenario
> > > >> > > > that
> > > >> > > > > >> > broker
> > > >> > > > > >> > > > is
> > > >> > > > > >> > > > > > very slow handling ProduceRequest due to e.g.
> > slow
> > > >> disk.
> > > >> > > It
> > > >> > > > > >> > currently
> > > >> > > > > >> > > > > > assumes that there is only 1 partition. But in
> > the
> > > >> > common
> > > >> > > > > >> scenario,
> > > >> > > > > >> > > it
> > > >> > > > > >> > > > is
> > > >> > > > > >> > > > > > probably reasonable to assume that there are
> many
> > > >> other
> > > >> > > > > >> partitions
> > > >> > > > > >> > > that
> > > >> > > > > >> > > > > are
> > > >> > > > > >> > > > > > also actively produced to and ProduceRequest to
> > > these
> > > >> > > > > partition
> > > >> > > > > >> > also
> > > >> > > > > >> > > > > takes
> > > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if
> > broker0
> > > >> can
> > > >> > > > become
> > > >> > > > > >> > > follower
> > > >> > > > > >> > > > > for
> > > >> > > > > >> > > > > > the partition 0 soon, it probably still needs
> to
> > > >> process
> > > >> > > the
> > > >> > > > > >> > > > > ProduceRequest
> > > >> > > > > >> > > > > > slowly in the queue because these
> > ProduceRequests
> > > >> > cover
> > > >> > > > > other
> > > >> > > > > >> > > > > partitions.
> > > >> > > > > >> > > > > > Thus most ProduceRequest will still timeout
> after
> > > 30
> > > >> > > seconds
> > > >> > > > > and
> > > >> > > > > >> > most
> > > >> > > > > >> > > > > > clients will still likely timeout after 30
> > seconds.
> > > >> Then
> > > >> > > it
> > > >> > > > is
> > > >> > > > > >> not
> > > >> > > > > >> > > > > > obvious what the benefit to the client is, since
> > > client
> > > >> > will
> > > >> > > > > >> timeout
> > > >> > > > > >> > > after
> > > >> > > > > >> > > > > 30
> > > >> > > > > >> > > > > > seconds before possibly re-connecting to
> broker1,
> > > >> with
> > > >> > or
> > > >> > > > > >> without
> > > >> > > > > >> > > > > KIP-291.
> > > >> > > > > >> > > > > > Did I miss something here?
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > 2) I guess Eno's is asking for the specific
> > > benefits
> > > >> of
> > > >> > > this
> > > >> > > > > >> KIP to
> > > >> > > > > >> > > > user
> > > >> > > > > >> > > > > or
> > > >> > > > > >> > > > > > system administrator, e.g. whether this KIP
> > > decreases
> > > >> > > > average
> > > >> > > > > >> > > latency,
> > > >> > > > > >> > > > > > 999th percentile latency, probability of exceptions
> > > >> exposed
> > > >> > to
> > > >> > > > > >> client
> > > >> > > > > >> > > etc.
> > > >> > > > > >> > > > It
> > > >> > > > > >> > > > > > is probably useful to clarify this.
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > 3) Does this KIP help improve user experience
> > only
> > > >> when
> > > >> > > > there
> > > >> > > > > is
> > > >> > > > > >> > > issue
> > > >> > > > > >> > > > > with
> > > >> > > > > >> > > > > > broker, e.g. significant backlog in the request
> > > queue
> > > >> > due
> > > >> > > to
> > > >> > > > > >> slow
> > > >> > > > > >> > > disk
> > > >> > > > > >> > > > as
> > > >> > > > > >> > > > > > described in the Google doc? Or is this KIP
> also
> > > >> useful
> > > >> > > when
> > > >> > > > > >> there
> > > >> > > > > >> > is
> > > >> > > > > >> > > > no
> > > >> > > > > >> > > > > > ongoing issue in the cluster? It might be
> helpful
> > > to
> > > >> > > clarify
> > > >> > > > > >> this
> > > >> > > > > >> > to
> > > >> > > > > >> > > > > > understand the benefit of this KIP.
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > Thanks much,
> > > >> > > > > >> > > > > > Dong
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > > >> > > > > >> lucasatucla@gmail.com
> > > >> > > > > >> > >
> > > >> > > > > >> > > > > wrote:
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > > Hi Eno,
> > > >> > > > > >> > > > > > >
> > > >> > > > > >> > > > > > > Sorry for the delay in getting the experiment
> > > >> results.
> > > >> > > > > >> > > > > > > Here is a link to the positive impact
> achieved
> > by
> > > >> > > > > implementing
> > > >> > > > > >> > the
> > > >> > > > > >> > > > > > proposed
> > > >> > > > > >> > > > > > > change:
> > > >> > > > > >> > > > > > > https://docs.google.com/document/d/
> > > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > >> > > > > >> > > > > > > Please take a look when you have time and let
> > me
> > > >> know
> > > >> > > your
> > > >> > > > > >> > > feedback.
> > > >> > > > > >> > > > > > >
> > > >> > > > > >> > > > > > > Regards,
> > > >> > > > > >> > > > > > > Lucas
> > > >> > > > > >> > > > > > >
> > > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> > > >> > > kafka@harsha.io>
> > > >> > > > > >> wrote:
> > > >> > > > > >> > > > > > >
> > > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a look
> > might
> > > >> suit
> > > >> > > our
> > > >> > > > > >> > > > requirements
> > > >> > > > > >> > > > > > > > better.
> > > >> > > > > >> > > > > > > >
> > > >> > > > > >> > > > > > > > Thanks,
> > > >> > > > > >> > > > > > > > Harsha
> > > >> > > > > >> > > > > > > >
> > > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas
> > Wang <
> > > >> > > > > >> > > > lucasatucla@gmail.com
> > > >> > > > > >> > > > > >
> > > >> > > > > >> > > > > > > > wrote:
> > > >> > > > > >> > > > > > > >
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > Hi Harsha,
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > If I understand correctly, the
> replication
> > > >> quota
> > > >> > > > > mechanism
> > > >> > > > > >> > > > proposed
> > > >> > > > > >> > > > > > in
> > > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that scenario.
> > > >> > > > > >> > > > > > > > > Have you tried it out?
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > Thanks,
> > > >> > > > > >> > > > > > > > > Lucas
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> > > >> > > > > kafka@harsha.io
> > > >> > > > > >> >
> > > >> > > > > >> > > > wrote:
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > Hi Lucas,
> > > >> > > > > >> > > > > > > > > > One more question, any thoughts on
> making
> > > >> this
> > > >> > > > > >> configurable
> > > >> > > > > >> > > > > > > > > > and also allowing subset of data
> requests
> > > to
> > > >> be
> > > >> > > > > >> > prioritized.
> > > >> > > > > >> > > > For
> > > >> > > > > >> > > > > > > > example
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > ,we notice in our cluster when we take
> > out
> > > a
> > > >> > > broker
> > > >> > > > > and
> > > >> > > > > >> > bring
> > > >> > > > > >> > > > new
> > > >> > > > > >> > > > > > one
> > > >> > > > > >> > > > > > > > it
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > will try to become follower and have
> lot
> > of
> > > >> > fetch
> > > >> > > > > >> requests
> > > >> > > > > >> > to
> > > >> > > > > >> > > > > other
> > > >> > > > > >> > > > > > > > > leaders
> > > >> > > > > >> > > > > > > > > > in clusters. This will negatively
> affect
> > > the
> > > >> > > > > >> > > application/client
> > > >> > > > > >> > > > > > > > > requests.
> > > >> > > > > >> > > > > > > > > > We are also exploring the similar
> > solution
> > > to
> > > >> > > > > >> de-prioritize
> > > >> > > > > >> > > if
> > > >> > > > > >> > > > a
> > > >> > > > > >> > > > > > new
> > > >> > > > > >> > > > > > > > > > replica comes in for fetch requests, we
> > are
> > > >> ok
> > > >> > > with
> > > >> > > > > the
> > > >> > > > > >> > > replica
> > > >> > > > > >> > > > > to
> > > >> > > > > >> > > > > > be
> > > >> > > > > >> > > > > > > > > > taking time but the leaders should
> > > prioritize
> > > >> > the
> > > >> > > > > client
> > > >> > > > > >> > > > > requests.
> > > >> > > > > >> > > > > > > > > >
> > > >> > > > > >> > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > Thanks,
> > > >> > > > > >> > > > > > > > > > Harsha
> > > >> > > > > >> > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM
> Lucas
> > > Wang
> > > >> > > wrote:
> > > >> > > > > >> > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > Hi Eno,
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> > > >> > > > > >> > > > > > > > > > > - I haven't implemented the feature
> > yet,
> > > >> so no
> > > >> > > > > >> > experimental
> > > >> > > > > >> > > > > > results
> > > >> > > > > >> > > > > > > > so
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > far.
> > > >> > > > > > > > > > > And I plan to test it out in the
> > > following
> > > >> > days.
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > - You are absolutely right that the
> > > >> priority
> > > >> > > queue
> > > >> > > > > >> does
> > > >> > > > > >> > not
> > > >> > > > > >> > > > > > > > completely
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > prevent
> > > >> > > > > >> > > > > > > > > > > data requests being processed ahead
> of
> > > >> > > controller
> > > >> > > > > >> > requests.
> > > >> > > > > >> > > > > > > > > > > That being said, I expect it to
> greatly
> > > >> > mitigate
> > > >> > > > the
> > > >> > > > > >> > effect
> > > >> > > > > >> > > > of
> > stale
> > > >> > > > > >> > > > > > > > > > > metadata.
> > > >> > > > > >> > > > > > > > > > > In any case, I'll try it out and post
> > the
> > > >> > > results
> > > >> > > > > >> when I
> > > >> > > > > >> > > have
> > > >> > > > > >> > > > > it.
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > Regards,
> > > >> > > > > >> > > > > > > > > > > Lucas
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno
> > > >> Thereska
> > > >> > <
> > > >> > > > > >> > > > > > > > eno.thereska@gmail.com
> > > >> > > > > >> > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > >> > > > > >> > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a
> look
> > at
> > > >> > this.
> > > >> > > A
> > > >> > > > > >> couple
> > > >> > > > > >> > of
> > > >> > > > > >> > > > > > > > questions:
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > - did you notice any positive
> change
> > > >> after
> > > >> > > > > >> implementing
> > > >> > > > > >> > > > this
> > > >> > > > > >> > > > > > KIP?
> > > >> > > > > >> > > > > > > > > I'm
> > > >> > > > > >> > > > > > > > > > > > wondering if you have any
> > experimental
> > > >> > results
> > > >> > > > > that
> > > >> > > > > >> > show
> > > >> > > > > >> > > > the
> > > >> > > > > >> > > > > > > > benefit
> > > >> > > > > >> > > > > > > > > of
> > > >> > > > > >> > > > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > two queues.
> > > >> > > > > >> > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > - priority is usually not
> sufficient
> > in
> > > >> > > > addressing
> > > >> > > > > >> the
> > > >> > > > > >> > > > > problem
> > > >> > > > > >> > > > > > > the
> > > >> > > > > >> > > > > > > > > KIP
> > > >> > > > > >> > > > > > > > > > > > identifies. Even with priority
> > queues,
> > > >> you
> > > >> > > will
> > > >> > > > > >> > sometimes
> > > >> > > > > >> > > > > > > (often?)
> > > >> > > > > >> > > > > > > > > have
> > > >> > > > > >> > > > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > case that data plane requests will
> be
> > > >> ahead
> > > >> > of
> > > >> > > > the
> > > >> > > > > >> > > control
> > > >> > > > > >> > > > > > plane
> > > >> > > > > >> > > > > > > > > > > requests.
> > > >> > > > > >> > > > > > > > > > > > This happens because the system
> might
> > > >> have
> > > >> > > > already
> > > >> > > > > >> > > started
> > > >> > > > > >> > > > > > > > > processing
> > > >> > > > > >> > > > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > data plane requests before the
> > control
> > > >> plane
> > > >> > > > ones
> > > >> > > > > >> > > arrived.
> > > >> > > > > >> > > > So
> > > >> > > > > >> > > > > > it
> > > >> > > > > >> > > > > > > > > would
> > > >> > > > > >> > > > > > > > > > > be
> > > >> > > > > >> > > > > > > > > > > > good to know what % of the problem
> > this
> > > >> KIP
> > > >> > > > > >> addresses.
> > > >> > > > > >> > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > Thanks
> > > >> > > > > >> > > > > > > > > > > > Eno
> > > >> > > > > >> > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM,
> Ted
> > > Yu <
> > > >> > > > > >> > > > > yuzhihong@gmail.com
> > > >> > > > > >> > > > > > >
> > > >> > > > > >> > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > Change looks good.
> > > >> > > > > >> > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > Thanks
> > > >> > > > > >> > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM,
> > > Lucas
> > > >> > Wang
> > > >> > > <
> > > >> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > >> > > > > >> > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've
> > > >> updated
> > > >> > > the
> > > >> > > > > KIP.
> > > >> > > > > >> > > Please
> > > >> > > > > >> > > > > > take
> > > >> > > > > >> > > > > > > > > > another
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > look.
> > > >> > > > > >> > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > Lucas
> > > >> > > > > >> > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34
> PM,
> > > Ted
> > > >> Yu
> > > >> > <
> > > >> > > > > >> > > > > > > yuzhihong@gmail.com
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > Currently in
> KafkaConfig.scala
> > :
> > > >> > > > > >> > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > >> > > > > >> > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > It would be good if you can
> > > include
> > > >> > the
> > > >> > > > > >> default
> > > >> > > > > >> > > value
> > > >> > > > > >> > > > > for
> > > >> > > > > >> > > > > > > > this
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > new
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > config
> > > >> > > > > >> > > > > > > > > > > > > > > in the KIP.
> > > >> > > > > >> > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > Thanks
> > > >> > > > > >> > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28
> > PM,
> > > >> Lucas
> > > >> > > > Wang
> > > >> > > > > <
> > > >> > > > > >> > > > > > > > > > lucasatucla@gmail.com
> > > >> > > > > >> > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > >> > > > > >> > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > I've updated the KIP by
> > adding
> > > a
> > > >> new
> > > >> > > > > config,
> > > >> > > > > >> > > > instead
> > > >> > > > > >> > > > > of
> > > >> > > > > >> > > > > > > > > reusing
> > > >> > > > > >> > > > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > > > > > existing one.
> > > >> > > > > >> > > > > > > > > > > > > > > > Please take another look
> when
> > > you
> > > >> > have
> > > >> > > > > time.
> > > >> > > > > >> > > > Thanks a
> > > >> > > > > >> > > > > > > lot!
> > > >> > > > > >> > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > Lucas
> > > >> > > > > >> > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at
> 2:33
> > > PM,
> > > >> Ted
> > > >> > > Yu
> > > >> > > > <
> > > >> > > > > >> > > > > > > > yuzhihong@gmail.com
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of
> > > resource
> > > >> if
> > > >> > > > > control
> > > >> > > > > >> > > request
> > > >> > > > > >> > > > > > rate
> > > >> > > > > >> > > > > > > is
> > > >> > > > > >> > > > > > > > > low
> > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > I don't know if control
> > > request
> > > >> > rate
> > > >> > > > can
> > > >> > > > > >> get
> > > >> > > > > >> > to
> > > >> > > > > >> > > > > > > 100,000,
> > > >> > > > > >> > > > > > > > > > > likely
> > > >> > > > > >> > > > > > > > > > > > > not.
> > > >> > > > > >> > > > > > > > > > > > > > > Then
> > > >> > > > > >> > > > > > > > > > > > > > > > > using the same bound as
> > that
> > > >> for
> > > >> > > data
> > > >> > > > > >> > requests
> > > >> > > > > >> > > > > seems
> > > >> > > > > >> > > > > > > > high.
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
> > 10:13
> > > >> PM,
> > > >> > > > Lucas
> > > >> > > > > >> Wang
> > > >> > > > > >> > <
> > > >> > > > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > >> > > > > >> > > > > > > > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a
> look
> > at
> > > >> this
> > > >> > > > KIP.
> > > >> > > > > >> > > > > > > > > > > > > > > > > > Let's say today the
> > setting
> > > >> of
> > > >> > > > > >> > > > > > "queued.max.requests"
> > > >> > > > > >> > > > > > > in
> > > >> > > > > >> > > > > > > > > > > > cluster A
> > > >> > > > > >> > > > > > > > > > > > > > is
> > > >> > > > > >> > > > > > > > > > > > > > > > > 1000,
> > > >> > > > > >> > > > > > > > > > > > > > > > > > while the setting in
> > > cluster
> > > >> B
> > > >> > is
> > > >> > > > > >> 100,000.
> > > >> > > > > >> > > > > > > > > > > > > > > > > > The 100 times
> difference
> > > >> might
> > > >> > > have
> > > >> > > > > >> > indicated
> > > >> > > > > >> > > > > that
> > > >> > > > > >> > > > > > > > > machines
> > > >> > > > > >> > > > > > > > > > > in
> > > >> > > > > >> > > > > > > > > > > > > > > cluster
> > > >> > > > > >> > > > > > > > > > > > > > > > B
> > > >> > > > > >> > > > > > > > > > > > > > > > > > have larger memory.
> > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > By reusing the
> > > >> > > > "queued.max.requests",
> > > >> > > > > >> the
> > > >> > > > > >> > > > > > > > > > > controlRequestQueue
> > > >> > > > > >> > > > > > > > > > > > in
> > > >> > > > > >> > > > > > > > > > > > > > > > cluster
> > > >> > > > > >> > > > > > > > > > > > > > > > > B
> > > >> > > > > >> > > > > > > > > > > > > > > > > > automatically
> > > >> > > > > >> > > > > > > > > > > > > > > > > > gets a 100x capacity
> > > without
> > > >> > > > > explicitly
> > > >> > > > > >> > > > bothering
> > > >> > > > > >> > > > > > the
> > > >> > > > > >> > > > > > > > > > > > operators.
> > > >> > > > > >> > > > > > > > > > > > > > > > > > I understand the
> counter
> > > >> > argument
> > > >> > > > can
> > > >> > > > > be
> > > >> > > > > >> > that
> > > >> > > > > >> > > > > maybe
> > > >> > > > > >> > > > > > > > > that's
> > > >> > > > > >> > > > > > > > > > a
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > waste
> > > >> > > > > >> > > > > > > > > > > > > > of
> > > >> > > > > >> > > > > > > > > > > > > > > > > > resource if control
> > request
> > > >> > > > > >> > > > > > > > > > > > > > > > > > rate is low and
> operators
> > > may
> > > >> > want
> > > >> > > > to
> > > >> > > > > >> fine
> > > >> > > > > >> > > tune
> > > >> > > > > >> > > > > the
> > > >> > > > > >> > > > > > > > > > capacity
> > > >> > > > > >> > > > > > > > > > > of
> > > >> > > > > >> > > > > > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
> > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > I'm ok with either
> > > approach,
> > > >> and
> > > >> > > can
> > > >> > > > > >> change
> > > >> > > > > >> > > it
> > > >> > > > > >> > > > if
> > > >> > > > > >> > > > > > you
> > > >> > > > > >> > > > > > > > or
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > anyone
> > > >> > > > > >> > > > > > > > > > > > > > else
> > > >> > > > > >> > > > > > > > > > > > > > > > > feels
> > > >> > > > > >> > > > > > > > > > > > > > > > > > strong about adding the
> > > extra
> > > >> > > > config.
> > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > >> > > > > >> > > > > > > > > > > > > > > > > > Lucas
> > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
> > > 3:11
> > > >> PM,
> > > >> > > Ted
> > > >> > > > > Yu
> > > >> > > > > >> <
> > > >> > > > > >> > > > > > > > > > yuzhihong@gmail.com
> > > >> > > > > >> > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > Under Rejected
> > > >> Alternatives,
> > > >> > #2,
> > > >> > > > can
> > > >> > > > > >> you
> > > >> > > > > >> > > > > > elaborate
> > > >> > > > > >> > > > > > > a
> > > >> > > > > >> > > > > > > > > bit
> > > >> > > > > >> > > > > > > > > > > more
> > > >> > > > > >> > > > > > > > > > > > > on
> > > >> > > > > >> > > > > > > > > > > > > > > why
> > > >> > > > > >> > > > > > > > > > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > separate config has
> > > bigger
> > > >> > > impact
> > > >> > > > ?
> > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > Thanks
> > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018
> at
> > > >> 2:00
> > > >> > PM,
> > > >> > > > > Dong
> > > >> > > > > >> > Lin <
> > > >> > > > > >> > > > > > > > > > > > lindong28@gmail.com
> > > >> > > > > >> > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP.
> > > Looks
> > > >> > good
> > > >> > > > > >> overall.
> > > >> > > > > >> > > > Some
> > > >> > > > > >> > > > > > > > > comments
> > > >> > > > > >> > > > > > > > > > > > below:
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > - We usually
> specify
> > > the
> > > >> > full
> > > >> > > > > mbean
> > > >> > > > > >> for
> > > >> > > > > >> > > the
> > > >> > > > > >> > > > > new
> > > >> > > > > >> > > > > > > > > metrics
> > > >> > > > > >> > > > > > > > > > > in
> > > >> > > > > >> > > > > > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > > > > KIP.
> > > >> > > > > >> > > > > > > > > > > > > > > > > Can
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > you
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > specify it in the
> > > Public
> > > >> > > > Interface
> > > >> > > > > >> > > section
> > > >> > > > > >> > > > > > > similar
> > > >> > > > > >> > > > > > > > > to
> > > >> > > > > >> > > > > > > > > > > > KIP-237
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > <
> > > >> https://cwiki.apache.org/
> > > >> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > 237%3A+More+Controller+Health+
> > > >> > > > > >> Metrics>
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > ?
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could
> > follow
> > > >> the
> > > >> > > same
> > > >> > > > > >> > pattern
> > > >> > > > > >> > > as
> > > >> > > > > >> > > > > > > KIP-153
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > <
> > > >> https://cwiki.apache.org/
> > > >> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > >
> > > >> > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > >> > > > > >> > > > > > > > > > > > > metric>,
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > where we keep the
> > > >> existing
> > > >> > > > sensor
> > > >> > > > > >> name
> > > >> > > > > >> > > > > > > > > "BytesInPerSec"
> > > >> > > > > >> > > > > > > > > > > and
> > > >> > > > > >> > > > > > > > > > > > > add
> > > >> > > > > >> > > > > > > > > > > > > > a
> > > >> > > > > >> > > > > > > > > > > > > > > > new
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > sensor
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> "ReplicationBytesInPerSec",
> > > >> > > > rather
> > > >> > > > > >> than
> > > >> > > > > >> > > > > > replacing
> > > >> > > > > >> > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > sensor
> > > >> > > > > >> > > > > > > > > > > > > > > name "
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with
> > > e.g.
> > > >> > > > > >> > > > > "ClientBytesInPerSec".
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > - It seems that the
> > KIP
> > > >> > > changes
> > > >> > > > > the
> > > >> > > > > >> > > > semantics
> > > >> > > > > >> > > > > > of
> > > >> > > > > >> > > > > > > > the
> > > >> > > > > >> > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > broker
> > > >> > > > > >> > > > > > > > > > > > > > > config
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> "queued.max.requests"
> > > >> > because
> > > >> > > > the
> > > >> > > > > >> > number
> > > >> > > > > >> > > of
> > > >> > > > > >> > > > > > total
> > > >> > > > > >> > > > > > > > > > > requests
> > > >> > > > > >> > > > > > > > > > > > > > queued
> > > >> > > > > >> > > > > > > > > > > > > > > > in
> > > >> > > > > >> > > > > > > > > > > > > > > > > > the
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > broker will be no
> > > longer
> > > >> > > bounded
> > > >> > > > > by
> > > >> > > > > >> > > > > > > > > > > "queued.max.requests".
> > > >> > > > > >> > > > > > > > > > > > > This
> > > >> > > > > >> > > > > > > > > > > > > > > > > > probably
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > needs to be
> specified
> > > in
> > > >> the
> > > >> > > > > Public
> > > >> > > > > >> > > > > Interfaces
> > > >> > > > > >> > > > > > > > > section
> > > >> > > > > >> > > > > > > > > > > for
> > > >> > > > > >> > > > > > > > > > > > > > > > > discussion.
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks,
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > Dong
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13,
> 2018
> > at
> > > >> > 12:45
> > > >> > > > PM,
> > > >> > > > > >> Lucas
> > > >> > > > > >> > > > Wang
> > > >> > > > > >> > > > > <
> > > >> > > > > >> > > > > > > > > > > > > > > > lucasatucla@gmail.com >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > wrote:
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291
> > to
> > > >> add a
> > > >> > > > > >> separate
> > > >> > > > > >> > > queue
> > > >> > > > > >> > > > > for
> > > >> > > > > >> > > > > > > > > > > controller
> > > >> > > > > >> > > > > > > > > > > > > > > > requests:
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> https://cwiki.apache.org/
> > > >> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > 291%
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > 3A+Have+separate+queues+for+
> > > >> > > > > >> > > > > > > > > > control+requests+and+data+
> > > >> > > > > >> > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > requests
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > Can you please
> > take a
> > > >> look
> > > >> > > and
> > > >> > > > > >> let me
> > > >> > > > > >> > > > know
> > > >> > > > > >> > > > > > your
> > > >> > > > > >> > > > > > > > > > > feedback?
> > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > >
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
@Mayuresh,
That's a very interesting idea that I haven't thought of before.
It seems to solve our problem at hand pretty well, and also
avoids the need to have a new size metric and capacity config
for the controller request queue. In fact, if we were to adopt
this design, there is no public interface change, and we
probably don't need a KIP.
Also implementation wise, it seems
the Java class LinkedBlockingDeque can readily satisfy the requirement
by supporting a capacity, and also allowing insertion at both ends.
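Mayuresh's deque idea can be sketched roughly as follows (a minimal illustration only, under my own naming — `RequestChannelSketch` and its `Request` type are made-up placeholders, not Kafka's actual SocketServer/RequestChannel classes):

```java
import java.util.concurrent.LinkedBlockingDeque;

public class RequestChannelSketch {
    // Placeholder for Kafka's internal request type.
    static final class Request {
        final String name;
        final boolean isControllerRequest;
        Request(String name, boolean isControllerRequest) {
            this.name = name;
            this.isControllerRequest = isControllerRequest;
        }
    }

    private final LinkedBlockingDeque<Request> deque;

    RequestChannelSketch(int capacity) {
        // A single bounded deque replaces the two-queue design:
        // capacity still bounds the total number of queued requests.
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    void send(Request request) throws InterruptedException {
        if (request.isControllerRequest) {
            // Controller requests jump to the head of the deque.
            deque.putFirst(request);
        } else {
            // Data-plane requests (produce, fetch, ...) go to the tail.
            deque.putLast(request);
        }
    }

    Request receive() throws InterruptedException {
        // Request handler threads always take from the head, so any
        // queued controller request is served before data requests.
        return deque.takeFirst();
    }

    public static void main(String[] args) throws InterruptedException {
        RequestChannelSketch channel = new RequestChannelSketch(500);
        channel.send(new Request("produce-1", false));
        channel.send(new Request("produce-2", false));
        channel.send(new Request("leaderAndIsr", true));
        System.out.println(channel.receive().name); // leaderAndIsr
        System.out.println(channel.receive().name); // produce-1
    }
}
```

Since the broker mutes a channel after reading each request, at most one controller request should be in flight, so `putFirst` cannot reorder controller requests relative to each other.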

My only concern is that this design is tied to the coincidence that
we have two request priorities and there are two ends to a deque.
Hence by using the proposed design, it seems the network layer is
more tightly coupled with upper layer logic, e.g. if we were to add
an extra priority level in the future for some reason, we would probably
need to go back to the design of separate queues, one for each priority
level.

In summary, I'm ok with both designs and lean toward your suggested
approach.
Let's hear what others think.

@Becket,
In light of Mayuresh's suggested new design, I'm answering your question
only in the context of the current KIP design: I think your suggestion
makes sense, and I'm ok with removing the capacity config and just
relying on the default value of 20 being sufficient.

Thanks,
Lucas

On Wed, Jul 18, 2018 at 9:57 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com
> wrote:

> Hi Lucas,
>
> Seems like the main intent here is to prioritize the controller request
> over any other requests.
> In that case, we can change the request queue to a deque, where you
> always insert the normal requests (produce, consume, etc.) at the end of
> the deque, but if it's a controller request, you insert it at the head of
> the deque. This ensures that the controller request will be given higher
> priority over other requests.
>
> Also since we only read one request from the socket and mute it and only
> unmute it after handling the request, this would ensure that we don't
> handle controller requests out of order.
>
> With this approach we can avoid the second queue and the additional config
> for the size of the queue.
>
> What do you think ?
>
> Thanks,
>
> Mayuresh
>
>
> On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com> wrote:
>
> > Hey Joel,
> >
> > Thanks for the detailed explanation. I agree the current design makes
> > sense. My confusion is about whether the new config for the controller
> > queue capacity is necessary. I cannot think of a case in which users
> > would change it.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <be...@gmail.com>
> wrote:
> >
> > > Hi Lucas,
> > >
> > > I guess my question can be rephrased to "do we expect users to ever
> > > change the controller request queue capacity"? If we agree that 20 is
> > > already a very generous default number and we do not expect users to
> > > change it, is it still necessary to expose this as a config?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > >> @Becket
> > >> 1. Thanks for the comment. You are right that normally there should be
> > >> just one controller request because of muting, and I had NOT intended
> > >> to say there would be many enqueued controller requests.
> > >> I went through the KIP again, and I'm not sure which part conveys that
> > >> info. I'd be happy to revise if you point out the section.
> > >>
> > >> 2. Though it should not happen in normal conditions, the current design
> > >> does not preclude multiple controllers running at the same time. Hence
> > >> if we don't have the controller queue capacity config and simply make
> > >> its capacity 1, network threads handling requests from different
> > >> controllers will be blocked during those troublesome times, which is
> > >> probably not what we want. On the other hand, adding the extra config
> > >> with a default value, say 20, guards us from issues in those
> > >> troublesome times, and IMO there isn't much downside of adding the
> > >> extra config.
> > >>
> > >> @Mayuresh
> > >> Good catch, this sentence is an obsolete statement based on a previous
> > >> design. I've revised the wording in the KIP.
> > >>
> > >> Thanks,
> > >> Lucas
> > >>
> > >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> > >> gharatmayuresh15@gmail.com> wrote:
> > >>
> > >> > Hi Lucas,
> > >> >
> > >> > Thanks for the KIP.
> > >> > I am trying to understand why you think "The memory consumption can
> > >> > rise given the total number of queued requests can go up to 2x" in
> > >> > the impact section. Normally the requests from controller to a Broker
> > >> > are not high volume, right?
> > >> >
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Mayuresh
> > >> >
> > >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <be...@gmail.com>
> > >> wrote:
> > >> >
> > >> > > Thanks for the KIP, Lucas. Separating the control plane from the
> > >> > > data plane makes a lot of sense.
> > >> > >
> > >> > > In the KIP you mentioned that the controller request queue may have
> > >> > > many requests in it. Will this be a common case? The controller
> > >> > > requests still go through the SocketServer. The SocketServer will
> > >> > > mute the channel once a request is read and put into the request
> > >> > > channel. So assuming there is only one connection between the
> > >> > > controller and each broker, on the broker side there should be only
> > >> > > one controller request in the controller request queue at any given
> > >> > > time. If that is the case, do we need a separate controller request
> > >> > > queue capacity config? The default value 20 means that we expect
> > >> > > 20 controller switches to happen in a short period of time. I am
> > >> > > not sure whether someone should increase the controller request
> > >> > > queue capacity to handle such a case, as it seems to indicate that
> > >> > > something very wrong has happened.
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Jiangjie (Becket) Qin
> > >> > >
> > >> > >
> > >> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com>
> > >> wrote:
> > >> > >
> > >> > > > Thanks for the update Lucas.
> > >> > > >
> > >> > > > I think the motivation section is intuitive. It will be good to
> > >> > > > learn more about the comments from other reviewers.
> > >> > > >
> > >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Dong,
> > >> > > > >
> > >> > > > > I've updated the motivation section of the KIP by explaining the
> > >> > > > > cases that would have user impacts.
> > >> > > > > Please take a look and let me know your comments.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Lucas
> > >> > > > >
> > >> > > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com>
> > >> > > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi Dong,
> > >> > > > > >
> > >> > > > > > The simulation of disk being slow is merely for me to easily
> > >> > > construct
> > >> > > > a
> > >> > > > > > testing scenario
> > >> > > > > > with a backlog of produce requests. In production, other
> than
> > >> the
> > >> > > disk
> > >> > > > > > being slow, a backlog of
> > >> > > > > > produce requests may also be caused by high produce QPS.
> > >> > > > > > In that case, we may not want to kill the broker and that's
> > when
> > >> > this
> > >> > > > KIP
> > >> > > > > > can be useful, both for JBOD
> > >> > > > > > and non-JBOD setup.
> > >> > > > > >
> > >> > > > > > Going back to your previous question about each
> ProduceRequest
> > >> > > covering
> > >> > > > > 20
> > >> > > > > > partitions that are randomly
> > >> > > > > > distributed, let's say a LeaderAndIsr request is enqueued
> that
> > >> > tries
> > >> > > to
> > >> > > > > > switch the current broker, say broker0, from leader to
> > follower
> > >> > > > > > *for one of the partitions*, say *test-0*. For the sake of
> > >> > argument,
> > >> > > > > > let's also assume the other brokers, say broker1, have
> > *stopped*
> > >> > > > fetching
> > >> > > > > > from
> > >> > > > > > the current broker, i.e. broker0.
> > >> > > > > > 1. If the enqueued produce requests have acks =  -1 (ALL)
> > >> > > > > >   1.1 without this KIP, the ProduceRequests ahead of
> > >> LeaderAndISR
> > >> > > will
> > >> > > > be
> > >> > > > > > put into the purgatory,
> > >> > > > > >         and since they'll never be replicated to other
> brokers
> > >> > > (because
> > >> > > > > of
> > >> > > > > > the assumption made above), they will
> > >> > > > > >         be completed either when the LeaderAndISR request is
> > >> > > processed
> > >> > > > or
> > >> > > > > > when the timeout happens.
> > >> > > > > >   1.2 With this KIP, broker0 will immediately transition the
> > >> > > partition
> > >> > > > > > test-0 to become a follower,
> > >> > > > > >         after the current broker sees the replication of the
> > >> > > remaining
> > >> > > > 19
> > >> > > > > > partitions, it can send a response indicating that
> > >> > > > > >         it's no longer the leader for the "test-0".
> > >> > > > > >   To see the latency difference between 1.1 and 1.2, let's
> say
> > >> > there
> > >> > > > are
> > >> > > > > > 24K produce requests ahead of the LeaderAndISR, and there
> are
> > 8
> > >> io
> > >> > > > > threads,
> > >> > > > > >   so each io thread will process approximately 3000 produce
> > >> > requests.
> > >> > > > Now
> > >> > > > > > let's investigate the io thread that finally processed the
> > >> > > > LeaderAndISR.
> > >> > > > > >   For the 3000 produce requests, if we model the time when
> > their
> > >> > > > > remaining
> > >> > > > > > 19 partitions catch up as t0, t1, ...t2999, and the
> > LeaderAndISR
> > >> > > > request
> > >> > > > > is
> > >> > > > > > processed at time t3000.
> > >> > > > > >   Without this KIP, the 1st produce request would have
> waited
> > an
> > >> > > extra
> > >> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra time of
> > >> t3000 -
> > >> > > t1,
> > >> > > > > etc.
> > >> > > > > >   Roughly speaking, the latency difference is bigger for the
> > >> > earlier
> > >> > > > > > produce requests than for the later ones. For the same
> reason,
> > >> the
> > >> > > more
> > >> > > > > > ProduceRequests queued
> > >> > > > > >   before the LeaderAndISR, the bigger benefit we get (capped
> > by
> > >> the
> > >> > > > > > produce timeout).
> > >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > >> > > > > >   There will be no latency differences in this case, but
> > >> > > > > >   2.1 without this KIP, the records of partition test-0 in
> the
> > >> > > > > > ProduceRequests ahead of the LeaderAndISR will be appended
> to
> > >> the
> > >> > > local
> > >> > > > > log,
> > >> > > > > >         and eventually be truncated after processing the
> > >> > > LeaderAndISR.
> > >> > > > > > This is what's referred to as
> > >> > > > > >         "some unofficial definition of data loss in terms of
> > >> > messages
> > >> > > > > > beyond the high watermark".
> > >> > > > > >   2.2 with this KIP, we can mitigate the effect since if the
> > >> > > > LeaderAndISR
> > >> > > > > > is immediately processed, the response to producers will
> have
> > >> > > > > >         the NotLeaderForPartition error, causing producers
> to
> > >> retry
> > >> > > > > >
> > >> > > > > > This explanation above is the benefit for reducing the
> latency
> > >> of a
> > >> > > > > broker
> > >> > > > > > becoming the follower,
> > >> > > > > > closely related is reducing the latency of a broker becoming
> > the
> > >> > > > leader.
> > >> > > > > > In this case, the benefit is even more obvious, if other
> > brokers
> > >> > have
> > >> > > > > > resigned leadership, and the
> > >> > > > > > current broker should take leadership. Any delay in
> processing
> > >> the
> > >> > > > > > LeaderAndISR will be perceived
> > >> > > > > > by clients as unavailability. In extreme cases, this can
> cause
> > >> > failed
> > >> > > > > > produce requests if the retries are
> > >> > > > > > exhausted.
> > >> > > > > >
> > >> > > > > > Another two types of controller requests are UpdateMetadata
> > and
> > >> > > > > > StopReplica, which I'll briefly discuss as follows:
> > >> > > > > > For UpdateMetadata requests, delayed processing means
> clients
> > >> > > receiving
> > >> > > > > > stale metadata, e.g. with the wrong leadership info
> > >> > > > > > for certain partitions, and the effect is more retries or
> even
> > >> > fatal
> > >> > > > > > failure if the retries are exhausted.
> > >> > > > > >
> > >> > > > > > For StopReplica requests, a long queuing time may degrade
> the
> > >> > > > performance
> > >> > > > > > of topic deletion.
> > >> > > > > >
> > >> > > > > > Regarding your last question of the delay for
> > >> > DescribeLogDirsRequest,
> > >> > > > you
> > >> > > > > > are right
> > >> > > > > > that this KIP cannot help with the latency in getting the
> log
> > >> dirs
> > >> > > > info,
> > >> > > > > > and it's only relevant
> > >> > > > > > when controller requests are involved.
> > >> > > > > >
> > >> > > > > > Regards,
> > >> > > > > > Lucas
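
[Annotation: the two-queue mechanism discussed above can be sketched as follows. This is an illustrative toy model only, not the actual Kafka implementation: a handler thread always drains the controller queue before the data queue, but never preempts a request it has already taken.]

```python
from collections import deque

class TwoQueueChannel:
    """Toy model of KIP-291: controller requests (LeaderAndIsr,
    UpdateMetadata, StopReplica) wait in their own queue, and a
    handler always prefers that queue over the data-request queue."""

    def __init__(self):
        self.controller_queue = deque()
        self.data_queue = deque()

    def send(self, request, from_controller=False):
        queue = self.controller_queue if from_controller else self.data_queue
        queue.append(request)

    def receive(self):
        # Controller requests jump ahead of queued (but not in-flight)
        # data requests.
        if self.controller_queue:
            return self.controller_queue.popleft()
        return self.data_queue.popleft() if self.data_queue else None
```

[With a backlog of ProduceRequests already queued, a LeaderAndIsr request sent afterwards is still the next one dequeued, so in this model the broker can resign or assume leadership, and start answering producers with NotLeaderForPartition, without first draining the backlog.]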
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <
> lindong28@gmail.com
> > >
> > >> > > wrote:
> > >> > > > > >
> > >> > > > > >> Hey Jun,
> > >> > > > > >>
> > >> > > > > >> Thanks much for the comments. It is good point. So the
> > feature
> > >> may
> > >> > > be
> > >> > > > > >> useful for JBOD use-case. I have one question below.
> > >> > > > > >>
> > >> > > > > >> Hey Lucas,
> > >> > > > > >>
> > >> > > > > >> Do you think this feature is also useful for non-JBOD setup
> > or
> > >> it
> > >> > is
> > >> > > > > only
> > >> > > > > >> useful for the JBOD setup? It may be useful to understand
> > this.
> > >> > > > > >>
> > >> > > > > >> When the broker is setup using JBOD, in order to move
> leaders
> > >> on
> > >> > the
> > >> > > > > >> failed
> > >> > > > > >> disk to other disks, the system operator first needs to get
> > the
> > >> > list
> > >> > > > of
> > >> > > > > >> partitions on the failed disk. This is currently achieved
> > using
> > >> > > > > >> AdminClient.describeLogDirs(), which sends
> > >> DescribeLogDirsRequest
> > >> > to
> > >> > > > the
> > >> > > > > >> broker. If we only prioritize the controller requests, then
> > the
> > >> > > > > >> DescribeLogDirsRequest
> > >> > > > > >> may still take a long time to be processed by the broker.
> So
> > >> the
> > >> > > > overall
> > >> > > > > >> time to move leaders away from the failed disk may still be
> > >> long
> > >> > > even
> > >> > > > > with
> > >> > > > > >> this KIP. What do you think?
> > >> > > > > >>
> > >> > > > > >> Thanks,
> > >> > > > > >> Dong
> > >> > > > > >>
> > >> > > > > >>
> > >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
> > >> lucasatucla@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > > >>
> > >> > > > > >> > Thanks for the insightful comment, Jun.
> > >> > > > > >> >
> > >> > > > > >> > @Dong,
> > >> > > > > >> > Since both of the two comments in your previous email are
> > >> about
> > >> > > the
> > >> > > > > >> > benefits of this KIP and whether it's useful,
> > >> > > > > >> > in light of Jun's last comment, do you agree that this
> KIP
> > >> can
> > >> > be
> > >> > > > > >> > beneficial in the case mentioned by Jun?
> > >> > > > > >> > Please let me know, thanks!
> > >> > > > > >> >
> > >> > > > > >> > Regards,
> > >> > > > > >> > Lucas
> > >> > > > > >> >
> > >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <
> jun@confluent.io>
> > >> > wrote:
> > >> > > > > >> >
> > >> > > > > >> > > Hi, Lucas, Dong,
> > >> > > > > >> > >
> > >> > > > > >> > > If all disks on a broker are slow, one probably should
> > just
> > >> > kill
> > >> > > > the
> > >> > > > > >> > > broker. In that case, this KIP may not help. If only
> one
> > of
> > >> > the
> > >> > > > > disks
> > >> > > > > >> on
> > >> > > > > >> > a
> > >> > > > > >> > > broker is slow, one may want to fail that disk and move
> > the
> > >> > > > leaders
> > >> > > > > on
> > >> > > > > >> > that
> > >> > > > > >> > > disk to other brokers. In that case, being able to
> > process
> > >> the
> > >> > > > > >> > LeaderAndIsr
> > >> > > > > >> > > requests faster will potentially help the producers
> > recover
> > >> > > > quicker.
> > >> > > > > >> > >
> > >> > > > > >> > > Thanks,
> > >> > > > > >> > >
> > >> > > > > >> > > Jun
> > >> > > > > >> > >
> > >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> > >> lindong28@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > > >> > >
> > >> > > > > >> > > > Hey Lucas,
> > >> > > > > >> > > >
> > >> > > > > >> > > > Thanks for the reply. Some follow up questions below.
> > >> > > > > >> > > >
> > >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20
> > partitions
> > >> > that
> > >> > > > are
> > >> > > > > >> > > randomly
> > >> > > > > >> > > > distributed across all partitions, then each
> > >> ProduceRequest
> > >> > > will
> > >> > > > > >> likely
> > >> > > > > >> > > > cover some partitions for which the broker is still
> > >> leader
> > >> > > after
> > >> > > > > it
> > >> > > > > >> > > quickly
> > >> > > > > >> > > > processes the
> > >> > > > > >> > > > LeaderAndIsrRequest. Then broker will still be slow
> in
> > >> > > > processing
> > >> > > > > >> these
> > >> > > > > >> > > > ProduceRequest and request latency will still be very high
> with
> > >> this
> > >> > > > KIP.
> > >> > > > > It
> > >> > > > > >> > > seems
> > >> > > > > >> > > > that most ProduceRequest will still timeout after 30
> > >> > seconds.
> > >> > > Is
> > >> > > > > >> this
> > >> > > > > >> > > > understanding correct?
> > >> > > > > >> > > >
> > >> > > > > >> > > > Regarding 2, if most ProduceRequest will still
> timeout
> > >> after
> > >> > > 30
> > >> > > > > >> > seconds,
> > >> > > > > >> > > > then it is less clear how this KIP reduces average
> > >> produce
> > >> > > > > latency.
> > >> > > > > >> Can
> > >> > > > > >> > > you
> > >> > > > > >> > > > clarify what metrics can be improved by this KIP?
> > >> > > > > >> > > >
> > >> > > > > >> > > > Not sure why system operator directly cares number of
> > >> > > truncated
> > >> > > > > >> > messages.
> > >> > > > > >> > > > Do you mean this KIP can improve average throughput
> or
> > >> > reduce
> > >> > > > > >> message
> > >> > > > > >> > > > duplication? It will be good to understand this.
> > >> > > > > >> > > >
> > >> > > > > >> > > > Thanks,
> > >> > > > > >> > > > Dong
> > >> > > > > >> > > >
> > >> > > > > >> > > >
> > >> > > > > >> > > >
> > >> > > > > >> > > >
> > >> > > > > >> > > >
> > >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > >> > > lucasatucla@gmail.com
> > >> > > > >
> > >> > > > > >> > wrote:
> > >> > > > > >> > > >
> > >> > > > > >> > > > > Hi Dong,
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > Thanks for your valuable comments. Please see my
> > reply
> > >> > > below.
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now
> let's
> > >> > > consider
> > >> > > > a
> > >> > > > > >> more
> > >> > > > > >> > > > common
> > >> > > > > >> > > > > scenario
> > >> > > > > >> > > > > where broker0 is the leader of many partitions. And
> > >> let's
> > >> > > say
> > >> > > > > for
> > >> > > > > >> > some
> > >> > > > > >> > > > > reason its IO becomes slow.
> > >> > > > > >> > > > > The number of leader partitions on broker0 is so
> > large,
> > >> > say
> > >> > > > 10K,
> > >> > > > > >> that
> > >> > > > > >> > > the
> > >> > > > > >> > > > > cluster is skewed,
> > >> > > > > >> > > > > and the operator would like to shift the leadership
> > >> for a
> > >> > > lot
> > >> > > > of
> > >> > > > > >> > > > > partitions, say 9K, to other brokers,
> > >> > > > > >> > > > > either manually or through some service like cruise
> > >> > control.
> > >> > > > > >> > > > > With this KIP, not only will the leadership
> > transitions
> > >> > > finish
> > >> > > > > >> more
> > >> > > > > >> > > > > quickly, helping the cluster itself becoming more
> > >> > balanced,
> > >> > > > > >> > > > > but all existing producers corresponding to the 9K
> > >> > > partitions
> > >> > > > > will
> > >> > > > > >> > get
> > >> > > > > >> > > > the
> > >> > > > > >> > > > > errors relatively quickly
> > >> > > > > >> > > > > rather than relying on their timeout, thanks to the
> > >> > batched
> > >> > > > > async
> > >> > > > > >> ZK
> > >> > > > > >> > > > > operations.
> > >> > > > > >> > > > > To me it's a useful feature to have during such
> > >> > troublesome
> > >> > > > > times.
> > >> > > > > >> > > > >
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > 2. The experiments in the Google Doc have shown
> that
> > >> with
> > >> > > this
> > >> > > > > KIP
> > >> > > > > >> > many
> > >> > > > > >> > > > > producers
> > >> > > > > >> > > > > receive an explicit error NotLeaderForPartition,
> > based
> > >> on
> > >> > > > which
> > >> > > > > >> they
> > >> > > > > >> > > > retry
> > >> > > > > >> > > > > immediately.
> > >> > > > > >> > > > > Therefore the latency (~14 seconds+quick retry) for
> > >> their
> > >> > > > single
> > >> > > > > >> > > message
> > >> > > > > >> > > > is
> > >> > > > > >> > > > > much smaller
> > >> > > > > >> > > > > compared with the case of timing out without the
> KIP
> > >> (30
> > >> > > > seconds
> > >> > > > > >> for
> > >> > > > > >> > > > timing
> > >> > > > > >> > > > > out + quick retry).
> > >> > > > > >> > > > > One might argue that reducing the timing out on the
> > >> > producer
> > >> > > > > side
> > >> > > > > >> can
> > >> > > > > >> > > > > achieve the same result,
> > >> > > > > >> > > > > yet reducing the timeout has its own drawbacks[1].
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > Also *IF* there were a metric to show the number of
> > >> > > truncated
> > >> > > > > >> > messages
> > >> > > > > >> > > on
> > >> > > > > >> > > > > brokers,
> > >> > > > > >> > > > > with the experiments done in the Google Doc, it
> > should
> > >> be
> > >> > > easy
> > >> > > > > to
> > >> > > > > >> see
> > >> > > > > >> > > > that
> > >> > > > > >> > > > > a lot fewer messages need
> > >> > > > > >> > > > > to be truncated on broker0 since the up-to-date
> > >> metadata
> > >> > > > avoids
> > >> > > > > >> > > appending
> > >> > > > > >> > > > > of messages
> > >> > > > > >> > > > > in subsequent PRODUCE requests. If we talk to a
> > system
> > >> > > > operator
> > >> > > > > >> and
> > >> > > > > >> > ask
> > >> > > > > >> > > > > whether
> > >> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most likely
> the
> > >> > answer
> > >> > > > is
> > >> > > > > >> yes.
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > 3. To answer your question, I think it might be
> > >> helpful to
> > >> > > > > >> construct
> > >> > > > > >> > > some
> > >> > > > > >> > > > > formulas.
> > >> > > > > >> > > > > To simplify the modeling, I'm going back to the
> case
> > >> where
> > >> > > > there
> > >> > > > > >> is
> > >> > > > > >> > > only
> > >> > > > > >> > > > > ONE partition involved.
> > >> > > > > >> > > > > Following the experiments in the Google Doc, let's
> > say
> > >> > > broker0
> > >> > > > > >> > becomes
> > >> > > > > >> > > > the
> > >> > > > > >> > > > > follower at time t0,
> > >> > > > > >> > > > > and after t0 there were still N produce requests in
> > its
> > >> > > > request
> > >> > > > > >> > queue.
> > >> > > > > >> > > > > With the up-to-date metadata brought by this KIP,
> > >> broker0
> > >> > > can
> > >> > > > > >> reply
> > >> > > > > >> > > with
> > >> > > > > >> > > > an
> > >> > > > > >> > > > > NotLeaderForPartition exception,
> > >> > > > > >> > > > > let's use M1 to denote the average processing time
> of
> > >> > > replying
> > >> > > > > >> with
> > >> > > > > >> > > such
> > >> > > > > >> > > > an
> > >> > > > > >> > > > > error message.
> > >> > > > > >> > > > > Without this KIP, the broker will need to append
> > >> messages
> > >> > to
> > >> > > > > >> > segments,
> > >> > > > > >> > > > > which may trigger a flush to disk,
> > >> > > > > >> > > > > let's use M2 to denote the average processing time
> > for
> > >> > such
> > >> > > > > logic.
> > >> > > > > >> > > > > Then the average extra latency incurred without
> this
> > >> KIP
> > >> > is
> > >> > > N
> > >> > > > *
> > >> > > > > >> (M2 -
> > >> > > > > >> > > > M1) /
> > >> > > > > >> > > > > 2.
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > In practice, M2 should always be larger than M1,
> > which
> > >> > means
> > >> > > > as
> > >> > > > > >> long
> > >> > > > > >> > > as N
> > >> > > > > >> > > > > is positive,
> > >> > > > > >> > > > > we would see improvements on the average latency.
> > >> > > > > >> > > > > There does not need to be significant backlog of
> > >> requests
> > >> > in
> > >> > > > the
> > >> > > > > >> > > request
> > >> > > > > >> > > > > queue,
> > >> > > > > >> > > > > or severe degradation of disk performance to have
> the
> > >> > > > > improvement.
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > Regards,
> > >> > > > > >> > > > > Lucas
> > >> > > > > >> > > > >
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > [1] For instance, reducing the timeout on the
> > producer
> > >> > side
> > >> > > > can
> > >> > > > > >> > trigger
> > >> > > > > >> > > > > unnecessary duplicate requests
> > >> > > > > >> > > > > when the corresponding leader broker is overloaded,
> > >> > > > exacerbating
> > >> > > > > >> the
> > >> > > > > >> > > > > situation.
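
[Annotation: the back-of-the-envelope model above can be checked numerically. The sketch below assumes a single handler draining a FIFO queue of N requests, each costing M1 (reply with an error) or M2 (append, possibly with a flush); the figures in the test are illustrative only.]

```python
def avg_extra_latency(n, m1, m2):
    """Average extra latency across n queued requests when each costs
    m2 instead of m1. Request i (1-indexed) completes at i * m, so the
    average completion time under cost m is m * (n + 1) / 2."""
    avg_completion = lambda m: sum(i * m for i in range(1, n + 1)) / n
    return avg_completion(m2) - avg_completion(m1)
```

[This evaluates to (N + 1) * (M2 - M1) / 2, matching the email's N * (M2 - M1) / 2 up to the +1 contributed by a request's own processing time; for any positive backlog and M2 > M1 the saving is positive, as argued.]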
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> > >> > > lindong28@gmail.com
> > >> > > > >
> > >> > > > > >> > wrote:
> > >> > > > > >> > > > >
> > >> > > > > >> > > > > > Hey Lucas,
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > Thanks much for the detailed documentation of the
> > >> > > > experiment.
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > Initially I also think having a separate queue
> for
> > >> > > > controller
> > >> > > > > >> > > requests
> > >> > > > > >> > > > is
> > >> > > > > >> > > > > > useful because, as you mentioned in the summary
> > >> section
> > >> > of
> > >> > > > the
> > >> > > > > >> > Google
> > >> > > > > >> > > > > doc,
> > >> > > > > >> > > > > > controller requests are generally more important
> > than
> > >> > data
> > >> > > > > >> requests
> > >> > > > > >> > > and
> > >> > > > > >> > > > > we
> > >> > > > > >> > > > > > probably want controller requests to be processed
> > >> > sooner.
> > >> > > > But
> > >> > > > > >> then
> > >> > > > > >> > > Eno
> > >> > > > > >> > > > > has
> > >> > > > > >> > > > > > two very good questions which I am not sure the
> > >> Google
> > >> > doc
> > >> > > > has
> > >> > > > > >> > > answered
> > >> > > > > >> > > > > > explicitly. Could you help with the following
> > >> questions?
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > 1) It is not very clear what is the actual
> benefit
> > of
> > >> > > > KIP-291
> > >> > > > > to
> > >> > > > > >> > > users.
> > >> > > > > >> > > > > The
> > >> > > > > >> > > > > > experiment setup in the Google doc simulates the
> > >> > scenario
> > >> > > > that
> > >> > > > > >> > broker
> > >> > > > > >> > > > is
> > >> > > > > >> > > > > > very slow handling ProduceRequest due to e.g.
> slow
> > >> disk.
> > >> > > It
> > >> > > > > >> > currently
> > >> > > > > >> > > > > > assumes that there is only 1 partition. But in
> the
> > >> > common
> > >> > > > > >> scenario,
> > >> > > > > >> > > it
> > >> > > > > >> > > > is
> > >> > > > > >> > > > > > probably reasonable to assume that there are many
> > >> other
> > >> > > > > >> partitions
> > >> > > > > >> > > that
> > >> > > > > >> > > > > are
> > >> > > > > >> > > > > > also actively produced to and ProduceRequest to
> > these
> > >> > > > > partition
> > >> > > > > >> > also
> > >> > > > > >> > > > > takes
> > >> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if
> broker0
> > >> can
> > >> > > > become
> > >> > > > > >> > > follower
> > >> > > > > >> > > > > for
> > >> > > > > >> > > > > > the partition 0 soon, it probably still needs to
> > >> process
> > >> > > the
> > >> > > > > >> > > > > ProduceRequest
> > >> > > > > >> > > > > > slowly in the queue because these
> ProduceRequests
> > >> > cover
> > >> > > > > other
> > >> > > > > >> > > > > partitions.
> > >> > > > > >> > > > > > Thus most ProduceRequest will still timeout after
> > 30
> > >> > > seconds
> > >> > > > > and
> > >> > > > > >> > most
> > >> > > > > >> > > > > > clients will still likely timeout after 30
> seconds.
> > >> Then
> > >> > > it
> > >> > > > is
> > >> > > > > >> not
> > >> > > > > >> > > > > > obviously what is the benefit to client since
> > client
> > >> > will
> > >> > > > > >> timeout
> > >> > > > > >> > > after
> > >> > > > > >> > > > > 30
> > >> > > > > >> > > > > > seconds before possibly re-connecting to broker1,
> > >> with
> > >> > or
> > >> > > > > >> without
> > >> > > > > >> > > > > KIP-291.
> > >> > > > > >> > > > > > Did I miss something here?
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > 2) I guess Eno's is asking for the specific
> > benefits
> > >> of
> > >> > > this
> > >> > > > > >> KIP to
> > >> > > > > >> > > > user
> > >> > > > > >> > > > > or
> > >> > > > > >> > > > > > system administrator, e.g. whether this KIP
> > decreases
> > >> > > > average
> > >> > > > > >> > > latency,
> > >> > > > > >> > > > > > 999th percentile latency, probability of exception
> > >> exposed
> > >> > to
> > >> > > > > >> client
> > >> > > > > >> > > etc.
> > >> > > > > >> > > > It
> > >> > > > > >> > > > > > is probably useful to clarify this.
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > 3) Does this KIP help improve user experience
> only
> > >> when
> > >> > > > there
> > >> > > > > is
> > >> > > > > >> > > issue
> > >> > > > > >> > > > > with
> > >> > > > > >> > > > > > broker, e.g. significant backlog in the request
> > queue
> > >> > due
> > >> > > to
> > >> > > > > >> slow
> > >> > > > > >> > > disk
> > >> > > > > >> > > > as
> > >> > > > > >> > > > > > described in the Google doc? Or is this KIP also
> > >> useful
> > >> > > when
> > >> > > > > >> there
> > >> > > > > >> > is
> > >> > > > > >> > > > no
> > >> > > > > >> > > > > > ongoing issue in the cluster? It might be helpful
> > to
> > >> > > clarify
> > >> > > > > >> this
> > >> > > > > >> > to
> > >> > > > > >> > > > > > understand the benefit of this KIP.
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > Thanks much,
> > >> > > > > >> > > > > > Dong
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > >> > > > > >> lucasatucla@gmail.com
> > >> > > > > >> > >
> > >> > > > > >> > > > > wrote:
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > > Hi Eno,
> > >> > > > > >> > > > > > >
> > >> > > > > >> > > > > > > Sorry for the delay in getting the experiment
> > >> results.
> > >> > > > > >> > > > > > > Here is a link to the positive impact achieved
> by
> > >> > > > > implementing
> > >> > > > > >> > the
> > >> > > > > >> > > > > > proposed
> > >> > > > > >> > > > > > > change:
> > >> > > > > >> > > > > > > https://docs.google.com/document/d/
> > >> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > >> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > >> > > > > >> > > > > > > Please take a look when you have time and let
> me
> > >> know
> > >> > > your
> > >> > > > > >> > > feedback.
> > >> > > > > >> > > > > > >
> > >> > > > > >> > > > > > > Regards,
> > >> > > > > >> > > > > > > Lucas
> > >> > > > > >> > > > > > >
> > >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> > >> > > kafka@harsha.io>
> > >> > > > > >> wrote:
> > >> > > > > >> > > > > > >
> > >> > > > > >> > > > > > > > Thanks for the pointer. Will take a look
> might
> > >> suit
> > >> > > our
> > >> > > > > >> > > > requirements
> > >> > > > > >> > > > > > > > better.
> > >> > > > > >> > > > > > > >
> > >> > > > > >> > > > > > > > Thanks,
> > >> > > > > >> > > > > > > > Harsha
> > >> > > > > >> > > > > > > >
> > >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas
> Wang <
> > >> > > > > >> > > > lucasatucla@gmail.com
> > >> > > > > >> > > > > >
> > >> > > > > >> > > > > > > > wrote:
> > >> > > > > >> > > > > > > >
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > Hi Harsha,
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > If I understand correctly, the replication
> > >> quota
> > >> > > > > mechanism
> > >> > > > > >> > > > proposed
> > >> > > > > >> > > > > > in
> > >> > > > > >> > > > > > > > > KIP-73 can be helpful in that scenario.
> > >> > > > > >> > > > > > > > > Have you tried it out?
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > Thanks,
> > >> > > > > >> > > > > > > > > Lucas
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> > >> > > > > kafka@harsha.io
> > >> > > > > >> >
> > >> > > > > >> > > > wrote:
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > Hi Lucas,
> > >> > > > > >> > > > > > > > > > One more question, any thoughts on making
> > >> this
> > >> > > > > >> configurable
> > >> > > > > >> > > > > > > > > > and also allowing subset of data requests
> > to
> > >> be
> > >> > > > > >> > prioritized.
> > >> > > > > >> > > > For
> > >> > > > > >> > > > > > > > example
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > ,we notice in our cluster when we take
> out
> > a
> > >> > > broker
> > >> > > > > and
> > >> > > > > >> > bring
> > >> > > > > >> > > > new
> > >> > > > > >> > > > > > one
> > >> > > > > >> > > > > > > > it
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > will try to become follower and have lot
> of
> > >> > fetch
> > >> > > > > >> requests
> > >> > > > > >> > to
> > >> > > > > >> > > > > other
> > >> > > > > >> > > > > > > > > leaders
> > >> > > > > >> > > > > > > > > > in clusters. This will negatively affect
> > the
> > >> > > > > >> > > application/client
> > >> > > > > >> > > > > > > > > requests.
> > >> > > > > >> > > > > > > > > > We are also exploring the similar
> solution
> > to
> > >> > > > > >> de-prioritize
> > >> > > > > >> > > if
> > >> > > > > >> > > > a
> > >> > > > > >> > > > > > new
> > >> > > > > >> > > > > > > > > > replica comes in for fetch requests, we
> are
> > >> ok
> > >> > > with
> > >> > > > > the
> > >> > > > > >> > > replica
> > >> > > > > >> > > > > to
> > >> > > > > >> > > > > > be
> > >> > > > > >> > > > > > > > > > taking time but the leaders should
> > prioritize
> > >> > the
> > >> > > > > client
> > >> > > > > >> > > > > requests.
> > >> > > > > >> > > > > > > > > >
> > >> > > > > >> > > > > > > > > >
> > >> > > > > >> > > > > > > > > > Thanks,
> > >> > > > > >> > > > > > > > > > Harsha
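
[Annotation: Harsha's suggestion of configurable priorities for a subset of data requests could be sketched as a single priority queue with a FIFO tie-breaker. The priority mapping and request-kind names below are made up for illustration; no such mapping exists in Kafka.]

```python
import heapq
import itertools

# Hypothetical tiers: controller first, client traffic next,
# catch-up fetches from a rebuilding replica last.
PRIORITY = {"controller": 0, "client": 1, "replica_fetch": 2}

class TieredRequestQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # preserves FIFO order within a tier

    def put(self, kind, request):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), request))

    def take(self):
        return heapq.heappop(self._heap)[2]
```

[Note that KIP-73's replication quotas attack the same symptom differently: they throttle the catch-up traffic's bandwidth rather than reordering requests at the leader.]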
> > >> > > > > >> > > > > > > > > >
> > >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas
> > Wang
> > >> > > wrote:
> > >> > > > > >> > > > > > > > > >
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > Hi Eno,
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> > >> > > > > >> > > > > > > > > > > - I haven't implemented the feature
> yet,
> > >> so no
> > >> > > > > >> > experimental
> > >> > > > > >> > > > > > results
> > >> > > > > >> > > > > > > > so
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > > far.
> > >> > > > > >> > > > > > > > > > > And I plan to test in out in the
> > following
> > >> > days.
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > - You are absolutely right that the
> > >> priority
> > >> > > queue
> > >> > > > > >> does
> > >> > > > > >> > not
> > >> > > > > >> > > > > > > > completely
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > > prevent
> > >> > > > > >> > > > > > > > > > > data requests being processed ahead of
> > >> > > controller
> > >> > > > > >> > requests.
> > >> > > > > >> > > > > > > > > > > That being said, I expect it to greatly
> > >> > mitigate
> > >> > > > the
> > >> > > > > >> > effect
> > >> > > > > >> > > > of
> > >> > > > > >> > > > > > > stable
> > >> > > > > >> > > > > > > > > > > metadata.
> > >> > > > > >> > > > > > > > > > > In any case, I'll try it out and post
> the
> > >> > > results
> > >> > > > > >> when I
> > >> > > > > >> > > have
> > >> > > > > >> > > > > it.
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > Regards,
> > >> > > > > >> > > > > > > > > > > Lucas
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno
> > >> Thereska
> > >> > <
> > >> > > > > >> > > > > > > > eno.thereska@gmail.com
> > >> > > > > >> > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > wrote:
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > Hi Lucas,
> > >> > > > > >> > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look
> at
> > >> > this.
> > >> > > A
> > >> > > > > >> couple
> > >> > > > > >> > of
> > >> > > > > >> > > > > > > > questions:
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > - did you notice any positive change
> > >> after
> > >> > > > > >> implementing
> > >> > > > > >> > > > this
> > >> > > > > >> > > > > > KIP?
> > >> > > > > >> > > > > > > > > I'm
> > >> > > > > >> > > > > > > > > > > > wondering if you have any
> experimental
> > >> > results
> > >> > > > > that
> > >> > > > > >> > show
> > >> > > > > >> > > > the
> > >> > > > > >> > > > > > > > benefit
> > >> > > > > >> > > > > > > > > of
> > >> > > > > >> > > > > > > > > > > the
> > >> > > > > >> > > > > > > > > > > > two queues.
> > >> > > > > >> > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > - priority is usually not sufficient
> in
> > >> > > > addressing
> > >> > > > > >> the
> > >> > > > > >> > > > > problem
> > >> > > > > >> > > > > > > the
> > >> > > > > >> > > > > > > > > KIP
> > >> > > > > >> > > > > > > > > > > > identifies. Even with priority
> queues,
> > >> you
> > >> > > will
> > >> > > > > >> > sometimes
> > >> > > > > >> > > > > > > (often?)
> > >> > > > > >> > > > > > > > > have
> > >> > > > > >> > > > > > > > > > > the
> > >> > > > > >> > > > > > > > > > > > case that data plane requests will be
> > >> ahead
> > >> > of
> > >> > > > the
> > >> > > > > >> > > control
> > >> > > > > >> > > > > > plane
> > >> > > > > >> > > > > > > > > > > requests.
> > >> > > > > >> > > > > > > > > > > > This happens because the system might
> > >> have
> > >> > > > already
> > >> > > > > >> > > started
> > >> > > > > >> > > > > > > > > processing
> > >> > > > > >> > > > > > > > > > > the
> > >> > > > > >> > > > > > > > > > > > data plane requests before the
> control
> > >> plane
> > >> > > > ones
> > >> > > > > >> > > arrived.
> > >> > > > > >> > > > So
> > >> > > > > >> > > > > > it
> > >> > > > > >> > > > > > > > > would
> > >> > > > > >> > > > > > > > > > > be
> > >> > > > > >> > > > > > > > > > > > good to know what % of the problem
> this
> > >> KIP
> > >> > > > > >> addresses.
> > >> > > > > >> > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > Thanks
> > >> > > > > >> > > > > > > > > > > > Eno
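
[Annotation: Eno's caveat can be made concrete with a toy model: a single broker, no preemption, and a control request that arrives to find some data requests queued plus some already in flight. The function below is illustrative only.]

```python
def data_handled_before_control(queued_data, in_flight, prioritized):
    """How many data requests complete before the control request.
    With prioritization only the in-flight requests (which cannot be
    preempted) finish first; without it, the whole backlog does."""
    return in_flight if prioritized else in_flight + queued_data
```

[So with a backlog of 500 and one request in flight, prioritization cuts the head-of-line count from 501 to 1: the race Eno describes still exists, but it is bounded by the number of in-flight requests (roughly the handler-thread count) rather than by the queue depth.]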
> > >> > > > > >> > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted
> > Yu <
> > >> > > > > >> > > > > yuzhihong@gmail.com
> > >> > > > > >> > > > > > >
> > >> > > > > >> > > > > > > > > wrote:
> > >> > > > > >> > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > Change looks good.
> > >> > > > > >> > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > Thanks
> > >> > > > > >> > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM,
> > Lucas
> > >> > Wang
> > >> > > <
> > >> > > > > >> > > > > > > > lucasatucla@gmail.com
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > wrote:
> > >> > > > > >> > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > >> > > > > >> > > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've
> > >> updated
> > >> > > the
> > >> > > > > KIP.
> > >> > > > > >> > > Please
> > >> > > > > >> > > > > > take
> > >> > > > > >> > > > > > > > > > another
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > look.
> > >> > > > > >> > > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > > Lucas
> > >> > > > > >> > > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM,
> > Ted
> > >> Yu
> > >> > <
> > >> > > > > >> > > > > > > yuzhihong@gmail.com
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > > wrote:
> > >> > > > > >> > > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala
> :
> > >> > > > > >> > > > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > >> > > > > >> > > > > > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > > > It would be good if you can
> > include
> > >> > the
> > >> > > > > >> default
> > >> > > > > >> > > value
> > >> > > > > >> > > > > for
> > >> > > > > >> > > > > > > > this
> > >> > > > > >> > > > > > > > >
> > >> > > > > >> > > > > > > > > > new
> > >> > > > > >> > > > > > > > > > >
> > >> > > > > >> > > > > > > > > > > > > config
> > in the KIP.
> >
> > Thanks
> >
> > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >
> > > Hi Ted, Dong
> > >
> > > I've updated the KIP by adding a new config, instead of reusing the
> > > existing one.
> > > Please take another look when you have time. Thanks a lot!
> > >
> > > Lucas
> > >
> > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > bq. that's a waste of resource if control request rate is low
> > > >
> > > > I don't know if control request rate can get to 100,000, likely not.
> > > > Then using the same bound as that for data requests seems high.
> > > >
> > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > Thanks for taking a look at this KIP.
> > > > > Let's say today the setting of "queued.max.requests" in cluster A is
> > > > > 1000, while the setting in cluster B is 100,000.
> > > > > The 100 times difference might have indicated that machines in
> > > > > cluster B have larger memory.
> > > > >
> > > > > By reusing the "queued.max.requests", the controlRequestQueue in
> > > > > cluster B automatically gets a 100x capacity without explicitly
> > > > > bothering the operators.
> > > > > I understand the counter argument can be that maybe that's a waste
> > > > > of resource if control request rate is low and operators may want to
> > > > > fine tune the capacity of the controlRequestQueue.
> > > > >
> > > > > I'm ok with either approach, and can change it if you or anyone else
> > > > > feels strong about adding the extra config.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > Lucas:
> > > > > > Under Rejected Alternatives, #2, can you elaborate a bit more on
> > > > > > why the separate config has bigger impact?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > >
> > > > > > > Hey Luca,
> > > > > > >
> > > > > > > Thanks for the KIP. Looks good overall. Some comments below:
> > > > > > >
> > > > > > > - We usually specify the full mbean for the new metrics in the
> > > > > > > KIP. Can you specify it in the Public Interface section similar
> > > > > > > to KIP-237
> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>
> > > > > > > ?
> > > > > > >
> > > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > > > where we keep the existing sensor name "BytesInPerSec" and add a
> > > > > > > new sensor "ReplicationBytesInPerSec", rather than replacing the
> > > > > > > sensor name "BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > >
> > > > > > > - It seems that the KIP changes the semantics of the broker
> > > > > > > config "queued.max.requests" because the number of total requests
> > > > > > > queued in the broker will be no longer bounded by
> > > > > > > "queued.max.requests". This probably needs to be specified in the
> > > > > > > Public Interfaces section for discussion.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dong
> > > > > > >
> > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Kafka experts,
> > > > > > > >
> > > > > > > > I created KIP-291 to add a separate queue for controller
> > > > > > > > requests:
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > > > >
> > > > > > > > Can you please take a look and let me know your feedback?
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Mayuresh Gharat <gh...@gmail.com>.
Hi Lucas,

Seems like the main intent here is to prioritize controller requests
over any other requests.
In that case, we can change the request queue to a deque, where you
always insert the normal requests (produce, consume, etc.) at the tail of
the deque, but if it's a controller request, you insert it at the head.
This ensures that controller requests are given higher priority than
other requests.

Also, since we only read one request from the socket, mute the channel,
and only unmute it after handling the request, this ensures that we don't
handle controller requests out of order.
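
That ordering guarantee can be illustrated as follows (again a toy model
with hypothetical names; the real broker mutes the channel inside the
SocketServer): since a connection is muted after one request is read and
unmuted only after that request is handled, at most one request per
connection is in flight, so two controller requests on the same connection
cannot be reordered.

```python
class ConnectionSketch:
    """Toy model of per-connection muting in the socket server."""

    def __init__(self, requests):
        self.pending = list(requests)  # requests waiting on the socket
        self.muted = False

    def readable(self):
        return bool(self.pending) and not self.muted

    def read(self):
        self.muted = True              # mute: no further reads until handled
        return self.pending.pop(0)

    def unmute(self):
        self.muted = False             # response sent: resume reading

conn = ConnectionSketch(["LeaderAndIsr-epoch1", "LeaderAndIsr-epoch2"])
handled = []
while conn.pending:
    if conn.readable():
        request = conn.read()
        handled.append(request)        # handle the request...
        conn.unmute()                  # ...then unmute the connection
print(handled)  # ['LeaderAndIsr-epoch1', 'LeaderAndIsr-epoch2']
```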

With this approach we can avoid the second queue and the additional config
for the size of the queue.

What do you think ?

Thanks,

Mayuresh


On Wed, Jul 18, 2018 at 3:05 AM Becket Qin <be...@gmail.com> wrote:

> Hey Joel,
>
> Thanks for the detailed explanation. I agree the current design makes sense.
> My confusion is about whether the new config for the controller queue
> capacity is necessary. I cannot think of a case in which users would change
> it.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Hi Lucas,
> >
> > I guess my question can be rephrased as "do we expect users to ever
> > change the controller request queue capacity"? If we agree that 20 is
> > already a very generous default number and we do not expect users to
> > change it, is it still necessary to expose this as a config?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lu...@gmail.com> wrote:
> >
> >> @Becket
> >> 1. Thanks for the comment. You are right that normally there should be
> >> just one controller request because of muting, and I had NOT intended
> >> to say there would be many enqueued controller requests.
> >> I went through the KIP again, and I'm not sure which part conveys that
> >> info. I'd be happy to revise it if you point out the section.
> >>
> >> 2. Though it should not happen in normal conditions, the current design
> >> does not preclude multiple controllers running at the same time, hence
> >> if we don't have the controller queue capacity config and simply make
> >> its capacity 1, network threads handling requests from different
> >> controllers will be blocked during those troublesome times, which is
> >> probably not what we want. On the other hand, adding the extra config
> >> with a default value, say 20, guards us from issues in those
> >> troublesome times, and IMO there isn't much downside to adding the
> >> extra config.
> >>
> >> @Mayuresh
> >> Good catch, this sentence is an obsolete statement based on a previous
> >> design. I've revised the wording in the KIP.
> >>
> >> Thanks,
> >> Lucas
> >>
> >> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com> wrote:
> >>
> >> > Hi Lucas,
> >> >
> >> > Thanks for the KIP.
> >> > I am trying to understand why you think "The memory consumption can
> >> > rise given the total number of queued requests can go up to 2x" in the
> >> > impact section. Normally the requests from the controller to a broker
> >> > are not high volume, right?
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Mayuresh
> >> >
> >> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <be...@gmail.com> wrote:
> >> >
> >> > > Thanks for the KIP, Lucas. Separating the control plane from the
> >> > > data plane makes a lot of sense.
> >> > >
> >> > > In the KIP you mentioned that the controller request queue may have
> >> > > many requests in it. Will this be a common case? The controller
> >> > > requests still go through the SocketServer. The SocketServer will
> >> > > mute the channel once a request is read and put into the request
> >> > > channel. So assuming there is only one connection between the
> >> > > controller and each broker, on the broker side there should be only
> >> > > one controller request in the controller request queue at any given
> >> > > time. If that is the case, do we need a separate controller request
> >> > > queue capacity config? The default value 20 means that we expect 20
> >> > > controller switches to happen in a short period of time. I am not
> >> > > sure whether someone should increase the controller request queue
> >> > > capacity to handle such a case, as it seems to indicate something
> >> > > very wrong has happened.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jiangjie (Becket) Qin
> >> > >
> >> > >
> >> > > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com> wrote:
> >> > > >
> >> > > > Thanks for the update Lucas.
> >> > > >
> >> > > > I think the motivation section is intuitive. It will be good to
> >> > > > learn more about the comments from other reviewers.
> >> > > >
> >> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > >
> >> > > > > Hi Dong,
> >> > > > >
> >> > > > > I've updated the motivation section of the KIP by explaining the
> >> > > > > cases that would have user impacts.
> >> > > > > Please take a look and let me know your comments.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Lucas
> >> > > > >
> >> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > >
> >> > > > > > Hi Dong,
> >> > > > > >
> >> > > > > > The simulation of disk being slow is merely for me to easily
> >> > > > > > construct a testing scenario with a backlog of produce requests.
> >> > > > > > In production, other than the disk being slow, a backlog of
> >> > > > > > produce requests may also be caused by high produce QPS.
> >> > > > > > In that case, we may not want to kill the broker and that's when
> >> > > > > > this KIP can be useful, both for JBOD and non-JBOD setups.
> >> > > > > >
> >> > > > > > Going back to your previous question about each ProduceRequest
> >> > > > > > covering 20 partitions that are randomly distributed, let's say
> >> > > > > > a LeaderAndIsr request is enqueued that tries to switch the
> >> > > > > > current broker, say broker0, from leader to follower *for one of
> >> > > > > > the partitions*, say *test-0*. For the sake of argument, let's
> >> > > > > > also assume the other brokers, say broker1, have *stopped*
> >> > > > > > fetching from the current broker, i.e. broker0.
> >> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> >> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> >> > > > > > LeaderAndISR will be put into the purgatory, and since they'll
> >> > > > > > never be replicated to other brokers (because of the assumption
> >> > > > > > made above), they will be completed either when the LeaderAndISR
> >> > > > > > request is processed or when the timeout happens.
> >> > > > > >   1.2 With this KIP, broker0 will immediately transition the
> >> > > > > > partition test-0 to become a follower; after the current broker
> >> > > > > > sees the replication of the remaining 19 partitions, it can send
> >> > > > > > a response indicating that it's no longer the leader for
> >> > > > > > "test-0".
> >> > > > > >   To see the latency difference between 1.1 and 1.2, let's say
> >> > > > > > there are 24K produce requests ahead of the LeaderAndISR, and
> >> > > > > > there are 8 io threads, so each io thread will process
> >> > > > > > approximately 3000 produce requests. Now let's investigate the
> >> > > > > > io thread that finally processed the LeaderAndISR.
> >> > > > > >   For the 3000 produce requests, we can model the times when
> >> > > > > > their remaining 19 partitions catch up as t0, t1, ... t2999,
> >> > > > > > with the LeaderAndISR request processed at time t3000.
> >> > > > > >   Without this KIP, the 1st produce request would have waited an
> >> > > > > > extra t3000 - t0 time in the purgatory, the 2nd an extra time of
> >> > > > > > t3000 - t1, etc.
> >> > > > > >   Roughly speaking, the latency difference is bigger for the
> >> > > > > > earlier produce requests than for the later ones. For the same
> >> > > > > > reason, the more ProduceRequests queued before the LeaderAndISR,
> >> > > > > > the bigger benefit we get (capped by the produce timeout).
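
The purgatory-wait arithmetic above can be sanity-checked with a toy model
(illustrative Python; the 1 ms spacing of the catch-up times is an
assumption made up for the example):

```python
# Request i's followers catch up at time t[i], but without the KIP it is
# only completed at t[N], when the LeaderAndISR is finally processed by
# this io thread; the KIP lets it complete at t[i] instead.
N = 3000
t = list(range(N + 1))                    # catch-up times in ms, 1 ms apart
extra_wait = [t[N] - t[i] for i in range(N)]

print(extra_wait[0])                      # 3000: the earliest request waits longest
print(sum(extra_wait) / N)                # 1500.5: average extra wait in ms
```

As the message notes, in practice each of these extra waits is capped by
the produce timeout.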
> >> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> >> > > > > >   There will be no latency differences in this case, but
> >> > > > > >   2.1 Without this KIP, the records of partition test-0 in the
> >> > > > > > ProduceRequests ahead of the LeaderAndISR will be appended to
> >> > > > > > the local log, and eventually be truncated after processing the
> >> > > > > > LeaderAndISR. This is what's referred to as "some unofficial
> >> > > > > > definition of data loss in terms of messages beyond the high
> >> > > > > > watermark".
> >> > > > > >   2.2 With this KIP, we can mitigate the effect since if the
> >> > > > > > LeaderAndISR is immediately processed, the response to producers
> >> > > > > > will have the NotLeaderForPartition error, causing producers to
> >> > > > > > retry.
> >> > > > > >
> >> > > > > > The explanation above covers the benefit of reducing the
> >> > > > > > latency of a broker becoming a follower; closely related is
> >> > > > > > reducing the latency of a broker becoming the leader. In that
> >> > > > > > case, the benefit is even more obvious: if other brokers have
> >> > > > > > resigned leadership and the current broker should take
> >> > > > > > leadership, any delay in processing the LeaderAndISR will be
> >> > > > > > perceived by clients as unavailability. In extreme cases, this
> >> > > > > > can cause failed produce requests if the retries are exhausted.
> >> > > > > >
> >> > > > > > Another two types of controller requests are UpdateMetadata and
> >> > > > > > StopReplica, which I'll briefly discuss as follows:
> >> > > > > > For UpdateMetadata requests, delayed processing means clients
> >> > > > > > receiving stale metadata, e.g. with the wrong leadership info
> >> > > > > > for certain partitions, and the effect is more retries or even
> >> > > > > > fatal failure if the retries are exhausted.
> >> > > > > >
> >> > > > > > For StopReplica requests, a long queuing time may degrade the
> >> > > > > > performance of topic deletion.
> >> > > > > >
> >> > > > > > Regarding your last question about the delay for
> >> > > > > > DescribeLogDirsRequest, you are right that this KIP cannot help
> >> > > > > > with the latency in getting the log dirs info; it is only
> >> > > > > > relevant when controller requests are involved.
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Lucas
> >> > > > > >
> >> > > > > >
> >> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com> wrote:
> >> > > > > >
> >> > > > > >> Hey Jun,
> >> > > > > >>
> >> > > > > >> Thanks much for the comments. It is a good point. So the
> >> > > > > >> feature may be useful for the JBOD use-case. I have one
> >> > > > > >> question below.
> >> > > > > >>
> >> > > > > >> Hey Lucas,
> >> > > > > >>
> >> > > > > >> Do you think this feature is also useful for a non-JBOD setup
> >> > > > > >> or is it only useful for the JBOD setup? It may be useful to
> >> > > > > >> understand this.
> >> > > > > >>
> >> > > > > >> When the broker is set up using JBOD, in order to move leaders
> >> > > > > >> on the failed disk to other disks, the system operator first
> >> > > > > >> needs to get the list of partitions on the failed disk. This
> >> > > > > >> is currently achieved using AdminClient.describeLogDirs(),
> >> > > > > >> which sends DescribeLogDirsRequest to the broker. If we only
> >> > > > > >> prioritize the controller requests, then the
> >> > > > > >> DescribeLogDirsRequest may still take a long time to be
> >> > > > > >> processed by the broker. So the overall time to move leaders
> >> > > > > >> away from the failed disk may still be long even with this
> >> > > > > >> KIP. What do you think?
> >> > > > > >>
> >> > > > > >> Thanks,
> >> > > > > >> Dong
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > > >>
> >> > > > > >> > Thanks for the insightful comment, Jun.
> >> > > > > >> >
> >> > > > > >> > @Dong,
> >> > > > > >> > Since both of the two comments in your previous email are
> >> > > > > >> > about the benefits of this KIP and whether it's useful, in
> >> > > > > >> > light of Jun's last comment, do you agree that this KIP can
> >> > > > > >> > be beneficial in the case mentioned by Jun?
> >> > > > > >> > Please let me know, thanks!
> >> > > > > >> >
> >> > > > > >> > Regards,
> >> > > > > >> > Lucas
> >> > > > > >> >
> >> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
> >> > > > > >> >
> >> > > > > >> > > Hi, Lucas, Dong,
> >> > > > > >> > >
> >> > > > > >> > > If all disks on a broker are slow, one probably should
> >> > > > > >> > > just kill the broker. In that case, this KIP may not help.
> >> > > > > >> > > If only one of the disks on a broker is slow, one may want
> >> > > > > >> > > to fail that disk and move the leaders on that disk to
> >> > > > > >> > > other brokers. In that case, being able to process the
> >> > > > > >> > > LeaderAndIsr requests faster will potentially help the
> >> > > > > >> > > producers recover quicker.
> >> > > > > >> > >
> >> > > > > >> > > Thanks,
> >> > > > > >> > >
> >> > > > > >> > > Jun
> >> > > > > >> > >
> >> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> >> > > > > >> > >
> >> > > > > >> > > > Hey Lucas,
> >> > > > > >> > > >
> >> > > > > >> > > > Thanks for the reply. Some follow up questions below.
> >> > > > > >> > > >
> >> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions
> >> > > > > >> > > > that are randomly distributed across all partitions,
> >> > > > > >> > > > then each ProduceRequest will likely cover some
> >> > > > > >> > > > partitions for which the broker is still the leader
> >> > > > > >> > > > after it quickly processes the LeaderAndIsrRequest. Then
> >> > > > > >> > > > the broker will still be slow in processing these
> >> > > > > >> > > > ProduceRequests, and request latency will still be very
> >> > > > > >> > > > high with this KIP. It seems that most ProduceRequests
> >> > > > > >> > > > will still time out after 30 seconds. Is this
> >> > > > > >> > > > understanding correct?
> >> > > > > >> > > >
> >> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out
> >> > > > > >> > > > after 30 seconds, then it is less clear how this KIP
> >> > > > > >> > > > reduces average produce latency. Can you clarify what
> >> > > > > >> > > > metrics can be improved by this KIP?
> >> > > > > >> > > >
> >> > > > > >> > > > Not sure why a system operator directly cares about the
> >> > > > > >> > > > number of truncated messages. Do you mean this KIP can
> >> > > > > >> > > > improve average throughput or reduce message
> >> > > > > >> > > > duplication? It will be good to understand this.
> >> > > > > >> > > >
> >> > > > > >> > > > Thanks,
> >> > > > > >> > > > Dong
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > > >> > > >
> >> > > > > >> > > > > Hi Dong,
> >> > > > > >> > > > >
> >> > > > > >> > > > > Thanks for your valuable comments. Please see my reply
> >> > > > > >> > > > > below.
> >> > > > > >> > > > >
> >> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
> >> > > > > >> > > > > consider a more common scenario where broker0 is the
> >> > > > > >> > > > > leader of many partitions. And let's say for some
> >> > > > > >> > > > > reason its IO becomes slow. The number of leader
> >> > > > > >> > > > > partitions on broker0 is so large, say 10K, that the
> >> > > > > >> > > > > cluster is skewed, and the operator would like to
> >> > > > > >> > > > > shift the leadership for a lot of partitions, say 9K,
> >> > > > > >> > > > > to other brokers, either manually or through some
> >> > > > > >> > > > > service like cruise control. With this KIP, not only
> >> > > > > >> > > > > will the leadership transitions finish more quickly,
> >> > > > > >> > > > > helping the cluster itself become more balanced, but
> >> > > > > >> > > > > all existing producers corresponding to the 9K
> >> > > > > >> > > > > partitions will get the errors relatively quickly
> >> > > > > >> > > > > rather than relying on their timeout, thanks to the
> >> > > > > >> > > > > batched async ZK operations. To me it's a useful
> >> > > > > >> > > > > feature to have during such troublesome times.
> >> > > > > >> > > > >
> >> > > > > >> > > > > 2. The experiments in the Google Doc have shown that
> >> > > > > >> > > > > with this KIP many producers receive an explicit
> >> > > > > >> > > > > NotLeaderForPartition error, based on which they retry
> >> > > > > >> > > > > immediately. Therefore the latency (~14 seconds +
> >> > > > > >> > > > > quick retry) for their single message is much smaller
> >> > > > > >> > > > > compared with the case of timing out without the KIP
> >> > > > > >> > > > > (30 seconds for timing out + quick retry). One might
> >> > > > > >> > > > > argue that reducing the timeout on the producer side
> >> > > > > >> > > > > can achieve the same result, yet reducing the timeout
> >> > > > > >> > > > > has its own drawbacks[1].
> >> > > > > >> > > > >
> >> > > > > >> > > > > Also *IF* there were a metric to show the number of
> >> > > > > >> > > > > truncated messages on brokers, with the experiments
> >> > > > > >> > > > > done in the Google Doc, it should be easy to see that
> >> > > > > >> > > > > a lot fewer messages need to be truncated on broker0,
> >> > > > > >> > > > > since the up-to-date metadata avoids appending of
> >> > > > > >> > > > > messages in subsequent PRODUCE requests. If we talk to
> >> > > > > >> > > > > a system operator and ask whether they prefer fewer
> >> > > > > >> > > > > wasteful IOs, I bet most likely the answer is yes.
> >> > > > > >> > > > >
> >> > > > > >> > > > > 3. To answer your question, I think it might be
> >> > > > > >> > > > > helpful to construct some formulas. To simplify the
> >> > > > > >> > > > > modeling, I'm going back to the case where there is
> >> > > > > >> > > > > only ONE partition involved. Following the experiments
> >> > > > > >> > > > > in the Google Doc, let's say broker0 becomes the
> >> > > > > >> > > > > follower at time t0, and after t0 there were still N
> >> > > > > >> > > > > produce requests in its request queue. With the
> >> > > > > >> > > > > up-to-date metadata brought by this KIP, broker0 can
> >> > > > > >> > > > > reply with a NotLeaderForPartition exception;
> >> > > > > >> > > > > let's use M1 to denote the average processing time of
> >> > > replying
> >> > > > > >> with
> >> > > > > >> > > such
> >> > > > > >> > > > an
> >> > > > > >> > > > > error message.
> >> > > > > >> > > > > Without this KIP, the broker will need to append
> >> messages
> >> > to
> >> > > > > >> > segments,
> >> > > > > >> > > > > which may trigger a flush to disk,
> >> > > > > >> > > > > let's use M2 to denote the average processing time
> for
> >> > such
> >> > > > > logic.
> >> > > > > >> > > > > Then the average extra latency incurred without this
> >> KIP
> >> > is
> >> > > N
> >> > > > *
> >> > > > > >> (M2 -
> >> > > > > >> > > > M1) /
> >> > > > > >> > > > > 2.
> >> > > > > >> > > > >
> >> > > > > >> > > > > In practice, M2 should always be larger than M1,
> which
> >> > means
> >> > > > as
> >> > > > > >> long
> >> > > > > >> > > as N
> >> > > > > >> > > > > is positive,
> >> > > > > >> > > > > we would see improvements on the average latency.
> >> > > > > >> > > > > There does not need to be significant backlog of
> >> requests
> >> > in
> >> > > > the
> >> > > > > >> > > request
> >> > > > > >> > > > > queue,
> >> > > > > >> > > > > or severe degradation of disk performance to have the
> >> > > > > improvement.
> >> > > > > >> > > > >
> >> > > > > >> > > > > Regards,
> >> > > > > >> > > > > Lucas
> >> > > > > >> > > > >
> >> > > > > >> > > > >
> >> > > > > >> > > > > [1] For instance, reducing the timeout on the
> producer
> >> > side
> >> > > > can
> >> > > > > >> > trigger
> >> > > > > >> > > > > unnecessary duplicate requests
> >> > > > > >> > > > > when the corresponding leader broker is overloaded,
> >> > > > exacerbating
> >> > > > > >> the
> >> > > > > >> > > > > situation.
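
[Editorial note: the back-of-the-envelope latency model quoted above can be checked with a few lines of code. This is only an illustrative sketch; the numbers for N, M1, and M2 below are made up and do not come from the thread's experiments.]

```python
def avg_extra_latency_ms(n_queued, m2_ms, m1_ms):
    """Average extra latency per request incurred without the KIP.

    Each of the N queued ProduceRequests pays the expensive append/flush
    cost M2 instead of the cheap error-reply cost M1, and a request at
    position i waits behind i predecessors, which averages out to
    N * (M2 - M1) / 2 over all N requests.
    """
    return n_queued * (m2_ms - m1_ms) / 2.0

# Made-up numbers: 100 queued requests, 20 ms append+flush, 1 ms error reply.
print(avg_extra_latency_ms(100, 20.0, 1.0))  # -> 950.0
```

As the quoted message notes, any positive N with M2 > M1 yields a positive improvement, so a severe backlog is not required for the KIP to help.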
> >> > > > > >> > > > >
> >> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:
> >> > > > > >> > > > >
> >> > > > > >> > > > > > Hey Lucas,
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > Thanks much for the detailed documentation of the experiment.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > Initially I also thought having a separate queue for controller
> >> > > > > >> > > > > > requests is useful because, as you mentioned in the summary
> >> > > > > >> > > > > > section of the Google doc, controller requests are generally
> >> > > > > >> > > > > > more important than data requests and we probably want
> >> > > > > >> > > > > > controller requests to be processed sooner. But then Eno has
> >> > > > > >> > > > > > two very good questions which I am not sure the Google doc has
> >> > > > > >> > > > > > answered explicitly. Could you help with the following questions?
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > 1) It is not very clear what the actual benefit of KIP-291 is
> >> > > > > >> > > > > > to users. The experiment setup in the Google doc simulates the
> >> > > > > >> > > > > > scenario that the broker is very slow handling ProduceRequests
> >> > > > > >> > > > > > due to e.g. a slow disk. It currently assumes that there is
> >> > > > > >> > > > > > only 1 partition. But in the common scenario, it is probably
> >> > > > > >> > > > > > reasonable to assume that there are many other partitions that
> >> > > > > >> > > > > > are also actively produced to, and ProduceRequests to these
> >> > > > > >> > > > > > partitions also take e.g. 2 seconds to be processed. So even if
> >> > > > > >> > > > > > broker0 can become follower for partition 0 soon, it probably
> >> > > > > >> > > > > > still needs to process the ProduceRequests in the queue slowly,
> >> > > > > >> > > > > > because these ProduceRequests cover other partitions. Thus most
> >> > > > > >> > > > > > ProduceRequests will still timeout after 30 seconds and most
> >> > > > > >> > > > > > clients will still likely timeout after 30 seconds. Then it is
> >> > > > > >> > > > > > not obvious what the benefit to the client is, since the client
> >> > > > > >> > > > > > will timeout after 30 seconds before possibly re-connecting to
> >> > > > > >> > > > > > broker1, with or without KIP-291. Did I miss something here?
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > 2) I guess Eno is asking for the specific benefits of this KIP
> >> > > > > >> > > > > > to the user or system administrator, e.g. whether this KIP
> >> > > > > >> > > > > > decreases average latency, 999th percentile latency, the
> >> > > > > >> > > > > > probability of exceptions exposed to the client, etc. It is
> >> > > > > >> > > > > > probably useful to clarify this.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > 3) Does this KIP help improve user experience only when there
> >> > > > > >> > > > > > is an issue with the broker, e.g. a significant backlog in the
> >> > > > > >> > > > > > request queue due to a slow disk as described in the Google
> >> > > > > >> > > > > > doc? Or is this KIP also useful when there is no ongoing issue
> >> > > > > >> > > > > > in the cluster? It might be helpful to clarify this to
> >> > > > > >> > > > > > understand the benefit of this KIP.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > Thanks much,
> >> > > > > >> > > > > > Dong
> >> > > > > >> > > > > >
> >> > > > > >> > > > > >
> >> > > > > >> > > > > >
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > > Hi Eno,
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > Sorry for the delay in getting the experiment results.
> >> > > > > >> > > > > > > Here is a link to the positive impact achieved by implementing
> >> > > > > >> > > > > > > the proposed change:
> >> > > > > >> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> >> > > > > >> > > > > > > Please take a look when you have time and let me know your
> >> > > > > >> > > > > > > feedback.
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > Regards,
> >> > > > > >> > > > > > > Lucas
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <kafka@harsha.io> wrote:
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > > Thanks for the pointer. Will take a look; it might suit our
> >> > > > > >> > > > > > > > requirements better.
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > Thanks,
> >> > > > > >> > > > > > > > Harsha
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > Hi Harsha,
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > If I understand correctly, the replication quota mechanism
> >> > > > > >> > > > > > > > > proposed in KIP-73 can be helpful in that scenario.
> >> > > > > >> > > > > > > > > Have you tried it out?
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > Thanks,
> >> > > > > >> > > > > > > > > Lucas
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <kafka@harsha.io> wrote:
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > Hi Lucas,
> >> > > > > >> > > > > > > > > > One more question: any thoughts on making this
> >> > > > > >> > > > > > > > > > configurable and also allowing a subset of data requests
> >> > > > > >> > > > > > > > > > to be prioritized? For example, we notice in our cluster
> >> > > > > >> > > > > > > > > > that when we take out a broker and bring a new one in, it
> >> > > > > >> > > > > > > > > > will try to become a follower and have a lot of fetch
> >> > > > > >> > > > > > > > > > requests to other leaders in the cluster. This will
> >> > > > > >> > > > > > > > > > negatively affect the application/client requests.
> >> > > > > >> > > > > > > > > > We are also exploring a similar solution to de-prioritize
> >> > > > > >> > > > > > > > > > fetch requests if a new replica comes in; we are ok with
> >> > > > > >> > > > > > > > > > the replica taking time, but the leaders should
> >> > > > > >> > > > > > > > > > prioritize the client requests.
> >> > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > Thanks,
> >> > > > > >> > > > > > > > > > Harsha
> >> > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> >> > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > Hi Eno,
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> >> > > > > >> > > > > > > > > > > - I haven't implemented the feature yet, so no
> >> > > > > >> > > > > > > > > > > experimental results so far. And I plan to test it out
> >> > > > > >> > > > > > > > > > > in the following days.
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > - You are absolutely right that the priority queue does
> >> > > > > >> > > > > > > > > > > not completely prevent data requests from being
> >> > > > > >> > > > > > > > > > > processed ahead of controller requests.
> >> > > > > >> > > > > > > > > > > That being said, I expect it to greatly mitigate the
> >> > > > > >> > > > > > > > > > > effect of stale metadata.
> >> > > > > >> > > > > > > > > > > In any case, I'll try it out and post the results when
> >> > > > > >> > > > > > > > > > > I have them.
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > Regards,
> >> > > > > >> > > > > > > > > > > Lucas
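
[Editorial note: the separate-queue design discussed in this exchange can be sketched as a handler loop that always drains the controller queue before the data queue. This is illustrative Python only, not Kafka's actual Scala RequestChannel; all request names below are made up.]

```python
from collections import deque

control_queue = deque()  # controller requests: LeaderAndIsr, UpdateMetadata, StopReplica
data_queue = deque()     # client requests: Produce, Fetch, ...

def next_request():
    """Request handler threads always drain the controller queue first."""
    if control_queue:
        return control_queue.popleft()
    if data_queue:
        return data_queue.popleft()
    return None

# Data requests arrive first, then a controller request.
data_queue.extend(["produce-1", "produce-2"])
control_queue.append("leader-and-isr")

served = [next_request() for _ in range(3)]
print(served)  # -> ['leader-and-isr', 'produce-1', 'produce-2']
```

Even though the controller request arrived last, it is served first, which is the ordering property the KIP is after.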
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <eno.thereska@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > Hi Lucas,
> >> > > > > >> > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at this. A couple
> >> > > > > >> > > > > > > > > > > > of questions:
> >> > > > > >> > > > > > > > > > > > - did you notice any positive change after
> >> > > > > >> > > > > > > > > > > > implementing this KIP? I'm wondering if you have any
> >> > > > > >> > > > > > > > > > > > experimental results that show the benefit of the two
> >> > > > > >> > > > > > > > > > > > queues.
> >> > > > > >> > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > - priority is usually not sufficient in addressing
> >> > > > > >> > > > > > > > > > > > the problem the KIP identifies. Even with priority
> >> > > > > >> > > > > > > > > > > > queues, you will sometimes (often?) have the case
> >> > > > > >> > > > > > > > > > > > that data plane requests will be ahead of the control
> >> > > > > >> > > > > > > > > > > > plane requests. This happens because the system might
> >> > > > > >> > > > > > > > > > > > have already started processing the data plane
> >> > > > > >> > > > > > > > > > > > requests before the control plane ones arrived. So it
> >> > > > > >> > > > > > > > > > > > would be good to know what % of the problem this KIP
> >> > > > > >> > > > > > > > > > > > addresses.
> >> > > > > >> > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > Thanks
> >> > > > > >> > > > > > > > > > > > Eno
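
[Editorial note: Eno's second point above can be illustrated with a toy priority queue. A control request that arrives while data requests are still queued is served first, but it cannot preempt a data request that a handler has already dequeued. This is only an illustrative sketch, not Kafka's request-handling code; the names and priority values are made up.]

```python
import heapq

CONTROL, DATA = 0, 1  # lower value = higher priority

pq = []
seq = 0  # tie-breaker that preserves arrival order within a priority class

def enqueue(priority, name):
    global seq
    heapq.heappush(pq, (priority, seq, name))
    seq += 1

# A data request arrives first and is immediately handed to a handler thread.
enqueue(DATA, "produce-1")
in_flight = heapq.heappop(pq)[2]  # already being processed; cannot be preempted

# More requests arrive while produce-1 is still being processed.
enqueue(DATA, "produce-2")
enqueue(CONTROL, "leader-and-isr")

order = [in_flight] + [heapq.heappop(pq)[2] for _ in range(len(pq))]
print(order)  # -> ['produce-1', 'leader-and-isr', 'produce-2']
```

The control request jumps ahead of the still-queued data request, but produce-1 finishes first regardless, which is exactly the residual case Eno describes.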
> >> > > > > >> > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > Change looks good.
> >> > > > > >> > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > Thanks
> >> > > > > >> > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > Hi Ted,
> >> > > > > >> > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated the KIP.
> >> > > > > >> > > > > > > > > > > > > > Please take another look.
> >> > > > > >> > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > Lucas
> >> > > > > >> > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> >> > > > > >> > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> >> > > > > >> > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > It would be good if you can include the default
> >> > > > > >> > > > > > > > > > > > > > > value for this new config in the KIP.
> >> > > > > >> > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > Thanks
> >> > > > > >> > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> >> > > > > >> > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a new config,
> >> > > > > >> > > > > > > > > > > > > > > > instead of reusing the existing one.
> >> > > > > >> > > > > > > > > > > > > > > > Please take another look when you have time.
> >> > > > > >> > > > > > > > > > > > > > > > Thanks a lot!
> >> > > > > >> > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > Lucas
> >> > > > > >> > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource if control request rate is low
> >> > > > > >> > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > I don't know if control request rate can get
> >> > > > > >> > > > > > > > > > > > > > > > > to 100,000, likely not. Then using the same
> >> > > > > >> > > > > > > > > > > > > > > > > bound as that for data requests seems high.
> >> > > > > >> > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> >> > > > > >> > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at this KIP.
> >> > > > > >> > > > > > > > > > > > > > > > > > Let's say today the setting of
> >> > > > > >> > > > > > > > > > > > > > > > > > "queued.max.requests" in cluster A is
> >> > > > > >> > > > > > > > > > > > > > > > > > 1000, while the setting in cluster B is
> >> > > > > >> > > > > > > > > > > > > > > > > > 100,000. The 100 times difference might
> >> > > > > >> > > > > > > > > > > > > > > > > > have indicated that machines in cluster B
> >> > > > > >> > > > > > > > > > > > > > > > > > have larger memory.
> >> > > > > >> > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > By reusing "queued.max.requests", the
> >> > > > > >> > > > > > > > > > > > > > > > > > controlRequestQueue in cluster B
> >> > > > > >> > > > > > > > > > > > > > > > > > automatically gets a 100x capacity without
> >> > > > > >> > > > > > > > > > > > > > > > > > explicitly bothering the operators.
> >> > > > > >> > > > > > > > > > > > > > > > > > I understand the counter argument can be
> >> > > > > >> > > > > > > > > > > > > > > > > > that maybe that's a waste of resource if
> >> > > > > >> > > > > > > > > > > > > > > > > > the control request rate is low, and
> >> > > > > >> > > > > > > > > > > > > > > > > > operators may want to fine tune the
> >> > > > > >> > > > > > > > > > > > > > > > > > capacity of the controlRequestQueue.
> >> > > > > >> > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and can
> >> > > > > >> > > > > > > > > > > > > > > > > > change it if you or anyone else feels
> >> > > > > >> > > > > > > > > > > > > > > > > > strongly about adding the extra config.
> >> > > > > >> > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > Thanks,
> >> > > > > >> > > > > > > > > > > > > > > > > > Lucas
> >> > > > > >> > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
> 3:11
> >> PM,
> >> > > Ted
> >> > > > > Yu
> >> > > > > >> <
> >> > > > > >> > > > > > > > > > yuzhihong@gmail.com
> >> > > > > >> > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > wrote:
> >> > > > > >> > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > Lucas:
> >> > > > > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can you
> >> > > > > >> > > > > > > > > > > > > > > > > > > elaborate a bit more on why the separate
> >> > > > > >> > > > > > > > > > > > > > > > > > > config has bigger impact?
> >> > > > > >> > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > Thanks
> >> > > > > >> > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <lindong28@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> >> > > > > >> > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good
> >> > > > > >> > > > > > > > > > > > > > > > > > > > overall. Some comments below:
> >> > > > > >> > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the full mbean
> >> > > > > >> > > > > > > > > > > > > > > > > > > > for the new metrics in the KIP. Can
> >> > > > > >> > > > > > > > > > > > > > > > > > > > you specify it in the Public Interface
> >> > > > > >> > > > > > > > > > > > > > > > > > > > section similar to KIP-237
> >> > > > > >> > > > > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>
> >> > > > > >> > > > > > > > > > > > > > > > > > > > ?
> >> > > > > >> > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the same
> >> > > > > >> > > > > > > > > > > > > > > > > > > > pattern as KIP-153
> >> > > > > >> > > > > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> >> > > > > >> > > > > > > > > > > > > > > > > > > > where we keep the existing sensor name
> >> > > > > >> > > > > > > > > > > > > > > > > > > > "BytesInPerSec" and add a new sensor
> >> > > > > >> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather
> >> > > > > >> > > > > > > > > > > > > > > > > > > > than replacing the sensor name
> >> > > > > >> > > > > > > > > > > > > > > > > > > > "BytesInPerSec" with e.g.
> >> > > > > >> > > > > > > > > > > > > > > > > > > > "ClientBytesInPerSec".
> >> > > > > >> > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP changes the
> >> > > > > >> > > > > > > > > > > > > > > > > > > > semantics of the broker config
> >> > > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests" because the
> >> > > > > >> > > > > > > > > > > > > > > > > > > > number of total requests queued in the
> >> > > > > >> > > > > > > > > > > > > > > > > > > > broker will be no longer bounded by
> >> > > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests". This probably
> >> > > > > >> > > > > > > > > > > > > > > > > > > > needs to be specified in the Public
> >> > > > > >> > > > > > > > > > > > > > > > > > > > Interfaces section for discussion.
> >> > > > > >> > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks,
> >> > > > > >> > > > > > > > > > > > > > > > > > > > Dong
> >> > > > > >> > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> >> > > > > >> > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a separate
> >> > > > > >> > > > > > > > > > > > > > > > > > > > > queue for controller requests:
> >> > > > > >> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> >> > > > > >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > > > > > > > > > > > Can you please take a look and let
> >> > > > > >> > > > > > > > > > > > > > > > > > > > > me know your feedback?
> >> > > > > >



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hey Joel,

Thanks for the detailed explanation. I agree the current design makes sense.
My confusion is about whether the new config for the controller queue
capacity is necessary. I cannot think of a case in which users would change
it.

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <be...@gmail.com> wrote:

> Hi Lucas,
>
> I guess my question can be rephrased to "do we expect users to ever change
> the controller request queue capacity"? If we agree that 20 is already a
> very generous default number and we do not expect users to change it, is it
> still necessary to expose this as a config?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lu...@gmail.com> wrote:
>
>> @Becket
>> 1. Thanks for the comment. You are right that normally there should be
>> just
>> one controller request because of muting,
>> and I had NOT intended to say there would be many enqueued controller
>> requests.
>> I went through the KIP again, and I'm not sure which part conveys that
>> info.
>> I'd be happy to revise if you point out the section.
>>
>> 2. Though it should not happen in normal conditions, the current design
>> does not preclude multiple controllers running
>> at the same time, hence if we don't have the controller queue capacity
>> config and simply make its capacity to be 1,
>> network threads handling requests from different controllers will be
>> blocked during those troublesome times,
>> which is probably not what we want. On the other hand, adding the extra
>> config with a default value, say 20, guards us from issues in those
>> troublesome times, and IMO there isn't much downside of adding the extra
>> config.
>>
>> @Mayuresh
>> Good catch, this sentence is an obsolete statement based on a previous
>> design. I've revised the wording in the KIP.
>>
>> Thanks,
>> Lucas
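
The capacity argument in point 2 above can be illustrated with a toy model (Python; the queue sizes and request labels are illustrative assumptions, not the actual broker internals). With a controller queue capacity of 1, a request from a second (possibly zombie) controller cannot be enqueued, so the network thread serving it would block; a larger capacity such as the proposed default of 20 absorbs it:

```python
import queue

# Illustrative model: separate queues for data and controller requests.
# Capacities are assumptions for this sketch; the KIP proposes a small
# dedicated capacity (default 20) for the controller queue.
data_queue = queue.Queue(maxsize=500)
controller_queue = queue.Queue(maxsize=20)

def try_enqueue(q, request):
    # A network thread would block on a full queue; model that as a
    # non-blocking put that reports failure instead.
    try:
        q.put(request, block=False)
        return True
    except queue.Full:
        return False

# With capacity 1, a second controller's request is rejected (i.e. its
# network thread would block) while the first is still unprocessed:
tiny = queue.Queue(maxsize=1)
assert try_enqueue(tiny, "LeaderAndIsr from controller A")
assert not try_enqueue(tiny, "LeaderAndIsr from controller B")

# With the proposed default of 20, both are absorbed:
assert try_enqueue(controller_queue, "LeaderAndIsr from controller A")
assert try_enqueue(controller_queue, "LeaderAndIsr from controller B")
print("capacity-1 queue blocks the second controller; capacity-20 does not")
```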
>>
>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
>> gharatmayuresh15@gmail.com> wrote:
>>
>> > Hi Lucas,
>> >
>> > Thanks for the KIP.
>> > I am trying to understand why you think "The memory consumption can rise
>> > given the total number of queued requests can go up to 2x" in the impact
>> > section. Normally the requests from controller to a Broker are not high
>> > volume, right ?
>> >
>> >
>> > Thanks,
>> >
>> > Mayuresh
>> >
>> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <be...@gmail.com>
>> wrote:
>> >
>> > > Thanks for the KIP, Lucas. Separating the control plane from the data
>> > plane
>> > > makes a lot of sense.
>> > >
>> > > In the KIP you mentioned that the controller request queue may have
>> many
>> > > requests in it. Will this be a common case? The controller requests
>> still
>> > > go through the SocketServer. The SocketServer will mute the channel
>> > once
>> > > a request is read and put into the request channel. So assuming there
>> is
>> > > only one connection between controller and each broker, on the broker
>> > side,
>> > > there should be only one controller request in the controller request
>> > queue
>> > > at any given time. If that is the case, do we need a separate
>> controller
>> > > request queue capacity config? The default value 20 means that we
>> expect
>> > > 20 controller switches to happen in a short period of time.
>> I
>> > am
>> > > not sure whether someone should increase the controller request queue
>> > > capacity to handle such case, as it seems indicating something very
>> wrong
>> > > has happened.
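
The muting behavior described above can be sketched as follows (Python; a simplified model of the mute/unmute cycle, not the actual SocketServer code):

```python
# Simplified model: after a request is read from a connection and enqueued,
# the channel is muted until a response is sent, so at most one request per
# connection sits in the request queue at any given time.
class Channel:
    def __init__(self):
        self.muted = False

    def try_read(self, request_queue):
        if self.muted:
            return False  # muted channels are not polled for new requests
        request_queue.append("request")
        self.muted = True
        return True

    def send_response(self):
        self.muted = False

request_queue = []
controller_channel = Channel()

# The controller keeps sending, but only one request is ever enqueued
# before a response goes back:
reads = [controller_channel.try_read(request_queue) for _ in range(5)]
print(reads.count(True), len(request_queue))  # 1 1

# After responding, the next request can be read:
controller_channel.send_response()
controller_channel.try_read(request_queue)
print(len(request_queue))  # 2
```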
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com>
>> wrote:
>> > >
>> > > > Thanks for the update Lucas.
>> > > >
>> > > > I think the motivation section is intuitive. It will be good to
>> learn
>> > > more
>> > > > about the comments from other reviewers.
>> > > >
>> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lu...@gmail.com>
>> > > wrote:
>> > > >
>> > > > > Hi Dong,
>> > > > >
>> > > > > I've updated the motivation section of the KIP by explaining the
>> > cases
>> > > > that
>> > > > > would have user impacts.
>> > > > > Please take a look at let me know your comments.
>> > > > >
>> > > > > Thanks,
>> > > > > Lucas
>> > > > >
>> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com
>> >
>> > > > wrote:
>> > > > >
>> > > > > > Hi Dong,
>> > > > > >
>> > > > > > The simulation of disk being slow is merely for me to easily
>> > > construct
>> > > > a
>> > > > > > testing scenario
>> > > > > > with a backlog of produce requests. In production, other than
>> the
>> > > disk
>> > > > > > being slow, a backlog of
>> > > > > > produce requests may also be caused by high produce QPS.
>> > > > > > In that case, we may not want to kill the broker and that's when
>> > this
>> > > > KIP
>> > > > > > can be useful, both for JBOD
>> > > > > > and non-JBOD setup.
>> > > > > >
>> > > > > > Going back to your previous question about each ProduceRequest
>> > > covering
>> > > > > 20
>> > > > > > partitions that are randomly
>> > > > > > distributed, let's say a LeaderAndIsr request is enqueued that
>> > tries
>> > > to
>> > > > > > switch the current broker, say broker0, from leader to follower
>> > > > > > *for one of the partitions*, say *test-0*. For the sake of
>> > argument,
>> > > > > > let's also assume the other brokers, say broker1, have *stopped*
>> > > > fetching
>> > > > > > from
>> > > > > > the current broker, i.e. broker0.
>> > > > > > 1. If the enqueued produce requests have acks =  -1 (ALL)
>> > > > > >   1.1 without this KIP, the ProduceRequests ahead of
>> LeaderAndISR
>> > > will
>> > > > be
>> > > > > > put into the purgatory,
>> > > > > >         and since they'll never be replicated to other brokers
>> > > (because
>> > > > > of
>> > > > > > the assumption made above), they will
>> > > > > >         be completed either when the LeaderAndISR request is
>> > > processed
>> > > > or
>> > > > > > when the timeout happens.
>> > > > > >   1.2 With this KIP, broker0 will immediately transition the
>> > > partition
>> > > > > > test-0 to become a follower,
>> > > > > >         after the current broker sees the replication of the
>> > > remaining
>> > > > 19
>> > > > > > partitions, it can send a response indicating that
>> > > > > >         it's no longer the leader for the "test-0".
>> > > > > >   To see the latency difference between 1.1 and 1.2, let's say
>> > there
>> > > > are
>> > > > > > 24K produce requests ahead of the LeaderAndISR, and there are 8
>> io
>> > > > > threads,
>> > > > > >   so each io thread will process approximately 3000 produce
>> > requests.
>> > > > Now
>> > > > > > let's investigate the io thread that finally processed the
>> > > > LeaderAndISR.
>> > > > > >   For the 3000 produce requests, if we model the time when their
>> > > > > remaining
>> > > > > > 19 partitions catch up as t0, t1, ...t2999, and the LeaderAndISR
>> > > > request
>> > > > > is
>> > > > > > processed at time t3000.
>> > > > > >   Without this KIP, the 1st produce request would have waited an
>> > > extra
>> > > > > > t3000 - t0 time in the purgatory, the 2nd an extra time of
>> t3000 -
>> > > t1,
>> > > > > etc.
>> > > > > >   Roughly speaking, the latency difference is bigger for the
>> > earlier
>> > > > > > produce requests than for the later ones. For the same reason,
>> the
>> > > more
>> > > > > > ProduceRequests queued
>> > > > > >   before the LeaderAndISR, the bigger benefit we get (capped by
>> the
>> > > > > > produce timeout).
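
The arithmetic above can be made concrete with a small sketch (Python; the 10 ms spacing between completion times is an assumed number purely for illustration):

```python
# Model the io thread from the example: 3000 produce requests complete their
# remaining replication at times t_0..t_2999, and the LeaderAndISR request
# is processed at t_3000. Without the KIP, request i waits an extra
# (t_3000 - t_i) in the purgatory.
N = 3000
STEP_MS = 10  # assumed spacing between completion times, for illustration
t = [i * STEP_MS for i in range(N + 1)]

extra_waits = [t[N] - t[i] for i in range(N)]
print(extra_waits[0])        # earliest request waits the longest: 30000 ms
print(extra_waits[-1])       # latest request waits the least: 10 ms
print(sum(extra_waits) / N)  # average extra wait: 15005.0 ms
```

This matches the observation in the thread: the earlier a produce request sits in the queue, the bigger the latency saving from processing the LeaderAndISR first.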
>> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
>> > > > > >   There will be no latency differences in this case, but
>> > > > > >   2.1 without this KIP, the records of partition test-0 in the
>> > > > > > ProduceRequests ahead of the LeaderAndISR will be appended to
>> the
>> > > local
>> > > > > log,
>> > > > > >         and eventually be truncated after processing the
>> > > LeaderAndISR.
>> > > > > > This is what's referred to as
>> > > > > >         "some unofficial definition of data loss in terms of
>> > messages
>> > > > > > beyond the high watermark".
>> > > > > >   2.2 with this KIP, we can mitigate the effect since if the
>> > > > LeaderAndISR
>> > > > > > is immediately processed, the response to producers will have
>> > > > > >         the NotLeaderForPartition error, causing producers to
>> retry
>> > > > > >
>> > > > > > This explanation above is the benefit for reducing the latency
>> of a
>> > > > > broker
>> > > > > > becoming the follower,
>> > > > > > closely related is reducing the latency of a broker becoming the
>> > > > leader.
>> > > > > > In this case, the benefit is even more obvious, if other brokers
>> > have
>> > > > > > resigned leadership, and the
>> > > > > > current broker should take leadership. Any delay in processing
>> the
>> > > > > > LeaderAndISR will be perceived
>> > > > > > by clients as unavailability. In extreme cases, this can cause
>> > failed
>> > > > > > produce requests if the retries are
>> > > > > > exhausted.
>> > > > > >
>> > > > > > Another two types of controller requests are UpdateMetadata and
>> > > > > > StopReplica, which I'll briefly discuss as follows:
>> > > > > > For UpdateMetadata requests, delayed processing means clients
>> > > receiving
>> > > > > > stale metadata, e.g. with the wrong leadership info
>> > > > > > for certain partitions, and the effect is more retries or even
>> > fatal
>> > > > > > failure if the retries are exhausted.
>> > > > > >
>> > > > > > For StopReplica requests, a long queuing time may degrade the
>> > > > performance
>> > > > > > of topic deletion.
>> > > > > >
>> > > > > > Regarding your last question of the delay for
>> > DescribeLogDirsRequest,
>> > > > you
>> > > > > > are right
>> > > > > > that this KIP cannot help with the latency in getting the log
>> dirs
>> > > > info,
>> > > > > > and it's only relevant
>> > > > > > when controller requests are involved.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Lucas
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com>
>> > > wrote:
>> > > > > >
>> > > > > >> Hey Jun,
>> > > > > >>
>> > > > > >> Thanks much for the comments. It is good point. So the feature
>> may
>> > > be
>> > > > > >> useful for JBOD use-case. I have one question below.
>> > > > > >>
>> > > > > >> Hey Lucas,
>> > > > > >>
>> > > > > >> Do you think this feature is also useful for non-JBOD setup or
>> it
>> > is
>> > > > > only
>> > > > > >> useful for the JBOD setup? It may be useful to understand this.
>> > > > > >>
>> > > > > >> When the broker is setup using JBOD, in order to move leaders
>> on
>> > the
>> > > > > >> failed
>> > > > > >> disk to other disks, the system operator first needs to get the
>> > list
>> > > > of
>> > > > > >> partitions on the failed disk. This is currently achieved using
>> > > > > >> AdminClient.describeLogDirs(), which sends
>> DescribeLogDirsRequest
>> > to
>> > > > the
>> > > > > >> broker. If we only prioritize the controller requests, then the
>> > > > > >> DescribeLogDirsRequest
>> > > > > >> may still take a long time to be processed by the broker. So
>> the
>> > > > overall
>> > > > > >> time to move leaders away from the failed disk may still be
>> long
>> > > even
>> > > > > with
>> > > > > >> this KIP. What do you think?
>> > > > > >>
>> > > > > >> Thanks,
>> > > > > >> Dong
>> > > > > >>
>> > > > > >>
>> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
>> lucasatucla@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >>
>> > > > > >> > Thanks for the insightful comment, Jun.
>> > > > > >> >
>> > > > > >> > @Dong,
>> > > > > >> > Since both of the two comments in your previous email are
>> about
>> > > the
>> > > > > >> > benefits of this KIP and whether it's useful,
>> > > > > >> > in light of Jun's last comment, do you agree that this KIP
>> can
>> > be
>> > > > > >> > beneficial in the case mentioned by Jun?
>> > > > > >> > Please let me know, thanks!
>> > > > > >> >
>> > > > > >> > Regards,
>> > > > > >> > Lucas
>> > > > > >> >
>> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io>
>> > wrote:
>> > > > > >> >
>> > > > > >> > > Hi, Lucas, Dong,
>> > > > > >> > >
>> > > > > >> > > If all disks on a broker are slow, one probably should just
>> > kill
>> > > > the
>> > > > > >> > > broker. In that case, this KIP may not help. If only one of
>> > the
>> > > > > disks
>> > > > > >> on
>> > > > > >> > a
>> > > > > >> > > broker is slow, one may want to fail that disk and move the
>> > > > leaders
>> > > > > on
>> > > > > >> > that
>> > > > > >> > > disk to other brokers. In that case, being able to process
>> the
>> > > > > >> > LeaderAndIsr
>> > > > > >> > > requests faster will potentially help the producers recover
>> > > > quicker.
>> > > > > >> > >
>> > > > > >> > > Thanks,
>> > > > > >> > >
>> > > > > >> > > Jun
>> > > > > >> > >
>> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
>> lindong28@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >> > >
>> > > > > >> > > > Hey Lucas,
>> > > > > >> > > >
>> > > > > >> > > > Thanks for the reply. Some follow up questions below.
>> > > > > >> > > >
>> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions
>> > that
>> > > > are
>> > > > > >> > > randomly
>> > > > > >> > > > distributed across all partitions, then each
>> ProduceRequest
>> > > will
>> > > > > >> likely
>> > > > > >> > > > cover some partitions for which the broker is still
>> leader
>> > > after
>> > > > > it
>> > > > > >> > > quickly
>> > > > > >> > > > processes the
>> > > > > >> > > > LeaderAndIsrRequest. Then broker will still be slow in
>> > > > processing
>> > > > > >> these
>> > > > > >> > > > ProduceRequests, and request latency will still be very high with
>> this
>> > > > KIP.
>> > > > > It
>> > > > > >> > > seems
>> > > > > >> > > > that most ProduceRequest will still timeout after 30
>> > seconds.
>> > > Is
>> > > > > >> this
>> > > > > >> > > > understanding correct?
>> > > > > >> > > >
>> > > > > >> > > > Regarding 2, if most ProduceRequest will still timeout
>> after
>> > > 30
>> > > > > >> > seconds,
>> > > > > >> > > > then it is less clear how this KIP reduces average
>> produce
>> > > > > latency.
>> > > > > >> Can
>> > > > > >> > > you
>> > > > > >> > > > clarify what metrics can be improved by this KIP?
>> > > > > >> > > >
>> > > > > >> > > > Not sure why the system operator directly cares about the number of
>> > > truncated
>> > > > > >> > messages.
>> > > > > >> > > > Do you mean this KIP can improve average throughput or
>> > reduce
>> > > > > >> message
>> > > > > >> > > > duplication? It will be good to understand this.
>> > > > > >> > > >
>> > > > > >> > > > Thanks,
>> > > > > >> > > > Dong
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
>> > > lucasatucla@gmail.com
>> > > > >
>> > > > > >> > wrote:
>> > > > > >> > > >
>> > > > > >> > > > > Hi Dong,
>> > > > > >> > > > >
>> > > > > >> > > > > Thanks for your valuable comments. Please see my reply
>> > > below.
>> > > > > >> > > > >
>> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
>> > > consider
>> > > > a
>> > > > > >> more
>> > > > > >> > > > common
>> > > > > >> > > > > scenario
>> > > > > >> > > > > where broker0 is the leader of many partitions. And
>> let's
>> > > say
>> > > > > for
>> > > > > >> > some
>> > > > > >> > > > > reason its IO becomes slow.
>> > > > > >> > > > > The number of leader partitions on broker0 is so large,
>> > say
>> > > > 10K,
>> > > > > >> that
>> > > > > >> > > the
>> > > > > >> > > > > cluster is skewed,
>> > > > > >> > > > > and the operator would like to shift the leadership
>> for a
>> > > lot
>> > > > of
>> > > > > >> > > > > partitions, say 9K, to other brokers,
>> > > > > >> > > > > either manually or through some service like cruise
>> > control.
>> > > > > >> > > > > With this KIP, not only will the leadership transitions
>> > > finish
>> > > > > >> more
>> > > > > >> > > > > quickly, helping the cluster itself becoming more
>> > balanced,
>> > > > > >> > > > > but all existing producers corresponding to the 9K
>> > > partitions
>> > > > > will
>> > > > > >> > get
>> > > > > >> > > > the
>> > > > > >> > > > > errors relatively quickly
>> > > > > >> > > > > rather than relying on their timeout, thanks to the
>> > batched
>> > > > > async
>> > > > > >> ZK
>> > > > > >> > > > > operations.
>> > > > > >> > > > > To me it's a useful feature to have during such
>> > troublesome
>> > > > > times.
>> > > > > >> > > > >
>> > > > > >> > > > >
>> > > > > >> > > > > 2. The experiments in the Google Doc have shown that
>> with
>> > > this
>> > > > > KIP
>> > > > > >> > many
>> > > > > >> > > > > producers
>> > > > > >> > > > > receive an explicit error NotLeaderForPartition, based
>> on
>> > > > which
>> > > > > >> they
>> > > > > >> > > > retry
>> > > > > >> > > > > immediately.
>> > > > > >> > > > > Therefore the latency (~14 seconds+quick retry) for
>> their
>> > > > single
>> > > > > >> > > message
>> > > > > >> > > > is
>> > > > > >> > > > > much smaller
>> > > > > >> > > > > compared with the case of timing out without the KIP
>> (30
>> > > > seconds
>> > > > > >> for
>> > > > > >> > > > timing
>> > > > > >> > > > > out + quick retry).
>> > > > > >> > > > > One might argue that reducing the timeout on the
>> > producer
>> > > > > side
>> > > > > >> can
>> > > > > >> > > > > achieve the same result,
>> > > > > >> > > > > yet reducing the timeout has its own drawbacks[1].
>> > > > > >> > > > >
>> > > > > >> > > > > Also *IF* there were a metric to show the number of
>> > > truncated
>> > > > > >> > messages
>> > > > > >> > > on
>> > > > > >> > > > > brokers,
>> > > > > >> > > > > with the experiments done in the Google Doc, it should
>> be
>> > > easy
>> > > > > to
>> > > > > >> see
>> > > > > >> > > > that
>> > > > > >> > > > > a lot fewer messages need
>> > > > > >> > > > > to be truncated on broker0 since the up-to-date
>> metadata
>> > > > avoids
>> > > > > >> > > appending
>> > > > > >> > > > > of messages
>> > > > > >> > > > > in subsequent PRODUCE requests. If we talk to a system
>> > > > operator
>> > > > > >> and
>> > > > > >> > ask
>> > > > > >> > > > > whether
>> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most likely the
>> > answer
>> > > > is
>> > > > > >> yes.
>> > > > > >> > > > >
>> > > > > >> > > > > 3. To answer your question, I think it might be
>> helpful to
>> > > > > >> construct
>> > > > > >> > > some
>> > > > > >> > > > > formulas.
>> > > > > >> > > > > To simplify the modeling, I'm going back to the case
>> where
>> > > > there
>> > > > > >> is
>> > > > > >> > > only
>> > > > > >> > > > > ONE partition involved.
>> > > > > >> > > > > Following the experiments in the Google Doc, let's say
>> > > broker0
>> > > > > >> > becomes
>> > > > > >> > > > the
>> > > > > >> > > > > follower at time t0,
>> > > > > >> > > > > and after t0 there were still N produce requests in its
>> > > > request
>> > > > > >> > queue.
>> > > > > >> > > > > With the up-to-date metadata brought by this KIP,
>> broker0
>> > > can
>> > > > > >> reply
>> > > > > >> > > with
>> > > > > >> > > > an
>> > > > > >> > > > > NotLeaderForPartition exception,
>> > > > > >> > > > > let's use M1 to denote the average processing time of
>> > > replying
>> > > > > >> with
>> > > > > >> > > such
>> > > > > >> > > > an
>> > > > > >> > > > > error message.
>> > > > > >> > > > > Without this KIP, the broker will need to append
>> messages
>> > to
>> > > > > >> > segments,
>> > > > > >> > > > > which may trigger a flush to disk,
>> > > > > >> > > > > let's use M2 to denote the average processing time for
>> > such
>> > > > > logic.
>> > > > > >> > > > > Then the average extra latency incurred without this
>> KIP
>> > is
>> > > N
>> > > > *
>> > > > > >> (M2 -
>> > > > > >> > > > M1) /
>> > > > > >> > > > > 2.
>> > > > > >> > > > >
>> > > > > >> > > > > In practice, M2 should always be larger than M1, which
>> > means
>> > > > as
>> > > > > >> long
>> > > > > >> > > as N
>> > > > > >> > > > > is positive,
>> > > > > >> > > > > we would see improvements on the average latency.
>> > > > > >> > > > > There does not need to be significant backlog of
>> requests
>> > in
>> > > > the
>> > > > > >> > > request
>> > > > > >> > > > > queue,
>> > > > > >> > > > > or severe degradation of disk performance to have the
>> > > > > improvement.
>> > > > > >> > > > >
>> > > > > >> > > > > Regards,
>> > > > > >> > > > > Lucas
>> > > > > >> > > > >
>> > > > > >> > > > >
>> > > > > >> > > > > [1] For instance, reducing the timeout on the producer
>> > side
>> > > > can
>> > > > > >> > trigger
>> > > > > >> > > > > unnecessary duplicate requests
>> > > > > >> > > > > when the corresponding leader broker is overloaded,
>> > > > exacerbating
>> > > > > >> the
>> > > > > >> > > > > situation.
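
The latency model above (average extra latency of roughly N * (M2 - M1) / 2) can be sketched directly; the timings below are assumed values for illustration only:

```python
def avg_extra_latency_ms(n_queued, m1_ms, m2_ms):
    # m1: average time to reply with a NotLeaderForPartition error,
    # m2: average time to append to the local log (possibly flushing to disk).
    # Each of the n queued requests waits behind the earlier ones, so the
    # average extra wait without the KIP is roughly n * (m2 - m1) / 2.
    return n_queued * (m2_ms - m1_ms) / 2

# Assumed timings: replying with an error takes ~1 ms, appending takes ~3 ms.
print(avg_extra_latency_ms(100, 1, 3))  # 100.0 ms for 100 queued requests
print(avg_extra_latency_ms(0, 1, 3))    # 0.0: with no backlog there is no gain
```

As the email argues, since M2 > M1 in practice, any positive backlog N yields an improvement; no severe disk degradation is required.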
>> > > > > >> > > > >
>> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
>> > > lindong28@gmail.com
>> > > > >
>> > > > > >> > wrote:
>> > > > > >> > > > >
>> > > > > >> > > > > > Hey Lucas,
>> > > > > >> > > > > >
>> > > > > >> > > > > > Thanks much for the detailed documentation of the
>> > > > experiment.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Initially I also think having a separate queue for
>> > > > controller
>> > > > > >> > > requests
>> > > > > >> > > > is
>> > > > > >> > > > > > useful because, as you mentioned in the summary
>> section
>> > of
>> > > > the
>> > > > > >> > Google
>> > > > > >> > > > > doc,
>> > > > > >> > > > > > controller requests are generally more important than
>> > data
>> > > > > >> requests
>> > > > > >> > > and
>> > > > > >> > > > > we
>> > > > > >> > > > > > probably want controller requests to be processed
>> > sooner.
>> > > > But
>> > > > > >> then
>> > > > > >> > > Eno
>> > > > > >> > > > > has
>> > > > > >> > > > > > two very good questions which I am not sure the
>> Google
>> > doc
>> > > > has
>> > > > > >> > > answered
>> > > > > >> > > > > > explicitly. Could you help with the following
>> questions?
>> > > > > >> > > > > >
>> > > > > >> > > > > > 1) It is not very clear what is the actual benefit of
>> > > > KIP-291
>> > > > > to
>> > > > > >> > > users.
>> > > > > >> > > > > The
>> > > > > >> > > > > > experiment setup in the Google doc simulates the
>> > scenario
>> > > > that
>> > > > > >> > broker
>> > > > > >> > > > is
>> > > > > >> > > > > > very slow handling ProduceRequest due to e.g. slow
>> disk.
>> > > It
>> > > > > >> > currently
>> > > > > >> > > > > > assumes that there is only 1 partition. But in the
>> > common
>> > > > > >> scenario,
>> > > > > >> > > it
>> > > > > >> > > > is
>> > > > > >> > > > > > probably reasonable to assume that there are many
>> other
>> > > > > >> partitions
>> > > > > >> > > that
>> > > > > >> > > > > are
>> > > > > >> > > > > > also actively produced to and ProduceRequest to these
>> > > > > partition
>> > > > > >> > also
>> > > > > >> > > > > takes
>> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if broker0
>> can
>> > > > become
>> > > > > >> > > follower
>> > > > > >> > > > > for
>> > > > > >> > > > > > the partition 0 soon, it probably still needs to
>> process
>> > > the
>> > > > > >> > > > > ProduceRequest
>> > > > > >> > > > > > slowly in the queue because these ProduceRequests
>> > cover
>> > > > > other
>> > > > > >> > > > > partitions.
>> > > > > >> > > > > > Thus most ProduceRequest will still timeout after 30
>> > > seconds
>> > > > > and
>> > > > > >> > most
>> > > > > >> > > > > > clients will still likely timeout after 30 seconds.
>> Then
>> > > it
>> > > > is
>> > > > > >> not
>> > > > > >> > > > > > obviously what is the benefit to client since client
>> > will
>> > > > > >> timeout
>> > > > > >> > > after
>> > > > > >> > > > > 30
>> > > > > >> > > > > > seconds before possibly re-connecting to broker1,
>> with
>> > or
>> > > > > >> without
>> > > > > >> > > > > KIP-291.
>> > > > > >> > > > > > Did I miss something here?
>> > > > > >> > > > > >
>> > > > > >> > > > > > 2) I guess Eno's is asking for the specific benefits
>> of
>> > > this
>> > > > > >> KIP to
>> > > > > >> > > > user
>> > > > > >> > > > > or
>> > > > > >> > > > > > system administrator, e.g. whether this KIP decreases
>> > > > average
>> > > > > >> > > latency,
>> > > > > >> > > > > > 999th percentile latency, probability of exceptions
>> exposed
>> > to
>> > > > > >> client
>> > > > > >> > > etc.
>> > > > > >> > > > It
>> > > > > >> > > > > > is probably useful to clarify this.
>> > > > > >> > > > > >
>> > > > > >> > > > > > 3) Does this KIP help improve user experience only
>> when
>> > > > there
>> > > > > is
>> > > > > >> > > issue
>> > > > > >> > > > > with
>> > > > > >> > > > > > broker, e.g. significant backlog in the request queue
>> > due
>> > > to
>> > > > > >> slow
>> > > > > >> > > disk
>> > > > > >> > > > as
>> > > > > >> > > > > > described in the Google doc? Or is this KIP also
>> useful
>> > > when
>> > > > > >> there
>> > > > > >> > is
>> > > > > >> > > > no
>> > > > > >> > > > > > ongoing issue in the cluster? It might be helpful to
>> > > clarify
>> > > > > >> this
>> > > > > >> > to
>> > > > > >> > > > > > understand the benefit of this KIP.
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > > Thanks much,
>> > > > > >> > > > > > Dong
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
>> > > > > >> lucasatucla@gmail.com
>> > > > > >> > >
>> > > > > >> > > > > wrote:
>> > > > > >> > > > > >
>> > > > > >> > > > > > > Hi Eno,
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Sorry for the delay in getting the experiment
>> results.
>> > > > > >> > > > > > > Here is a link to the positive impact achieved by
>> > > > > implementing
>> > > > > >> > the
>> > > > > >> > > > > > proposed
>> > > > > >> > > > > > > change:
>> > > > > >> > > > > > > https://docs.google.com/document/d/
>> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
>> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
>> > > > > >> > > > > > > Please take a look when you have time and let me
>> know
>> > > your
>> > > > > >> > > feedback.
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Regards,
>> > > > > >> > > > > > > Lucas
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
>> > > kafka@harsha.io>
>> > > > > >> wrote:
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > > Thanks for the pointer. Will take a look, it might
>> suit
>> > > our
>> > > > > >> > > > requirements
>> > > > > >> > > > > > > > better.
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > Thanks,
>> > > > > >> > > > > > > > Harsha
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
>> > > > > >> > > > lucasatucla@gmail.com
>> > > > > >> > > > > >
>> > > > > >> > > > > > > > wrote:
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > Hi Harsha,
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > If I understand correctly, the replication
>> quota
>> > > > > mechanism
>> > > > > >> > > > proposed
>> > > > > >> > > > > > in
>> > > > > >> > > > > > > > > KIP-73 can be helpful in that scenario.
>> > > > > >> > > > > > > > > Have you tried it out?
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > Thanks,
>> > > > > >> > > > > > > > > Lucas
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
>> > > > > kafka@harsha.io
>> > > > > >> >
>> > > > > >> > > > wrote:
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > Hi Lucas,
>> > > > > >> > > > > > > > > > One more question, any thoughts on making
>> this
>> > > > > >> configurable
>> > > > > >> > > > > > > > > > and also allowing subset of data requests to
>> be
>> > > > > >> > prioritized.
>> > > > > >> > > > For
>> > > > > >> > > > > > > > example
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > we notice in our cluster when we take out a
>> > > broker
>> > > > > and
>> > > > > >> > bring
>> > > > > >> > > > new
>> > > > > >> > > > > > one
>> > > > > >> > > > > > > > it
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > will try to become follower and have lot of
>> > fetch
>> > > > > >> requests
>> > > > > >> > to
>> > > > > >> > > > > other
>> > > > > >> > > > > > > > > leaders
>> > > > > >> > > > > > > > > > in clusters. This will negatively affect the
>> > > > > >> > > application/client
>> > > > > >> > > > > > > > > requests.
>> > > > > >> > > > > > > > > > We are also exploring the similar solution to
>> > > > > >> de-prioritize
>> > > > > >> > > if
>> > > > > >> > > > a
>> > > > > >> > > > > > new
>> > > > > >> > > > > > > > > > replica comes in for fetch requests, we are
>> ok
>> > > with
>> > > > > the
>> > > > > >> > > replica
>> > > > > >> > > > > to
>> > > > > >> > > > > > be
>> > > > > >> > > > > > > > > > taking time but the leaders should prioritize
>> > the
>> > > > > client
>> > > > > >> > > > > requests.
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > Thanks,
>> > > > > >> > > > > > > > > > Harsha
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang
>> > > wrote:
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > Hi Eno,
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > Sorry for the delayed response.
>> > > > > >> > > > > > > > > > > - I haven't implemented the feature yet,
>> so no
>> > > > > >> > experimental
>> > > > > >> > > > > > results
>> > > > > >> > > > > > > > so
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > far.
>> > > > > >> > > > > > > > > > > And I plan to test in out in the following
>> > days.
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > - You are absolutely right that the
>> priority
>> > > queue
>> > > > > >> does
>> > > > > >> > not
>> > > > > >> > > > > > > > completely
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > prevent
>> > > > > >> > > > > > > > > > > data requests being processed ahead of
>> > > controller
>> > > > > >> > requests.
>> > > > > >> > > > > > > > > > > That being said, I expect it to greatly
>> > mitigate
>> > > > the
>> > > > > >> > effect
>> > > > > >> > > > of
>> > > > > >> > > > > > > stale
>> > > > > >> > > > > > > > > > > metadata.
>> > > > > >> > > > > > > > > > > In any case, I'll try it out and post the
>> > > results
>> > > > > >> when I
>> > > > > >> > > have
>> > > > > >> > > > > it.
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > Regards,
>> > > > > >> > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno
>> Thereska
>> > <
>> > > > > >> > > > > > > > eno.thereska@gmail.com
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > Hi Lucas,
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at
>> > this.
>> > > A
>> > > > > >> couple
>> > > > > >> > of
>> > > > > >> > > > > > > > questions:
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > > - did you notice any positive change
>> after
>> > > > > >> implementing
>> > > > > >> > > > this
>> > > > > >> > > > > > KIP?
>> > > > > >> > > > > > > > > I'm
>> > > > > >> > > > > > > > > > > > wondering if you have any experimental
>> > results
>> > > > > that
>> > > > > >> > show
>> > > > > >> > > > the
>> > > > > >> > > > > > > > benefit
>> > > > > >> > > > > > > > > of
>> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > two queues.
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > - priority is usually not sufficient in
>> > > > addressing
>> > > > > >> the
>> > > > > >> > > > > problem
>> > > > > >> > > > > > > the
>> > > > > >> > > > > > > > > KIP
>> > > > > >> > > > > > > > > > > > identifies. Even with priority queues,
>> you
>> > > will
>> > > > > >> > sometimes
>> > > > > >> > > > > > > (often?)
>> > > > > >> > > > > > > > > have
>> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > case that data plane requests will be
>> ahead
>> > of
>> > > > the
>> > > > > >> > > control
>> > > > > >> > > > > > plane
>> > > > > >> > > > > > > > > > > requests.
>> > > > > >> > > > > > > > > > > > This happens because the system might
>> have
>> > > > already
>> > > > > >> > > started
>> > > > > >> > > > > > > > > processing
>> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > data plane requests before the control
>> plane
>> > > > ones
>> > > > > >> > > arrived.
>> > > > > >> > > > So
>> > > > > >> > > > > > it
>> > > > > >> > > > > > > > > would
>> > > > > >> > > > > > > > > > > be
>> > > > > >> > > > > > > > > > > > good to know what % of the problem this
>> KIP
>> > > > > >> addresses.
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > Thanks
>> > > > > >> > > > > > > > > > > > Eno
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
>> > > > > >> > > > > yuzhihong@gmail.com
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > Change looks good.
>> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > Thanks
>> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas
>> > Wang
>> > > <
>> > > > > >> > > > > > > > lucasatucla@gmail.com
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > Hi Ted,
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've
>> updated
>> > > the
>> > > > > KIP.
>> > > > > >> > > Please
>> > > > > >> > > > > > take
>> > > > > >> > > > > > > > > > another
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > look.
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted
>> Yu
>> > <
>> > > > > >> > > > > > > yuzhihong@gmail.com
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > It would be good if you can include
>> > the
>> > > > > >> default
>> > > > > >> > > value
>> > > > > >> > > > > for
>> > > > > >> > > > > > > > this
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > new
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > config
>> > > > > >> > > > > > > > > > > > > > > in the KIP.
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > Thanks
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM,
>> Lucas
>> > > > Wang
>> > > > > <
>> > > > > >> > > > > > > > > > lucasatucla@gmail.com
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a
>> new
>> > > > > config,
>> > > > > >> > > > instead
>> > > > > >> > > > > of
>> > > > > >> > > > > > > > > reusing
>> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > > > > > existing one.
>> > > > > >> > > > > > > > > > > > > > > > Please take another look when you
>> > have
>> > > > > time.
>> > > > > >> > > > Thanks a
>> > > > > >> > > > > > > lot!
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM,
>> Ted
>> > > Yu
>> > > > <
>> > > > > >> > > > > > > > yuzhihong@gmail.com
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource
>> if
>> > > > > control
>> > > > > >> > > request
>> > > > > >> > > > > > rate
>> > > > > >> > > > > > > is
>> > > > > >> > > > > > > > > low
>> > > > > >> > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > I don't know if control request
>> > rate
>> > > > can
>> > > > > >> get
>> > > > > >> > to
>> > > > > >> > > > > > > 100,000,
>> > > > > >> > > > > > > > > > > likely
>> > > > > >> > > > > > > > > > > > > not.
>> > > > > >> > > > > > > > > > > > > > > Then
>> > > > > >> > > > > > > > > > > > > > > > > using the same bound as that
>> for
>> > > data
>> > > > > >> > requests
>> > > > > >> > > > > seems
>> > > > > >> > > > > > > > high.
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13
>> PM,
>> > > > Lucas
>> > > > > >> Wang
>> > > > > >> > <
>> > > > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
>> > > > > >> > > > > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at
>> this
>> > > > KIP.
>> > > > > >> > > > > > > > > > > > > > > > > > Let's say today the setting
>> of
>> > > > > >> > > > > > "queued.max.requests"
>> > > > > >> > > > > > > in
>> > > > > >> > > > > > > > > > > > cluster A
>> > > > > >> > > > > > > > > > > > > > is
>> > > > > >> > > > > > > > > > > > > > > > > 1000,
>> > > > > >> > > > > > > > > > > > > > > > > > while the setting in cluster
>> B
>> > is
>> > > > > >> 100,000.
>> > > > > >> > > > > > > > > > > > > > > > > > The 100 times difference
>> might
>> > > have
>> > > > > >> > indicated
>> > > > > >> > > > > that
>> > > > > >> > > > > > > > > machines
>> > > > > >> > > > > > > > > > > in
>> > > > > >> > > > > > > > > > > > > > > cluster
>> > > > > >> > > > > > > > > > > > > > > > B
>> > > > > >> > > > > > > > > > > > > > > > > > have larger memory.
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > By reusing the
>> > > > "queued.max.requests",
>> > > > > >> the
>> > > > > >> > > > > > > > > > > controlRequestQueue
>> > > > > >> > > > > > > > > > > > in
>> > > > > >> > > > > > > > > > > > > > > > cluster
>> > > > > >> > > > > > > > > > > > > > > > > B
>> > > > > >> > > > > > > > > > > > > > > > > > automatically
>> > > > > >> > > > > > > > > > > > > > > > > > gets a 100x capacity without
>> > > > > explicitly
>> > > > > >> > > > bothering
>> > > > > >> > > > > > the
>> > > > > >> > > > > > > > > > > > operators.
>> > > > > >> > > > > > > > > > > > > > > > > > I understand the counter
>> > argument
>> > > > can
>> > > > > be
>> > > > > >> > that
>> > > > > >> > > > > maybe
>> > > > > >> > > > > > > > > that's
>> > > > > >> > > > > > > > > > a
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > waste
>> > > > > >> > > > > > > > > > > > > > of
>> > > > > >> > > > > > > > > > > > > > > > > > resource if control request
>> > > > > >> > > > > > > > > > > > > > > > > > rate is low and operators may
>> > want
>> > > > to
>> > > > > >> fine
>> > > > > >> > > tune
>> > > > > >> > > > > the
>> > > > > >> > > > > > > > > > capacity
>> > > > > >> > > > > > > > > > > of
>> > > > > >> > > > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach,
>> and
>> > > can
>> > > > > >> change
>> > > > > >> > > it
>> > > > > >> > > > if
>> > > > > >> > > > > > you
>> > > > > >> > > > > > > > or
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > anyone
>> > > > > >> > > > > > > > > > > > > > else
>> > > > > >> > > > > > > > > > > > > > > > > feels
>> > > > > >> > > > > > > > > > > > > > > > > > strong about adding the extra
>> > > > config.
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > Thanks,
>> > > > > >> > > > > > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11
>> PM,
>> > > Ted
>> > > > > Yu
>> > > > > >> <
>> > > > > >> > > > > > > > > > yuzhihong@gmail.com
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > Lucas:
>> > > > > >> > > > > > > > > > > > > > > > > > > Under Rejected
>> Alternatives,
>> > #2,
>> > > > can
>> > > > > >> you
>> > > > > >> > > > > > elaborate
>> > > > > >> > > > > > > a
>> > > > > >> > > > > > > > > bit
>> > > > > >> > > > > > > > > > > more
>> > > > > >> > > > > > > > > > > > > on
>> > > > > >> > > > > > > > > > > > > > > why
>> > > > > >> > > > > > > > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > > > > > > > > separate config has bigger
>> > > impact
>> > > > ?
>> > > > > >> > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > Thanks
>> > > > > >> > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
>> 2:00
>> > PM,
>> > > > > Dong
>> > > > > >> > Lin <
>> > > > > >> > > > > > > > > > > > lindong28@gmail.com
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks
>> > good
>> > > > > >> overall.
>> > > > > >> > > > Some
>> > > > > >> > > > > > > > > comments
>> > > > > >> > > > > > > > > > > > below:
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the
>> > full
>> > > > > mbean
>> > > > > >> for
>> > > > > >> > > the
>> > > > > >> > > > > new
>> > > > > >> > > > > > > > > metrics
>> > > > > >> > > > > > > > > > > in
>> > > > > >> > > > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > > > > KIP.
>> > > > > >> > > > > > > > > > > > > > > > > Can
>> > > > > >> > > > > > > > > > > > > > > > > > > you
>> > > > > >> > > > > > > > > > > > > > > > > > > > specify it in the Public
>> > > > Interface
>> > > > > >> > > section
>> > > > > >> > > > > > > similar
>> > > > > >> > > > > > > > > to
>> > > > > >> > > > > > > > > > > > KIP-237
>> > > > > >> > > > > > > > > > > > > > > > > > > > <
>> https://cwiki.apache.org/
>> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > 237%3A+More+Controller+Health+
>> > > > > >> Metrics>
>> > > > > >> > > > > > > > > > > > > > > > > > > > ?
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow
>> the
>> > > same
>> > > > > >> > pattern
>> > > > > >> > > as
>> > > > > >> > > > > > > KIP-153
>> > > > > >> > > > > > > > > > > > > > > > > > > > <
>> https://cwiki.apache.org/
>> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
>> > > > > >> > > > > > > > > > > > > metric>,
>> > > > > >> > > > > > > > > > > > > > > > > > > > where we keep the
>> existing
>> > > > sensor
>> > > > > >> name
>> > > > > >> > > > > > > > > "BytesInPerSec"
>> > > > > >> > > > > > > > > > > and
>> > > > > >> > > > > > > > > > > > > add
>> > > > > >> > > > > > > > > > > > > > a
>> > > > > >> > > > > > > > > > > > > > > > new
>> > > > > >> > > > > > > > > > > > > > > > > > > sensor
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> "ReplicationBytesInPerSec",
>> > > > rather
>> > > > > >> than
>> > > > > >> > > > > > replacing
>> > > > > >> > > > > > > > > the
>> > > > > >> > > > > > > > > > > > sensor
>> > > > > >> > > > > > > > > > > > > > > name "
>> > > > > >> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
>> > > > > >> > > > > "ClientBytesInPerSec".
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP
>> > > changes
>> > > > > the
>> > > > > >> > > > semantics
>> > > > > >> > > > > > of
>> > > > > >> > > > > > > > the
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > broker
>> > > > > >> > > > > > > > > > > > > > > config
>> > > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests"
>> > because
>> > > > the
>> > > > > >> > number
>> > > > > >> > > of
>> > > > > >> > > > > > total
>> > > > > >> > > > > > > > > > > requests
>> > > > > >> > > > > > > > > > > > > > queued
>> > > > > >> > > > > > > > > > > > > > > > in
>> > > > > >> > > > > > > > > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > > > > > > > > > broker will be no longer
>> > > bounded
>> > > > > by
>> > > > > >> > > > > > > > > > > "queued.max.requests".
>> > > > > >> > > > > > > > > > > > > This
>> > > > > >> > > > > > > > > > > > > > > > > > probably
>> > > > > >> > > > > > > > > > > > > > > > > > > > needs to be specified in
>> the
>> > > > > Public
>> > > > > >> > > > > Interfaces
>> > > > > >> > > > > > > > > section
>> > > > > >> > > > > > > > > > > for
>> > > > > >> > > > > > > > > > > > > > > > > discussion.
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > >> > > > > > > > > > > > > > > > > > > > Dong
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
>> > 12:45
>> > > > PM,
>> > > > > >> Lucas
>> > > > > >> > > > Wang
>> > > > > >> > > > > <
>> > > > > >> > > > > > > > > > > > > > > > lucasatucla@gmail.com >
>> > > > > >> > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to
>> add a
>> > > > > >> separate
>> > > > > >> > > queue
>> > > > > >> > > > > for
>> > > > > >> > > > > > > > > > > controller
>> > > > > >> > > > > > > > > > > > > > > > requests:
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> https://cwiki.apache.org/
>> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > 291%
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > 3A+Have+separate+queues+for+
>> > > > > >> > > > > > > > > > control+requests+and+data+
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > requests
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Can you please take a
>> look
>> > > and
>> > > > > >> let me
>> > > > > >> > > > know
>> > > > > >> > > > > > your
>> > > > > >> > > > > > > > > > > feedback?
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your
>> > time!
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Regards,
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > >
>> > > > > >> > > >
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> > --
>> > -Regards,
>> > Mayuresh R. Gharat
>> > (862) 250-7125
>> >
>>
>
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Hi Lucas,

I guess my question can be rephrased to "do we expect users to ever change
the controller request queue capacity"? If we agree that 20 is already a
very generous default and we do not expect users to change it, is it
still necessary to expose this as a config?

Thanks,

Jiangjie (Becket) Qin
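
For readers following the thread, the mechanism under discussion can be
sketched roughly as follows. The class and method names are illustrative,
not Kafka's actual implementation; the capacities mirror the defaults
mentioned in this thread (500 for queued.max.requests, 20 for the proposed
controller queue):

```python
import queue

class TwoPlaneRequestChannel:
    """Sketch of KIP-291's idea: a small bounded queue for control-plane
    (controller) requests next to the existing data-plane request queue."""

    def __init__(self, data_capacity=500, control_capacity=20):
        # queued.max.requests bounds the data queue; the controller
        # queue gets its own, much smaller bound.
        self.data_queue = queue.Queue(maxsize=data_capacity)
        self.control_queue = queue.Queue(maxsize=control_capacity)

    def send(self, request, is_control=False):
        target = self.control_queue if is_control else self.data_queue
        target.put(request)

    def receive(self):
        # Request handler threads drain controller requests first, so a
        # LeaderAndIsr/UpdateMetadata request never waits behind a backlog
        # of queued produce/fetch requests.
        try:
            return self.control_queue.get_nowait()
        except queue.Empty:
            return self.data_queue.get(timeout=1)

channel = TwoPlaneRequestChannel()
channel.send("produce-1")
channel.send("produce-2")
channel.send("leader-and-isr", is_control=True)
order = [channel.receive() for _ in range(3)]
print(order)  # ['leader-and-isr', 'produce-1', 'produce-2']
```

Note that this sketch still has Eno's caveat from earlier in the thread: a
data request already picked up by a handler thread is not preempted;
prioritization only applies to requests still waiting in the queues.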

On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lu...@gmail.com> wrote:

> @Becket
> 1. Thanks for the comment. You are right that normally there should be
> just one controller request because of muting, and I had NOT intended to
> say there would be many enqueued controller requests. I went through the
> KIP again, and I'm not sure which part conveys that info. I'd be happy
> to revise if you point out the section.
>
> 2. Though it should not happen in normal conditions, the current design
> does not preclude multiple controllers running at the same time, hence
> if we don't have the controller queue capacity config and simply make
> its capacity 1, network threads handling requests from different
> controllers will be blocked during those troublesome times, which is
> probably not what we want. On the other hand, adding the extra config
> with a default value, say 20, guards us from issues in those troublesome
> times, and IMO there isn't much downside of adding the extra config.
>
> @Mayuresh
> Good catch, this sentence is an obsolete statement based on a previous
> design. I've revised the wording in the KIP.
>
> Thanks,
> Lucas
>
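
Spelled out as broker configuration, the proposal being debated here
amounts to something like the following fragment. The new key's name is
illustrative only — the thread fixes its proposed default of 20, not its
final name:

```
# Existing config: bounds the data-plane request queue (default 500).
queued.max.requests=500
# New config proposed in this thread (name illustrative): bounds the
# separate controller request queue.
queued.controller.max.requests=20
```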
> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com> wrote:
>
> > Hi Lucas,
> >
> > Thanks for the KIP.
> > I am trying to understand why you think "The memory consumption can rise
> > given the total number of queued requests can go up to 2x" in the impact
> > section. Normally the requests from controller to a Broker are not high
> > volume, right?
> >
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <be...@gmail.com> wrote:
> >
> > > Thanks for the KIP, Lucas. Separating the control plane from the
> > > data plane makes a lot of sense.
> > >
> > > In the KIP you mentioned that the controller request queue may have
> > > many requests in it. Will this be a common case? The controller
> > > requests still go through the SocketServer. The SocketServer will
> > > mute the channel once a request is read and put into the request
> > > channel. So assuming there is only one connection between controller
> > > and each broker, on the broker side, there should be only one
> > > controller request in the controller request queue at any given
> > > time. If that is the case, do we need a separate controller request
> > > queue capacity config? The default value 20 means that we expect
> > > there are 20 controller switches to happen in a short period of
> > > time. I am not sure whether someone should increase the controller
> > > request queue capacity to handle such a case, as it seems to
> > > indicate something very wrong has happened.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com> wrote:
> > >
> > > > Thanks for the update Lucas.
> > > >
> > > > I think the motivation section is intuitive. It will be good to
> > > > learn more about the comments from other reviewers.
> > > >
> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lu...@gmail.com> wrote:
> > > >
> > > > > Hi Dong,
> > > > >
> > > > > I've updated the motivation section of the KIP by explaining the
> > > > > cases that would have user impacts.
> > > > > Please take a look and let me know your comments.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lu...@gmail.com> wrote:
> > > > >
> > > > > > Hi Dong,
> > > > > >
> > > > > > The simulation of the disk being slow is merely for me to
> > > > > > easily construct a testing scenario with a backlog of produce
> > > > > > requests. In production, other than the disk being slow, a
> > > > > > backlog of produce requests may also be caused by high produce
> > > > > > QPS. In that case, we may not want to kill the broker, and
> > > > > > that's when this KIP can be useful, both for JBOD and non-JBOD
> > > > > > setups.
> > > > > >
> > > > > > Going back to your previous question about each ProduceRequest
> > > > > > covering 20 partitions that are randomly distributed, let's say
> > > > > > a LeaderAndIsr request is enqueued that tries to switch the
> > > > > > current broker, say broker0, from leader to follower *for one
> > > > > > of the partitions*, say *test-0*. For the sake of argument,
> > > > > > let's also assume the other brokers, say broker1, have
> > > > > > *stopped* fetching from the current broker, i.e. broker0.
> > > > > >
> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> > > > > >       LeaderAndIsr will be put into the purgatory, and since
> > > > > >       they'll never be replicated to other brokers (because of
> > > > > >       the assumption made above), they will be completed either
> > > > > >       when the LeaderAndIsr request is processed or when the
> > > > > >       timeout happens.
> > > > > >   1.2 With this KIP, broker0 will immediately transition the
> > > > > >       partition test-0 to become a follower; after the current
> > > > > >       broker sees the replication of the remaining 19
> > > > > >       partitions, it can send a response indicating that it's
> > > > > >       no longer the leader for "test-0".
> > > > > >   To see the latency difference between 1.1 and 1.2, let's say
> > > > > >   there are 24K produce requests ahead of the LeaderAndIsr, and
> > > > > >   there are 8 io threads, so each io thread will process
> > > > > >   approximately 3000 produce requests. Now let's investigate
> > > > > >   the io thread that finally processes the LeaderAndIsr.
> > > > > >   For the 3000 produce requests, we model the times when their
> > > > > >   remaining 19 partitions catch up as t0, t1, ... t2999, and
> > > > > >   the LeaderAndIsr request is processed at time t3000.
> > > > > >   Without this KIP, the 1st produce request would have waited
> > > > > >   an extra t3000 - t0 time in the purgatory, the 2nd an extra
> > > > > >   time of t3000 - t1, etc.
> > > > > >   Roughly speaking, the latency difference is bigger for the
> > > > > >   earlier produce requests than for the later ones. For the
> > > > > >   same reason, the more ProduceRequests queued before the
> > > > > >   LeaderAndIsr, the bigger benefit we get (capped by the
> > > > > >   produce timeout).
> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > > > >   There will be no latency differences in this case, but
> > > > > >   2.1 Without this KIP, the records of partition test-0 in the
> > > > > >       ProduceRequests ahead of the LeaderAndIsr will be
> > > > > >       appended to the local log, and eventually be truncated
> > > > > >       after processing the LeaderAndIsr. This is what's
> > > > > >       referred to as "some unofficial definition of data loss
> > > > > >       in terms of messages beyond the high watermark".
> > > > > >   2.2 With this KIP, we can mitigate the effect since if the
> > > > > >       LeaderAndIsr is immediately processed, the response to
> > > > > >       producers will have the NotLeaderForPartition error,
> > > > > >       causing producers to retry.
> > > > > >
> > > > > > The explanation above covers the benefit of reducing the
> > > > > > latency of a broker becoming a follower; closely related is
> > > > > > reducing the latency of a broker becoming the leader. In that
> > > > > > case, the benefit is even more obvious: if other brokers have
> > > > > > resigned leadership and the current broker should take
> > > > > > leadership, any delay in processing the LeaderAndIsr will be
> > > > > > perceived by clients as unavailability. In extreme cases, this
> > > > > > can cause failed produce requests if the retries are exhausted.
> > > > > >
> > > > > > Another two types of controller requests are UpdateMetadata and
> > > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > > For UpdateMetadata requests, delayed processing means clients
> > > > > > receiving stale metadata, e.g. with the wrong leadership info
> > > > > > for certain partitions, and the effect is more retries or even
> > > > > > fatal failure if the retries are exhausted.
> > > > > > For StopReplica requests, a long queuing time may degrade the
> > > > > > performance of topic deletion.
> > > > > >
> > > > > > Regarding your last question about the delay for
> > > > > > DescribeLogDirsRequest, you are right that this KIP cannot help
> > > > > > with the latency in getting the log dirs info, and it's only
> > > > > > relevant when controller requests are involved.
> > > > > >
> > > > > > Regards,
> > > > > > Lucas
> > > > > >
> > > > > >
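
Lucas's acks=-1 purgatory arithmetic above can be captured in a toy model.
All numbers are the illustrative ones from his example (3000 queued
ProduceRequests per io thread, a 30-second produce timeout), not
measurements:

```python
# Toy model of the purgatory-wait argument: 3000 ProduceRequests are queued
# ahead of a LeaderAndIsr request on one io thread. Request i's remaining 19
# partitions catch up at time t_i, but without the KIP it can only complete
# once the LeaderAndIsr is processed at t_3000 (or when it times out).
def extra_waits(catch_up_times, leader_and_isr_time, produce_timeout):
    return [min(leader_and_isr_time - t, produce_timeout)
            for t in catch_up_times]

ts = list(range(3000))            # t_0 .. t_2999, one per queued request
waits = extra_waits(ts, leader_and_isr_time=3000, produce_timeout=30_000)

assert waits[0] == 3000           # earliest request waits the longest extra time
assert waits[-1] == 1             # latest request barely waits
assert waits == sorted(waits, reverse=True)
```

With the KIP, the LeaderAndIsr jumps the queue, so each pending request
instead fails fast with NotLeaderForPartition (or completes at its own
t_i), and the earlier a request sits in the backlog, the more extra wait
it is spared — the "bigger benefit for earlier requests" point above.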
> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com> wrote:
> > > > > >
> > > > > >> Hey Jun,
> > > > > >>
> > > > > >> Thanks much for the comments. It is a good point. So the
> > > > > >> feature may be useful for the JBOD use-case. I have one
> > > > > >> question below.
> > > > > >>
> > > > > >> Hey Lucas,
> > > > > >>
> > > > > >> Do you think this feature is also useful for non-JBOD setups,
> > > > > >> or is it only useful for the JBOD setup? It may be useful to
> > > > > >> understand this.
> > > > > >>
> > > > > >> When the broker is set up using JBOD, in order to move leaders
> > > > > >> on the failed disk to other disks, the system operator first
> > > > > >> needs to get the list of partitions on the failed disk. This
> > > > > >> is currently achieved using AdminClient.describeLogDirs(),
> > > > > >> which sends DescribeLogDirsRequest to the broker. If we only
> > > > > >> prioritize the controller requests, then the
> > > > > >> DescribeLogDirsRequest may still take a long time to be
> > > > > >> processed by the broker. So the overall time to move leaders
> > > > > >> away from the failed disk may still be long even with this
> > > > > >> KIP. What do you think?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Dong
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >>
> > > > > >> > Thanks for the insightful comment, Jun.
> > > > > >> >
> > > > > >> > @Dong,
> > > > > >> > Since both of the two comments in your previous email are
> > > > > >> > about the benefits of this KIP and whether it's useful, in
> > > > > >> > light of Jun's last comment, do you agree that this KIP can
> > > > > >> > be beneficial in the case mentioned by Jun?
> > > > > >> > Please let me know, thanks!
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > Lucas
> > > > > >> >
> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
> > > > > >> >
> > > > > >> > > Hi, Lucas, Dong,
> > > > > >> > >
> > > > > >> > > If all disks on a broker are slow, one probably should
> > > > > >> > > just kill the broker. In that case, this KIP may not help.
> > > > > >> > > If only one of the disks on a broker is slow, one may want
> > > > > >> > > to fail that disk and move the leaders on that disk to
> > > > > >> > > other brokers. In that case, being able to process the
> > > > > >> > > LeaderAndIsr requests faster will potentially help the
> > > > > >> > > producers recover quicker.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Jun
> > > > > >> > >
> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
> lindong28@gmail.com
> > >
> > > > > wrote:
> > > > > >> > >
> > > > > >> > > > Hey Lucas,
> > > > > >> > > >
> > > > > >> > > > Thanks for the reply. Some follow up questions below.
> > > > > >> > > >
> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions
> > that
> > > > are
> > > > > >> > > randomly
> > > > > >> > > > distributed across all partitions, then each
> ProduceRequest
> > > will
> > > > > >> likely
> > > > > >> > > > cover some partitions for which the broker is still leader
> > > after
> > > > > it
> > > > > >> > > quickly
> > > > > >> > > > processes the
> > > > > >> > > > LeaderAndIsrRequest. Then the broker will still be slow in
> > > > processing
> > > > > >> these
> > > > > >> > > > ProduceRequests, and request latency will still be very high with
> this
> > > > KIP.
> > > > > It
> > > > > >> > > seems
> > > > > >> > > > that most ProduceRequests will still timeout after 30
> > seconds.
> > > Is
> > > > > >> this
> > > > > >> > > > understanding correct?
> > > > > >> > > >
> > > > > >> > > > Regarding 2, if most ProduceRequests will still timeout
> after
> > > 30
> > > > > >> > seconds,
> > > > > >> > > > then it is less clear how this KIP reduces average produce
> > > > > latency.
> > > > > >> Can
> > > > > >> > > you
> > > > > >> > > > clarify what metrics can be improved by this KIP?
> > > > > >> > > >
> > > > > >> > > > Not sure why the system operator directly cares about the number of
> > > truncated
> > > > > >> > messages.
> > > > > >> > > > Do you mean this KIP can improve average throughput or
> > reduce
> > > > > >> message
> > > > > >> > > > duplication? It will be good to understand this.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Dong
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> > > lucasatucla@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi Dong,
> > > > > >> > > > >
> > > > > >> > > > > Thanks for your valuable comments. Please see my reply
> > > below.
> > > > > >> > > > >
> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
> > > consider
> > > > a
> > > > > >> more
> > > > > >> > > > common
> > > > > >> > > > > scenario
> > > > > >> > > > > where broker0 is the leader of many partitions. And
> let's
> > > say
> > > > > for
> > > > > >> > some
> > > > > >> > > > > reason its IO becomes slow.
> > > > > >> > > > > The number of leader partitions on broker0 is so large,
> > say
> > > > 10K,
> > > > > >> that
> > > > > >> > > the
> > > > > >> > > > > cluster is skewed,
> > > > > >> > > > > and the operator would like to shift the leadership for
> a
> > > lot
> > > > of
> > > > > >> > > > > partitions, say 9K, to other brokers,
> > > > > >> > > > > either manually or through some service like cruise
> > control.
> > > > > >> > > > > With this KIP, not only will the leadership transitions
> > > finish
> > > > > >> more
> > > > > >> > > > > quickly, helping the cluster itself becoming more
> > balanced,
> > > > > >> > > > > but all existing producers corresponding to the 9K
> > > partitions
> > > > > will
> > > > > >> > get
> > > > > >> > > > the
> > > > > >> > > > > errors relatively quickly
> > > > > >> > > > > rather than relying on their timeout, thanks to the
> > batched
> > > > > async
> > > > > >> ZK
> > > > > >> > > > > operations.
> > > > > >> > > > > To me it's a useful feature to have during such
> > troublesome
> > > > > times.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > 2. The experiments in the Google Doc have shown that
> with
> > > this
> > > > > KIP
> > > > > >> > many
> > > > > >> > > > > producers
> > > > > >> > > > > receive an explicit error NotLeaderForPartition, based
> on
> > > > which
> > > > > >> they
> > > > > >> > > > retry
> > > > > >> > > > > immediately.
> > > > > >> > > > > Therefore the latency (~14 seconds+quick retry) for
> their
> > > > single
> > > > > >> > > message
> > > > > >> > > > is
> > > > > >> > > > > much smaller
> > > > > >> > > > > compared with the case of timing out without the KIP (30
> > > > seconds
> > > > > >> for
> > > > > >> > > > timing
> > > > > >> > > > > out + quick retry).
> > > > > >> > > > > One might argue that reducing the timeout on the
> > producer
> > > > > side
> > > > > >> can
> > > > > >> > > > > achieve the same result,
> > > > > >> > > > > yet reducing the timeout has its own drawbacks[1].
> > > > > >> > > > >
> > > > > >> > > > > Also *IF* there were a metric to show the number of
> > > truncated
> > > > > >> > messages
> > > > > >> > > on
> > > > > >> > > > > brokers,
> > > > > >> > > > > with the experiments done in the Google Doc, it should
> be
> > > easy
> > > > > to
> > > > > >> see
> > > > > >> > > > that
> > > > > >> > > > > a lot fewer messages need
> > > > > >> > > > > to be truncated on broker0 since the up-to-date metadata
> > > > avoids
> > > > > >> > > appending
> > > > > >> > > > > of messages
> > > > > >> > > > > in subsequent PRODUCE requests. If we talk to a system
> > > > operator
> > > > > >> and
> > > > > >> > ask
> > > > > >> > > > > whether
> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most likely the
> > answer
> > > > is
> > > > > >> yes.
> > > > > >> > > > >
> > > > > >> > > > > 3. To answer your question, I think it might be helpful
> to
> > > > > >> construct
> > > > > >> > > some
> > > > > >> > > > > formulas.
> > > > > >> > > > > To simplify the modeling, I'm going back to the case
> where
> > > > there
> > > > > >> is
> > > > > >> > > only
> > > > > >> > > > > ONE partition involved.
> > > > > >> > > > > Following the experiments in the Google Doc, let's say
> > > broker0
> > > > > >> > becomes
> > > > > >> > > > the
> > > > > >> > > > > follower at time t0,
> > > > > >> > > > > and after t0 there were still N produce requests in its
> > > > request
> > > > > >> > queue.
> > > > > >> > > > > With the up-to-date metadata brought by this KIP,
> broker0
> > > can
> > > > > >> reply
> > > > > >> > > with
> > > > > >> > > > an
> > > > > >> > > > > NotLeaderForPartition exception,
> > > > > >> > > > > let's use M1 to denote the average processing time of
> > > replying
> > > > > >> with
> > > > > >> > > such
> > > > > >> > > > an
> > > > > >> > > > > error message.
> > > > > >> > > > > Without this KIP, the broker will need to append
> messages
> > to
> > > > > >> > segments,
> > > > > >> > > > > which may trigger a flush to disk,
> > > > > >> > > > > let's use M2 to denote the average processing time for
> > such
> > > > > logic.
> > > > > >> > > > > Then the average extra latency incurred without this KIP
> > is
> > > N
> > > > *
> > > > > >> (M2 -
> > > > > >> > > > M1) /
> > > > > >> > > > > 2.
> > > > > >> > > > >
> > > > > >> > > > > In practice, M2 should always be larger than M1, which
> > means
> > > > as
> > > > > >> long
> > > > > >> > > as N
> > > > > >> > > > > is positive,
> > > > > >> > > > > we would see improvements on the average latency.
> > > > > >> > > > > There does not need to be significant backlog of
> requests
> > in
> > > > the
> > > > > >> > > request
> > > > > >> > > > > queue,
> > > > > >> > > > > or severe degradation of disk performance to have the
> > > > > improvement.
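
The model above boils down to a one-line formula; here is a minimal sketch in Java (the class/method names and the sample numbers are illustrative, not measurements from the Google Doc):

```java
public class ExtraLatencyModel {
    /**
     * Average extra latency incurred by the N produce requests still queued
     * when broker0 becomes a follower, per the model above: the i-th queued
     * request waits behind i earlier ones, so the average backlog is N / 2,
     * and each backlogged request costs (m2 - m1) extra.
     *
     * m2 = avg time to append a ProduceRequest to the log (without the KIP)
     * m1 = avg time to reply with NotLeaderForPartition (with the KIP)
     */
    static double avgExtraLatencyMs(int n, double m2, double m1) {
        return n * (m2 - m1) / 2.0;
    }

    public static void main(String[] args) {
        // Illustrative: 100 queued requests, 20 ms append vs 1 ms error reply.
        System.out.println(avgExtraLatencyMs(100, 20.0, 1.0) + " ms"); // 950.0 ms
    }
}
```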
> > > > > >> > > > >
> > > > > >> > > > > Regards,
> > > > > >> > > > > Lucas
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > [1] For instance, reducing the timeout on the producer
> > side
> > > > can
> > > > > >> > trigger
> > > > > >> > > > > unnecessary duplicate requests
> > > > > >> > > > > when the corresponding leader broker is overloaded,
> > > > exacerbating
> > > > > >> the
> > > > > >> > > > > situation.
> > > > > >> > > > >
> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> > > lindong28@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hey Lucas,
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks much for the detailed documentation of the
> > > > experiment.
> > > > > >> > > > > >
> > > > > >> > > > > > Initially I also think having a separate queue for
> > > > controller
> > > > > >> > > requests
> > > > > >> > > > is
> > > > > >> > > > > > useful because, as you mentioned in the summary
> section
> > of
> > > > the
> > > > > >> > Google
> > > > > >> > > > > doc,
> > > > > >> > > > > > controller requests are generally more important than
> > data
> > > > > >> requests
> > > > > >> > > and
> > > > > >> > > > > we
> > > > > >> > > > > > probably want controller requests to be processed
> > sooner.
> > > > But
> > > > > >> then
> > > > > >> > > Eno
> > > > > >> > > > > has
> > > > > >> > > > > > two very good questions which I am not sure the Google
> > doc
> > > > has
> > > > > >> > > answered
> > > > > >> > > > > > explicitly. Could you help with the following
> questions?
> > > > > >> > > > > >
> > > > > >> > > > > > 1) It is not very clear what is the actual benefit of
> > > > KIP-291
> > > > > to
> > > > > >> > > users.
> > > > > >> > > > > The
> > > > > >> > > > > > experiment setup in the Google doc simulates the
> > scenario
> > > > that
> > > > > >> > broker
> > > > > >> > > > is
> > > > > >> > > > > > very slow handling ProduceRequest due to e.g. slow
> disk.
> > > It
> > > > > >> > currently
> > > > > >> > > > > > assumes that there is only 1 partition. But in the
> > common
> > > > > >> scenario,
> > > > > >> > > it
> > > > > >> > > > is
> > > > > >> > > > > > probably reasonable to assume that there are many
> other
> > > > > >> partitions
> > > > > >> > > that
> > > > > >> > > > > are
> > > > > >> > > > > > also actively produced to and ProduceRequest to these
> > > > > partitions
> > > > > >> > also
> > > > > >> > > > > takes
> > > > > >> > > > > > e.g. 2 seconds to be processed. So even if broker0 can
> > > > become
> > > > > >> > > follower
> > > > > >> > > > > for
> > > > > >> > > > > > the partition 0 soon, it probably still needs to
> process
> > > the
> > > > > >> > > > > ProduceRequest
> > > > > >> > > > > > slowly in the queue because these ProduceRequests
> > cover
> > > > > other
> > > > > >> > > > > partitions.
> > > > > >> > > > > > Thus most ProduceRequests will still timeout after 30
> > > seconds
> > > > > and
> > > > > >> > most
> > > > > >> > > > > > clients will still likely timeout after 30 seconds.
> Then
> > > it
> > > > is
> > > > > >> not
> > > > > >> > > > > > obvious what the benefit to the client is since the client
> > will
> > > > > >> timeout
> > > > > >> > > after
> > > > > >> > > > > 30
> > > > > >> > > > > > seconds before possibly re-connecting to broker1, with
> > or
> > > > > >> without
> > > > > >> > > > > KIP-291.
> > > > > >> > > > > > Did I miss something here?
> > > > > >> > > > > >
> > > > > >> > > > > > 2) I guess Eno is asking for the specific benefits
> of
> > > this
> > > > > >> KIP to
> > > > > >> > > > user
> > > > > >> > > > > or
> > > > > >> > > > > > system administrator, e.g. whether this KIP decreases
> > > > average
> > > > > >> > > latency,
> > > > > >> > > > > > 999th percentile latency, probability of exceptions
> exposed
> > to
> > > > > >> client
> > > > > >> > > etc.
> > > > > >> > > > It
> > > > > >> > > > > > is probably useful to clarify this.
> > > > > >> > > > > >
> > > > > >> > > > > > 3) Does this KIP help improve user experience only
> when
> > > > there
> > > > > is
> > > > > >> > > issue
> > > > > >> > > > > with
> > > > > >> > > > > > broker, e.g. significant backlog in the request queue
> > due
> > > to
> > > > > >> slow
> > > > > >> > > disk
> > > > > >> > > > as
> > > > > >> > > > > > described in the Google doc? Or is this KIP also
> useful
> > > when
> > > > > >> there
> > > > > >> > is
> > > > > >> > > > no
> > > > > >> > > > > > ongoing issue in the cluster? It might be helpful to
> > > clarify
> > > > > >> this
> > > > > >> > to
> > > > > >> > > > > > understand the benefit of this KIP.
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks much,
> > > > > >> > > > > > Dong
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > > > > >> lucasatucla@gmail.com
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hi Eno,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Sorry for the delay in getting the experiment
> results.
> > > > > >> > > > > > > Here is a link to the positive impact achieved by
> > > > > implementing
> > > > > >> > the
> > > > > >> > > > > > proposed
> > > > > >> > > > > > > change:
> > > > > >> > > > > > > https://docs.google.com/document/d/
> > > > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > >> > > > > > > Please take a look when you have time and let me
> know
> > > your
> > > > > >> > > feedback.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Regards,
> > > > > >> > > > > > > Lucas
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> > > kafka@harsha.io>
> > > > > >> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Thanks for the pointer. Will take a look might
> suit
> > > our
> > > > > >> > > > requirements
> > > > > >> > > > > > > > better.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > > > > Harsha
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > > > > >> > > > lucasatucla@gmail.com
> > > > > >> > > > > >
> > > > > >> > > > > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Hi Harsha,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > If I understand correctly, the replication quota
> > > > > mechanism
> > > > > >> > > > proposed
> > > > > >> > > > > > in
> > > > > >> > > > > > > > > KIP-73 can be helpful in that scenario.
> > > > > >> > > > > > > > > Have you tried it out?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > > > > > Lucas
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> > > > > kafka@harsha.io
> > > > > >> >
> > > > > >> > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > Hi Lucas,
> > > > > >> > > > > > > > > > One more question, any thoughts on making this
> > > > > >> configurable
> > > > > >> > > > > > > > > > and also allowing subset of data requests to
> be
> > > > > >> > prioritized.
> > > > > >> > > > For
> > > > > >> > > > > > > > example
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > , we notice in our cluster that when we take out a
> > > broker
> > > > > and
> > > > > >> > bring
> > > > > >> > > > a new
> > > > > >> > > > > > one in,
> > > > > >> > > > > > > > it
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > will try to become a follower and have a lot of
> > fetch
> > > > > >> requests
> > > > > >> > to
> > > > > >> > > > > other
> > > > > >> > > > > > > > > leaders
> > > > > >> > > > > > > > > > in clusters. This will negatively affect the
> > > > > >> > > application/client
> > > > > >> > > > > > > > > requests.
> > > > > >> > > > > > > > > > We are also exploring the similar solution to
> > > > > >> de-prioritize
> > > > > >> > > if
> > > > > >> > > > a
> > > > > >> > > > > > new
> > > > > >> > > > > > > > > > replica comes in for fetch requests, we are ok
> > > with
> > > > > the
> > > > > >> > > replica
> > > > > >> > > > > > > > > > taking time, but the leaders should prioritize
> > the
> > > > > client
> > > > > >> > > > > requests.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Thanks,
> > > > > >> > > > > > > > > > Harsha
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang
> > > wrote:
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Hi Eno,
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Sorry for the delayed response.
> > > > > >> > > > > > > > > > > - I haven't implemented the feature yet, so
> no
> > > > > >> > experimental
> > > > > >> > > > > > results
> > > > > >> > > > > > > > so
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > far.
> > > > > >> > > > > > > > > > > And I plan to test in out in the following
> > days.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > - You are absolutely right that the priority
> > > queue
> > > > > >> does
> > > > > >> > not
> > > > > >> > > > > > > > completely
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > prevent
> > > > > >> > > > > > > > > > > data requests being processed ahead of
> > > controller
> > > > > >> > requests.
> > > > > >> > > > > > > > > > > That being said, I expect it to greatly
> > mitigate
> > > > the
> > > > > >> > effect
> > > > > >> > > > of
> > > > > >> > > > > > > stale
> > > > > >> > > > > > > > > > > metadata.
> > > > > >> > > > > > > > > > > In any case, I'll try it out and post the
> > > results
> > > > > >> when I
> > > > > >> > > have
> > > > > >> > > > > it.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Regards,
> > > > > >> > > > > > > > > > > Lucas
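
For reference, the two-queue idea under discussion can be sketched roughly as follows. This is a hypothetical illustration, not the KIP's actual implementation: the class name, queue capacities, and the 10 ms polling interval are all made up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the two-queue idea: the request handler always checks the
// controller queue first, so a control request waits at most behind the
// requests already being processed, not behind the entire data backlog.
class TwoQueueSketch {
    private final BlockingQueue<String> controlQueue = new ArrayBlockingQueue<>(20);
    private final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);

    void enqueueControl(String request) throws InterruptedException {
        controlQueue.put(request);
    }

    void enqueueData(String request) throws InterruptedException {
        dataQueue.put(request);
    }

    // Returns the next request to hand to a request handler thread,
    // preferring control requests; returns null if both queues stay empty.
    String nextRequest() throws InterruptedException {
        String request = controlQueue.poll();   // control plane first
        if (request != null) {
            return request;
        }
        // Wait only briefly on the data queue so the control queue is
        // re-checked frequently by the handler loop.
        return dataQueue.poll(10, TimeUnit.MILLISECONDS);
    }
}
```

A handler loop would call `nextRequest()` repeatedly; note that this still cannot preempt data requests that a handler thread has already dequeued.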
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno
> Thereska
> > <
> > > > > >> > > > > > > > eno.thereska@gmail.com
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Hi Lucas,
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at
> > this.
> > > A
> > > > > >> couple
> > > > > >> > of
> > > > > >> > > > > > > > questions:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > > - did you notice any positive change after
> > > > > >> implementing
> > > > > >> > > > this
> > > > > >> > > > > > KIP?
> > > > > >> > > > > > > > > I'm
> > > > > >> > > > > > > > > > > > wondering if you have any experimental
> > results
> > > > > that
> > > > > >> > show
> > > > > >> > > > the
> > > > > >> > > > > > > > benefit
> > > > > >> > > > > > > > > of
> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > two queues.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > - priority is usually not sufficient in
> > > > addressing
> > > > > >> the
> > > > > >> > > > > problem
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > KIP
> > > > > >> > > > > > > > > > > > identifies. Even with priority queues, you
> > > will
> > > > > >> > sometimes
> > > > > >> > > > > > > (often?)
> > > > > >> > > > > > > > > have
> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > case that data plane requests will be
> ahead
> > of
> > > > the
> > > > > >> > > control
> > > > > >> > > > > > plane
> > > > > >> > > > > > > > > > > requests.
> > > > > >> > > > > > > > > > > > This happens because the system might have
> > > > already
> > > > > >> > > started
> > > > > >> > > > > > > > > processing
> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > data plane requests before the control
> plane
> > > > ones
> > > > > >> > > arrived.
> > > > > >> > > > So
> > > > > >> > > > > > it
> > > > > >> > > > > > > > > would
> > > > > >> > > > > > > > > > > be
> > > > > >> > > > > > > > > > > > good to know what % of the problem this
> KIP
> > > > > >> addresses.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Thanks
> > > > > >> > > > > > > > > > > > Eno
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > > > > >> > > > > yuzhihong@gmail.com
> > > > > >> > > > > > >
> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > Change looks good.
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > Thanks
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas
> > Wang
> > > <
> > > > > >> > > > > > > > lucasatucla@gmail.com
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've
> updated
> > > the
> > > > > KIP.
> > > > > >> > > Please
> > > > > >> > > > > > take
> > > > > >> > > > > > > > > > another
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > look.
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > Lucas
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted
> Yu
> > <
> > > > > >> > > > > > > yuzhihong@gmail.com
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > It would be good if you can include
> > the
> > > > > >> default
> > > > > >> > > value
> > > > > >> > > > > for
> > > > > >> > > > > > > > this
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > new
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > config
> > > > > >> > > > > > > > > > > > > > > in the KIP.
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > Thanks
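
As an aside on capacity: with a second queue, the total number of queued requests would be bounded by the sum of the two limits. A hedged sketch of how the bounds would combine (the control-queue config and its default of 20 are hypothetical; the real definitions live in KafkaConfig.scala):

```java
// Sketch of how the existing bound and a hypothetical new one would combine.
class QueueConfigSketch {
    static final int QUEUED_MAX_REQUESTS = 500;         // existing queued.max.requests default
    static final int QUEUED_MAX_CONTROL_REQUESTS = 20;  // hypothetical new config default

    // Upper bound on requests queued broker-wide once both queues exist:
    // no longer queued.max.requests alone, but the sum of the two limits.
    static int totalQueuedRequestBound() {
        return QUEUED_MAX_REQUESTS + QUEUED_MAX_CONTROL_REQUESTS;
    }
}
```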
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM,
> Lucas
> > > > Wang
> > > > > <
> > > > > >> > > > > > > > > > lucasatucla@gmail.com
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a
> new
> > > > > config,
> > > > > >> > > > instead
> > > > > >> > > > > of
> > > > > >> > > > > > > > > reusing
> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > > existing one.
> > > > > >> > > > > > > > > > > > > > > > Please take another look when you
> > have
> > > > > time.
> > > > > >> > > > Thanks a
> > > > > >> > > > > > > lot!
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > Lucas
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM,
> Ted
> > > Yu
> > > > <
> > > > > >> > > > > > > > yuzhihong@gmail.com
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource
> if
> > > > > control
> > > > > >> > > request
> > > > > >> > > > > > rate
> > > > > >> > > > > > > is
> > > > > >> > > > > > > > > low
> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > I don't know if control request
> > rate
> > > > can
> > > > > >> get
> > > > > >> > to
> > > > > >> > > > > > > 100,000,
> > > > > >> > > > > > > > > > > likely
> > > > > >> > > > > > > > > > > > > not.
> > > > > >> > > > > > > > > > > > > > > Then
> > > > > >> > > > > > > > > > > > > > > > > using the same bound as that for
> > > data
> > > > > >> > requests
> > > > > >> > > > > seems
> > > > > >> > > > > > > > high.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13
> PM,
> > > > Lucas
> > > > > >> Wang
> > > > > >> > <
> > > > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > >> > > > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at
> this
> > > > KIP.
> > > > > >> > > > > > > > > > > > > > > > > > Let's say today the setting of
> > > > > >> > > > > > "queued.max.requests"
> > > > > >> > > > > > > in
> > > > > >> > > > > > > > > > > > cluster A
> > > > > >> > > > > > > > > > > > > > is
> > > > > >> > > > > > > > > > > > > > > > > 1000,
> > > > > >> > > > > > > > > > > > > > > > > > while the setting in cluster B
> > is
> > > > > >> 100,000.
> > > > > >> > > > > > > > > > > > > > > > > > The 100 times difference might
> > > have
> > > > > >> > indicated
> > > > > >> > > > > that
> > > > > >> > > > > > > > > machines
> > > > > >> > > > > > > > > > > in
> > > > > >> > > > > > > > > > > > > > > cluster
> > > > > >> > > > > > > > > > > > > > > > B
> > > > > >> > > > > > > > > > > > > > > > > > have larger memory.
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > By reusing the
> > > > "queued.max.requests",
> > > > > >> the
> > > > > >> > > > > > > > > > > controlRequestQueue
> > > > > >> > > > > > > > > > > > in
> > > > > >> > > > > > > > > > > > > > > > cluster
> > > > > >> > > > > > > > > > > > > > > > > B
> > > > > >> > > > > > > > > > > > > > > > > > automatically
> > > > > >> > > > > > > > > > > > > > > > > > gets a 100x capacity without
> > > > > explicitly
> > > > > >> > > > bothering
> > > > > >> > > > > > the
> > > > > >> > > > > > > > > > > > operators.
> > > > > >> > > > > > > > > > > > > > > > > > I understand the counter
> > argument
> > > > can
> > > > > be
> > > > > >> > that
> > > > > >> > > > > maybe
> > > > > >> > > > > > > > > that's
> > > > > >> > > > > > > > > > a
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > waste
> > > > > >> > > > > > > > > > > > > > of
> > > > > >> > > > > > > > > > > > > > > > > > resource if control request
> > > > > >> > > > > > > > > > > > > > > > > > rate is low and operators may
> > want
> > > > to
> > > > > >> fine
> > > > > >> > > tune
> > > > > >> > > > > the
> > > > > >> > > > > > > > > > capacity
> > > > > >> > > > > > > > > > > of
> > > > > >> > > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach,
> and
> > > can
> > > > > >> change
> > > > > >> > > it
> > > > > >> > > > if
> > > > > >> > > > > > you
> > > > > >> > > > > > > > or
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > anyone
> > > > > >> > > > > > > > > > > > > > else
> > > > > >> > > > > > > > > > > > > > > > > feels
> > > > > >> > > > > > > > > > > > > > > > > > strong about adding the extra
> > > > config.
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > > > >> > > > > > > > > > > > > > > > > > Lucas
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11
> PM,
> > > Ted
> > > > > Yu
> > > > > >> <
> > > > > >> > > > > > > > > > yuzhihong@gmail.com
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > > > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives,
> > #2,
> > > > can
> > > > > >> you
> > > > > >> > > > > > elaborate
> > > > > >> > > > > > > a
> > > > > >> > > > > > > > > bit
> > > > > >> > > > > > > > > > > more
> > > > > >> > > > > > > > > > > > > on
> > > > > >> > > > > > > > > > > > > > > why
> > > > > >> > > > > > > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > > > > > separate config has bigger
> > > impact
> > > > ?
> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > Thanks
> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00
> > PM,
> > > > > Dong
> > > > > >> > Lin <
> > > > > >> > > > > > > > > > > > lindong28@gmail.com
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks
> > good
> > > > > >> overall.
> > > > > >> > > > Some
> > > > > >> > > > > > > > > comments
> > > > > >> > > > > > > > > > > > below:
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the
> > full
> > > > > mbean
> > > > > >> for
> > > > > >> > > the
> > > > > >> > > > > new
> > > > > >> > > > > > > > > metrics
> > > > > >> > > > > > > > > > > in
> > > > > >> > > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > KIP.
> > > > > >> > > > > > > > > > > > > > > > > Can
> > > > > >> > > > > > > > > > > > > > > > > > > you
> > > > > >> > > > > > > > > > > > > > > > > > > > specify it in the Public
> > > > Interface
> > > > > >> > > section
> > > > > >> > > > > > > similar
> > > > > >> > > > > > > > > to
> > > > > >> > > > > > > > > > > > KIP-237
> > > > > >> > > > > > > > > > > > > > > > > > > > <
> https://cwiki.apache.org/
> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > 237%3A+More+Controller+Health+
> > > > > >> Metrics>
> > > > > >> > > > > > > > > > > > > > > > > > > > ?
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow
> the
> > > same
> > > > > >> > pattern
> > > > > >> > > as
> > > > > >> > > > > > > KIP-153
> > > > > >> > > > > > > > > > > > > > > > > > > > <
> https://cwiki.apache.org/
> > > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > >> > > > > > > > > > > > > > > > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > >> > > > > > > > > > > > > > > > > > > > where we keep the existing sensor name "BytesInPerSec" and add a new
> > > > > >> > > > > > > > > > > > > > > > > > > > sensor "ReplicationBytesInPerSec", rather than replacing the sensor
> > > > > >> > > > > > > > > > > > > > > > > > > > name "BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP changes the semantics of the broker config
> > > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests" because the number of total requests queued in
> > > > > >> > > > > > > > > > > > > > > > > > > > the broker will no longer be bounded by "queued.max.requests". This
> > > > > >> > > > > > > > > > > > > > > > > > > > probably needs to be specified in the Public Interfaces section for
> > > > > >> > > > > > > > > > > > > > > > > > > > discussion.
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > >> > > > > > > > > > > > > > > > > > > > Dong
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > >> > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a separate queue for controller requests:
> > > > > >> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > > Can you please take a look and let me know your feedback?
> > > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > >> > > > > > > > > > > > > > > > > > > > > Regards,
> > > > > >> > > > > > > > > > > > > > > > > > > > > Lucas
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
@Becket
1. Thanks for the comment. You are right that normally there should be just
one controller request because of muting,
and I had NOT intended to say there would be many enqueued controller
requests.
I went through the KIP again, and I'm not sure which part conveys that
info. I'd be happy to revise if you point out the section.

2. Though it should not happen in normal conditions, the current design
does not preclude multiple controllers running at the same time. Hence if
we don't have the controller queue capacity config and simply set its
capacity to 1, network threads handling requests from different
controllers will be blocked during those troublesome times, which is
probably not what we want. On the other hand, adding the extra config
with a default value, say 20, guards us against issues in those
troublesome times, and IMO there isn't much downside to adding the extra
config.
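A minimal sketch of the separate-queue idea being discussed (hypothetical Python with made-up names and capacities; not Kafka's actual Scala implementation):

```python
import queue

# Hypothetical sketch: keep a small, separate bounded queue for controller
# requests so that a LeaderAndIsr/UpdateMetadata/StopReplica request never
# waits behind a backlog of data requests.
CONTROL_APIS = {"LeaderAndIsr", "UpdateMetadata", "StopReplica"}

class RequestChannel:
    def __init__(self, data_capacity=500, controller_capacity=20):
        self.data_queue = queue.Queue(maxsize=data_capacity)
        self.controller_queue = queue.Queue(maxsize=controller_capacity)

    def send(self, request):
        # A capacity > 1 keeps network threads from blocking here even if
        # multiple (e.g. zombie) controllers send requests at the same time.
        target = (self.controller_queue if request["api"] in CONTROL_APIS
                  else self.data_queue)
        target.put(request)

    def receive(self):
        # Handler threads drain controller requests first.
        try:
            return self.controller_queue.get_nowait()
        except queue.Empty:
            return self.data_queue.get()

channel = RequestChannel()
channel.send({"api": "Produce", "id": 1})
channel.send({"api": "LeaderAndIsr", "id": 2})
print(channel.receive()["api"])  # LeaderAndIsr is served before Produce
```

The controller queue capacity above is the config under discussion: with a capacity of 1, `send` would block a second controller's network thread until the first request is drained.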

@Mayuresh
Good catch, this sentence is an obsolete statement based on a previous
design. I've revised the wording in the KIP.

Thanks,
Lucas

On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
gharatmayuresh15@gmail.com> wrote:

> Hi Lucas,
>
> Thanks for the KIP.
> I am trying to understand why you think "The memory consumption can rise
> given the total number of queued requests can go up to 2x" in the impact
> section. Normally the requests from the controller to a broker are not high
> volume, right?
>
>
> Thanks,
>
> Mayuresh
>
> On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <be...@gmail.com> wrote:
>
> > Thanks for the KIP, Lucas. Separating the control plane from the data
> > plane makes a lot of sense.
> >
> > In the KIP you mentioned that the controller request queue may have many
> > requests in it. Will this be a common case? The controller requests still
> > go through the SocketServer. The SocketServer will mute the channel once a
> > request is read and put into the request channel. So assuming there is
> > only one connection between the controller and each broker, on the broker
> > side, there should be only one controller request in the controller
> > request queue at any given time. If that is the case, do we need a
> > separate controller request queue capacity config? The default value 20
> > means that we expect 20 controller switches to happen in a short period
> > of time. I am not sure whether someone should increase the controller
> > request queue capacity to handle such a case, as it seems to indicate
> > that something very wrong has happened.
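The muting behavior described above can be illustrated with a toy model (hypothetical Python; not Kafka's code): the socket server stops reading from a connection once a request is handed to the request channel, so a single controller connection contributes at most one queued request at a time.

```python
# Toy model of channel muting: try_read returns a request only if the
# connection is unmuted, and mutes it until the response is sent back.
class Connection:
    def __init__(self, pending):
        self.muted = False
        self.pending = list(pending)

    def try_read(self):
        if self.muted or not self.pending:
            return None
        self.muted = True              # mute until the response is sent
        return self.pending.pop(0)

    def send_response(self):
        self.muted = False             # unmute; the next request can be read

conn = Connection(["LeaderAndIsr", "UpdateMetadata"])
print(conn.try_read())   # LeaderAndIsr
print(conn.try_read())   # None: muted, second request not read yet
conn.send_response()
print(conn.try_read())   # UpdateMetadata
```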
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com> wrote:
> >
> > > Thanks for the update Lucas.
> > >
> > > I think the motivation section is intuitive. It will be good to learn
> > > more about the comments from other reviewers.
> > >
> > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > I've updated the motivation section of the KIP by explaining the
> > > > cases that would have user impacts.
> > > > Please take a look and let me know your comments.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Dong,
> > > > >
> > > > > The simulation of disk being slow is merely for me to easily construct
> > > > > a testing scenario with a backlog of produce requests. In production,
> > > > > other than the disk being slow, a backlog of produce requests may also
> > > > > be caused by high produce QPS. In that case, we may not want to kill
> > > > > the broker, and that's when this KIP can be useful, both for JBOD and
> > > > > non-JBOD setups.
> > > > >
> > > > > Going back to your previous question about each ProduceRequest covering
> > > > > 20 partitions that are randomly distributed, let's say a LeaderAndIsr
> > > > > request is enqueued that tries to switch the current broker, say
> > > > > broker0, from leader to follower *for one of the partitions*, say
> > > > > *test-0*. For the sake of argument, let's also assume the other
> > > > > brokers, say broker1, have *stopped* fetching from the current broker,
> > > > > i.e. broker0.
> > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > > >   1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR
> > > > >       will be put into the purgatory, and since they'll never be
> > > > >       replicated to other brokers (because of the assumption made
> > > > >       above), they will be completed either when the LeaderAndISR
> > > > >       request is processed or when the timeout happens.
> > > > >   1.2 With this KIP, broker0 will immediately transition the partition
> > > > >       test-0 to become a follower; after the current broker sees the
> > > > >       replication of the remaining 19 partitions, it can send a
> > > > >       response indicating that it's no longer the leader for "test-0".
> > > > >   To see the latency difference between 1.1 and 1.2, let's say there
> > > > >   are 24K produce requests ahead of the LeaderAndISR, and there are 8
> > > > >   io threads, so each io thread will process approximately 3000 produce
> > > > >   requests. Now let's investigate the io thread that finally processed
> > > > >   the LeaderAndISR. For the 3000 produce requests, if we model the
> > > > >   times when their remaining 19 partitions catch up as t0, t1, ...
> > > > >   t2999, and the LeaderAndISR request is processed at time t3000, then
> > > > >   without this KIP the 1st produce request would have waited an extra
> > > > >   t3000 - t0 in the purgatory, the 2nd an extra t3000 - t1, etc.
> > > > >   Roughly speaking, the latency difference is bigger for the earlier
> > > > >   produce requests than for the later ones. For the same reason, the
> > > > >   more ProduceRequests queued before the LeaderAndISR, the bigger the
> > > > >   benefit we get (capped by the produce timeout).
> > > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > > >   There will be no latency differences in this case, but
> > > > >   2.1 Without this KIP, the records of partition test-0 in the
> > > > >       ProduceRequests ahead of the LeaderAndISR will be appended to the
> > > > >       local log, and eventually be truncated after processing the
> > > > >       LeaderAndISR. This is what's referred to as "some unofficial
> > > > >       definition of data loss in terms of messages beyond the high
> > > > >       watermark".
> > > > >   2.2 With this KIP, we can mitigate the effect: if the LeaderAndISR is
> > > > >       immediately processed, the response to producers will have the
> > > > >       NotLeaderForPartition error, causing producers to retry.
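The purgatory wait-time arithmetic in case 1 above can be sketched numerically (the 5 ms per-request spacing below is purely illustrative; only the 3000-requests-per-io-thread figure comes from the scenario):

```python
# Illustrative numbers: 3000 queued acks=-1 ProduceRequests on one io
# thread, each completing its remaining partitions ~5 ms apart, with the
# LeaderAndISR request processed last (at t3000) when the KIP is absent.
per_request_ms = 5
n = 3000
t = [i * per_request_ms for i in range(n + 1)]  # t[i] for request i; t[n] for LeaderAndISR

# Without the KIP, request i waits in purgatory until t[n] instead of t[i].
extra_wait_ms = [t[n] - t[i] for i in range(n)]

avg_extra_ms = sum(extra_wait_ms) / n
print(avg_extra_ms)          # 7502.5: on average, half the full queue drain time
print(max(extra_wait_ms))    # 15000: the 1st request waits the full t3000 - t0
```

This matches the qualitative claim: the earliest queued requests see the largest extra wait, and the average grows linearly with the backlog.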
> > > > >
> > > > > The explanation above covers the benefit of reducing the latency of a
> > > > > broker becoming the follower; closely related is reducing the latency
> > > > > of a broker becoming the leader. In this case, the benefit is even
> > > > > more obvious: if other brokers have resigned leadership and the
> > > > > current broker should take leadership, any delay in processing the
> > > > > LeaderAndISR will be perceived by clients as unavailability. In
> > > > > extreme cases, this can cause failed produce requests if the retries
> > > > > are exhausted.
> > > > >
> > > > > Another two types of controller requests are UpdateMetadata and
> > > > > StopReplica, which I'll briefly discuss as follows:
> > > > > For UpdateMetadata requests, delayed processing means clients
> > > > > receiving stale metadata, e.g. with the wrong leadership info for
> > > > > certain partitions, and the effect is more retries or even fatal
> > > > > failure if the retries are exhausted.
> > > > >
> > > > > For StopReplica requests, a long queuing time may degrade the
> > > > > performance of topic deletion.
> > > > >
> > > > > Regarding your last question about the delay for
> > > > > DescribeLogDirsRequest, you are right that this KIP cannot help with
> > > > > the latency in getting the log dirs info, and it's only relevant when
> > > > > controller requests are involved.
> > > > >
> > > > > Regards,
> > > > > Lucas
> > > > >
> > > > >
> > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com> wrote:
> > > > >
> > > > >> Hey Jun,
> > > > >>
> > > > > >> Thanks much for the comments. It is a good point. So the feature may
> > > > > >> be useful for the JBOD use-case. I have one question below.
> > > > > >>
> > > > > >> Hey Lucas,
> > > > > >>
> > > > > >> Do you think this feature is also useful for a non-JBOD setup, or is
> > > > > >> it only useful for the JBOD setup? It may be useful to understand
> > > > > >> this.
> > > > > >>
> > > > > >> When the broker is set up using JBOD, in order to move leaders on the
> > > > > >> failed disk to other disks, the system operator first needs to get
> > > > > >> the list of partitions on the failed disk. This is currently achieved
> > > > > >> using AdminClient.describeLogDirs(), which sends a
> > > > > >> DescribeLogDirsRequest to the broker. If we only prioritize the
> > > > > >> controller requests, then the DescribeLogDirsRequest may still take a
> > > > > >> long time to be processed by the broker. So the overall time to move
> > > > > >> leaders away from the failed disk may still be long even with this
> > > > > >> KIP. What do you think?
> > > > >>
> > > > >> Thanks,
> > > > >> Dong
> > > > >>
> > > > >>
> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >>
> > > > >> > Thanks for the insightful comment, Jun.
> > > > >> >
> > > > >> > @Dong,
> > > > > >> > Since both of the two comments in your previous email are about the
> > > > > >> > benefits of this KIP and whether it's useful, in light of Jun's
> > > > > >> > last comment, do you agree that this KIP can be beneficial in the
> > > > > >> > case mentioned by Jun?
> > > > >> > Please let me know, thanks!
> > > > >> >
> > > > >> > Regards,
> > > > >> > Lucas
> > > > >> >
> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
> > > > >> >
> > > > >> > > Hi, Lucas, Dong,
> > > > >> > >
> > > > > >> > > If all disks on a broker are slow, one probably should just kill
> > > > > >> > > the broker. In that case, this KIP may not help. If only one of
> > > > > >> > > the disks on a broker is slow, one may want to fail that disk and
> > > > > >> > > move the leaders on that disk to other brokers. In that case,
> > > > > >> > > being able to process the LeaderAndIsr requests faster will
> > > > > >> > > potentially help the producers recover quicker.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > >
> > > > >> > > Jun
> > > > >> > >
> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Hey Lucas,
> > > > >> > > >
> > > > >> > > > Thanks for the reply. Some follow up questions below.
> > > > >> > > >
> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that
> > > > > >> > > > are randomly distributed across all partitions, then each
> > > > > >> > > > ProduceRequest will likely cover some partitions for which the
> > > > > >> > > > broker is still the leader after it quickly processes the
> > > > > >> > > > LeaderAndIsrRequest. Then the broker will still be slow in
> > > > > >> > > > processing these ProduceRequests, and the request queue time
> > > > > >> > > > will still be very high with this KIP. It seems that most
> > > > > >> > > > ProduceRequests will still timeout after 30 seconds. Is this
> > > > > >> > > > understanding correct?
> > > > > >> > > >
> > > > > >> > > > Regarding 2, if most ProduceRequests will still timeout after 30
> > > > > >> > > > seconds, then it is less clear how this KIP reduces average
> > > > > >> > > > produce latency. Can you clarify what metrics can be improved by
> > > > > >> > > > this KIP?
> > > > > >> > > >
> > > > > >> > > > Not sure why a system operator directly cares about the number
> > > > > >> > > > of truncated messages. Do you mean this KIP can improve average
> > > > > >> > > > throughput or reduce message duplication? It will be good to
> > > > > >> > > > understand this.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Dong
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > Hi Dong,
> > > > >> > > > >
> > > > > >> > > > > Thanks for your valuable comments. Please see my reply below.
> > > > >> > > > >
> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's consider
> > > > > >> > > > > a more common scenario where broker0 is the leader of many
> > > > > >> > > > > partitions, and let's say for some reason its IO becomes slow.
> > > > > >> > > > > The number of leader partitions on broker0 is so large, say
> > > > > >> > > > > 10K, that the cluster is skewed, and the operator would like
> > > > > >> > > > > to shift the leadership for a lot of partitions, say 9K, to
> > > > > >> > > > > other brokers, either manually or through some service like
> > > > > >> > > > > cruise control.
> > > > > >> > > > > With this KIP, not only will the leadership transitions finish
> > > > > >> > > > > more quickly, helping the cluster itself become more balanced,
> > > > > >> > > > > but all existing producers corresponding to the 9K partitions
> > > > > >> > > > > will get the errors relatively quickly rather than relying on
> > > > > >> > > > > their timeout, thanks to the batched async ZK operations.
> > > > > >> > > > > To me it's a useful feature to have during such troublesome
> > > > > >> > > > > times.
> > > > > >> > > > >
> > > > > >> > > > > 2. The experiments in the Google Doc have shown that with this
> > > > > >> > > > > KIP many producers receive an explicit NotLeaderForPartition
> > > > > >> > > > > error, based on which they retry immediately. Therefore the
> > > > > >> > > > > latency (~14 seconds + quick retry) for their single message
> > > > > >> > > > > is much smaller compared with the case of timing out without
> > > > > >> > > > > the KIP (30 seconds for timing out + quick retry). One might
> > > > > >> > > > > argue that reducing the timeout on the producer side can
> > > > > >> > > > > achieve the same result, yet reducing the timeout has its own
> > > > > >> > > > > drawbacks[1].
> > > > > >> > > > >
> > > > > >> > > > > Also *IF* there were a metric to show the number of truncated
> > > > > >> > > > > messages on brokers, with the experiments done in the Google
> > > > > >> > > > > Doc, it should be easy to see that a lot fewer messages need
> > > > > >> > > > > to be truncated on broker0, since the up-to-date metadata
> > > > > >> > > > > avoids appending of messages in subsequent PRODUCE requests.
> > > > > >> > > > > If we talk to a system operator and ask whether they prefer
> > > > > >> > > > > fewer wasteful IOs, I bet most likely the answer is yes.
> > > > > >> > > > >
> > > > > >> > > > > 3. To answer your question, I think it might be helpful to
> > > > > >> > > > > construct some formulas. To simplify the modeling, I'm going
> > > > > >> > > > > back to the case where there is only ONE partition involved.
> > > > > >> > > > > Following the experiments in the Google Doc, let's say broker0
> > > > > >> > > > > becomes the follower at time t0, and after t0 there were still
> > > > > >> > > > > N produce requests in its request queue. With the up-to-date
> > > > > >> > > > > metadata brought by this KIP, broker0 can reply with a
> > > > > >> > > > > NotLeaderForPartition exception; let's use M1 to denote the
> > > > > >> > > > > average processing time of replying with such an error
> > > > > >> > > > > message. Without this KIP, the broker will need to append
> > > > > >> > > > > messages to segments, which may trigger a flush to disk; let's
> > > > > >> > > > > use M2 to denote the average processing time for such logic.
> > > > > >> > > > > Then the average extra latency incurred without this KIP is
> > > > > >> > > > > N * (M2 - M1) / 2.
> > > > > >> > > > >
> > > > > >> > > > > In practice, M2 should always be larger than M1, which means
> > > > > >> > > > > as long as N is positive, we would see improvements on the
> > > > > >> > > > > average latency. There does not need to be a significant
> > > > > >> > > > > backlog of requests in the request queue, or severe
> > > > > >> > > > > degradation of disk performance, to have the improvement.
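The N * (M2 - M1) / 2 expectation above can be sanity-checked with a tiny sketch (the concrete numbers are made up for illustration):

```python
# N * (M2 - M1) / 2: M1 is the average time to reply with a
# NotLeaderForPartition error, M2 the average time to append (and possibly
# flush) messages, and N the number of produce requests queued ahead.
def avg_extra_latency_ms(n_queued, m1_ms, m2_ms):
    return n_queued * (m2_ms - m1_ms) / 2

# Hypothetical values: 100 queued requests, 0.1 ms error reply, 2 ms append.
print(avg_extra_latency_ms(100, 0.1, 2.0))  # 95.0 ms on average
```

As the message notes, any positive N with M2 > M1 yields a positive expected improvement; no severe backlog is required.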
> > > > >> > > > >
> > > > >> > > > > Regards,
> > > > >> > > > > Lucas
> > > > >> > > > >
> > > > >> > > > >
> > > > > >> > > > > [1] For instance, reducing the timeout on the producer side
> > > > > >> > > > > can trigger unnecessary duplicate requests when the
> > > > > >> > > > > corresponding leader broker is overloaded, exacerbating the
> > > > > >> > > > > situation.
> > > > >> > > > >
> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Hey Lucas,
> > > > >> > > > > >
> > > > > >> > > > > > Thanks much for the detailed documentation of the experiment.
> > > > >> > > > > >
> > > > > >> > > > > > Initially I also think having a separate queue for
> > > > > >> > > > > > controller requests is useful because, as you mentioned in
> > > > > >> > > > > > the summary section of the Google doc, controller requests
> > > > > >> > > > > > are generally more important than data requests and we
> > > > > >> > > > > > probably want controller requests to be processed sooner.
> > > > > >> > > > > > But then Eno has two very good questions which I am not sure
> > > > > >> > > > > > the Google doc has answered explicitly. Could you help with
> > > > > >> > > > > > the following questions?
> > > > > >> > > > > >
> > > > > >> > > > > > 1) It is not very clear what is the actual benefit of
> > > > > >> > > > > > KIP-291 to users. The experiment setup in the Google doc
> > > > > >> > > > > > simulates the scenario that the broker is very slow handling
> > > > > >> > > > > > ProduceRequests due to e.g. a slow disk. It currently
> > > > > >> > > > > > assumes that there is only 1 partition. But in the common
> > > > > >> > > > > > scenario, it is probably reasonable to assume that there are
> > > > > >> > > > > > many other partitions that are also actively produced to,
> > > > > >> > > > > > and ProduceRequests to these partitions also take e.g. 2
> > > > > >> > > > > > seconds to be processed. So even if broker0 can become
> > > > > >> > > > > > follower for partition 0 soon, it probably still needs to
> > > > > >> > > > > > process the ProduceRequests in the queue slowly because
> > > > > >> > > > > > these ProduceRequests cover other partitions. Thus most
> > > > > >> > > > > > ProduceRequests will still timeout after 30 seconds and most
> > > > > >> > > > > > clients will still likely timeout after 30 seconds. Then it
> > > > > >> > > > > > is not obvious what the benefit to the client is, since the
> > > > > >> > > > > > client will timeout after 30 seconds before possibly
> > > > > >> > > > > > re-connecting to broker1, with or without KIP-291. Did I
> > > > > >> > > > > > miss something here?
> > > > > >> > > > > >
> > > > > >> > > > > > 2) I guess Eno is asking for the specific benefits of this
> > > > > >> > > > > > KIP to the user or system administrator, e.g. whether this
> > > > > >> > > > > > KIP decreases average latency, 999th percentile latency,
> > > > > >> > > > > > probability of exceptions exposed to the client, etc. It is
> > > > > >> > > > > > probably useful to clarify this.
> > > > > >> > > > > >
> > > > > >> > > > > > 3) Does this KIP help improve user experience only when
> > > > > >> > > > > > there is an issue with the broker, e.g. a significant
> > > > > >> > > > > > backlog in the request queue due to a slow disk as described
> > > > > >> > > > > > in the Google doc? Or is this KIP also useful when there is
> > > > > >> > > > > > no ongoing issue in the cluster? It might be helpful to
> > > > > >> > > > > > clarify this to understand the benefit of this KIP.
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > Thanks much,
> > > > >> > > > > > Dong
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hi Eno,
> > > > >> > > > > > >
> > > > >> > > > > > > Sorry for the delay in getting the experiment results.
> > > > > >> > > > > > > Here is a link to the positive impact achieved by
> > > > > >> > > > > > > implementing the proposed change:
> > > > > >> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > >> > > > > > > Please take a look when you have time and let me know your
> > > > > >> > > > > > > feedback.
> > > > >> > > > > > >
> > > > >> > > > > > > Regards,
> > > > >> > > > > > > Lucas
> > > > >> > > > > > >
> > > > > >> > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <kafka@harsha.io> wrote:
> > > > >> > > > > > >
> > > > > >> > > > > > > Thanks for the pointer. Will take a look; it might suit
> > > > > >> > > > > > > our requirements better.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thanks,
> > > > >> > > > > > > > Harsha
> > > > >> > > > > > > >
> > > > > >> > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Hi Harsha,
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > If I understand correctly, the replication quota
> > > > > >> > > > > > > > > mechanism proposed in KIP-73 can be helpful in that
> > > > > >> > > > > > > > > scenario.
> > > > >> > > > > > > > > Have you tried it out?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thanks,
> > > > >> > > > > > > > > Lucas
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <kafka@harsha.io> wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > Hi Lucas,
> > > > > >> > > > > > > > > > One more question: any thoughts on making this
> > > > > >> > > > > > > > > > configurable and also allowing a subset of data
> > > > > >> > > > > > > > > > requests to be prioritized? For example, we notice
> > > > > >> > > > > > > > > > in our cluster that when we take out a broker and
> > > > > >> > > > > > > > > > bring a new one in, it will try to become a follower
> > > > > >> > > > > > > > > > and have a lot of fetch requests to other leaders in
> > > > > >> > > > > > > > > > the cluster. This will negatively affect the
> > > > > >> > > > > > > > > > application/client requests. We are also exploring a
> > > > > >> > > > > > > > > > similar solution to de-prioritize fetch requests if
> > > > > >> > > > > > > > > > a new replica comes in; we are OK with the replica
> > > > > >> > > > > > > > > > taking time, but the leaders should prioritize the
> > > > > >> > > > > > > > > > client requests.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > Harsha
> > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Hi Eno,
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Sorry for the delayed response.
> > > > > >> > > > > > > > > > > - I haven't implemented the feature yet, so no
> > > > > >> > > > > > > > > > > experimental results so far. And I plan to test it
> > > > > >> > > > > > > > > > > out in the following days.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > - You are absolutely right that the priority queue
> > > > > >> > > > > > > > > > > does not completely prevent data requests being
> > > > > >> > > > > > > > > > > processed ahead of controller requests. That being
> > > > > >> > > > > > > > > > > said, I expect it to greatly mitigate the effect
> > > > > >> > > > > > > > > > > of stale metadata. In any case, I'll try it out
> > > > > >> > > > > > > > > > > and post the results when I have it.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Regards,
> > > > >> > > > > > > > > > > Lucas
> > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <eno.thereska@gmail.com> wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > Hi Lucas,
> > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at this. A
> > > > > >> > > > > > > > > > > > couple of questions:
> > > > > >> > > > > > > > > > > > - did you notice any positive change after
> > > > > >> > > > > > > > > > > > implementing this KIP? I'm wondering if you have
> > > > > >> > > > > > > > > > > > any experimental results that show the benefit
> > > > > >> > > > > > > > > > > > of the two queues.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > - priority is usually not sufficient in
> > > > > >> > > > > > > > > > > > addressing the problem the KIP identifies. Even
> > > > > >> > > > > > > > > > > > with priority queues, you will sometimes
> > > > > >> > > > > > > > > > > > (often?) have the case that data plane requests
> > > > > >> > > > > > > > > > > > will be ahead of the control plane requests.
> > > > > >> > > > > > > > > > > > This happens because the system might have
> > > > > >> > > > > > > > > > > > already started processing the data plane
> > > > > >> > > > > > > > > > > > requests before the control plane ones arrived.
> > > > > >> > > > > > > > > > > > So it would be good to know what % of the
> > > > > >> > > > > > > > > > > > problem this KIP addresses.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Thanks
> > > > >> > > > > > > > > > > > Eno
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > > > >> > > > > yuzhihong@gmail.com
> > > > >> > > > > > >
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > Change looks good.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > Thanks
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas
> Wang
> > <
> > > > >> > > > > > > > lucasatucla@gmail.com
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Hi Ted,
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated
> > the
> > > > KIP.
> > > > >> > > Please
> > > > >> > > > > > take
> > > > >> > > > > > > > > > another
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > > look.
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Lucas
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu
> <
> > > > >> > > > > > > yuzhihong@gmail.com
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > It would be good if you can include
> the
> > > > >> default
> > > > >> > > value
> > > > >> > > > > for
> > > > >> > > > > > > > this
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > new
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > > config
> > > > >> > > > > > > > > > > > > > > in the KIP.
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Thanks
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas
> > > Wang
> > > > <
> > > > >> > > > > > > > > > lucasatucla@gmail.com
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a new
> > > > config,
> > > > >> > > > instead
> > > > >> > > > > of
> > > > >> > > > > > > > > reusing
> > > > >> > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > existing one.
> > > > >> > > > > > > > > > > > > > > > Please take another look when you
> have
> > > > time.
> > > > >> > > > Thanks a
> > > > >> > > > > > > lot!
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > Lucas
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted
> > Yu
> > > <
> > > > >> > > > > > > > yuzhihong@gmail.com
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource if
> > > > control
> > > > >> > > request
> > > > >> > > > > > rate
> > > > >> > > > > > > is
> > > > >> > > > > > > > > low
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > I don't know if control request
> rate
> > > can
> > > > >> get
> > > > >> > to
> > > > >> > > > > > > 100,000,
> > > > >> > > > > > > > > > > likely
> > > > >> > > > > > > > > > > > > not.
> > > > >> > > > > > > > > > > > > > > Then
> > > > >> > > > > > > > > > > > > > > > > using the same bound as that for
> > data
> > > > >> > requests
> > > > >> > > > > seems
> > > > >> > > > > > > > high.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM,
> > > Lucas
> > > > >> Wang
> > > > >> > <
> > > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > >> > > > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at this
> > > KIP.
> > > > >> > > > > > > > > > > > > > > > > > Let's say today the setting of
> > > > >> > > > > > "queued.max.requests"
> > > > >> > > > > > > in
> > > > >> > > > > > > > > > > > cluster A
> > > > >> > > > > > > > > > > > > > is
> > > > >> > > > > > > > > > > > > > > > > 1000,
> > > > >> > > > > > > > > > > > > > > > > > while the setting in cluster B
> is
> > > > >> 100,000.
> > > > >> > > > > > > > > > > > > > > > > > The 100 times difference might
> > have
> > > > >> > indicated
> > > > >> > > > > that
> > > > >> > > > > > > > > machines
> > > > >> > > > > > > > > > > in
> > > > >> > > > > > > > > > > > > > > cluster
> > > > >> > > > > > > > > > > > > > > > B
> > > > >> > > > > > > > > > > > > > > > > > have larger memory.
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > By reusing the
> > > "queued.max.requests",
> > > > >> the
> > > > >> > > > > > > > > > > controlRequestQueue
> > > > >> > > > > > > > > > > > in
> > > > >> > > > > > > > > > > > > > > > cluster
> > > > >> > > > > > > > > > > > > > > > > B
> > > > >> > > > > > > > > > > > > > > > > > automatically
> > > > >> > > > > > > > > > > > > > > > > > gets a 100x capacity without
> > > > explicitly
> > > > >> > > > bothering
> > > > >> > > > > > the
> > > > >> > > > > > > > > > > > operators.
> > > > >> > > > > > > > > > > > > > > > > > I understand the counter
> argument
> > > can
> > > > be
> > > > >> > that
> > > > >> > > > > maybe
> > > > >> > > > > > > > > that's
> > > > >> > > > > > > > > > a
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > > waste
> > > > >> > > > > > > > > > > > > > of
> > > > >> > > > > > > > > > > > > > > > > > resource if control request
> > > > >> > > > > > > > > > > > > > > > > > rate is low and operators may
> want
> > > to
> > > > >> fine
> > > > >> > > tune
> > > > >> > > > > the
> > > > >> > > > > > > > > > capacity
> > > > >> > > > > > > > > > > of
> > > > >> > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and
> > can
> > > > >> change
> > > > >> > > it
> > > > >> > > > if
> > > > >> > > > > > you
> > > > >> > > > > > > > or
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > > anyone
> > > > >> > > > > > > > > > > > > > else
> > > > >> > > > > > > > > > > > > > > > > feels
> > > > >> > > > > > > > > > > > > > > > > > strong about adding the extra
> > > config.
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > > > > > > > > > Lucas
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM,
> > Ted
> > > > Yu
> > > > >> <
> > > > >> > > > > > > > > > yuzhihong@gmail.com
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives,
> #2,
> > > can
> > > > >> you
> > > > >> > > > > > elaborate
> > > > >> > > > > > > a
> > > > >> > > > > > > > > bit
> > > > >> > > > > > > > > > > more
> > > > >> > > > > > > > > > > > > on
> > > > >> > > > > > > > > > > > > > > why
> > > > >> > > > > > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > > > > separate config has bigger
> > impact
> > > ?
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > Thanks
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00
> PM,
> > > > Dong
> > > > >> > Lin <
> > > > >> > > > > > > > > > > > lindong28@gmail.com
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks
> good
> > > > >> overall.
> > > > >> > > > Some
> > > > >> > > > > > > > > comments
> > > > >> > > > > > > > > > > > below:
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the
> full
> > > > mbean
> > > > >> for
> > > > >> > > the
> > > > >> > > > > new
> > > > >> > > > > > > > > metrics
> > > > >> > > > > > > > > > > in
> > > > >> > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > KIP.
> > > > >> > > > > > > > > > > > > > > > > Can
> > > > >> > > > > > > > > > > > > > > > > > > you
> > > > >> > > > > > > > > > > > > > > > > > > > specify it in the Public
> > > Interface
> > > > >> > > section
> > > > >> > > > > > > similar
> > > > >> > > > > > > > > to
> > > > >> > > > > > > > > > > > KIP-237
> > > > >> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> 237%3A+More+Controller+Health+
> > > > >> Metrics>
> > > > >> > > > > > > > > > > > > > > > > > > > ?
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the
> > same
> > > > >> > pattern
> > > > >> > > as
> > > > >> > > > > > > KIP-153
> > > > >> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > >
> > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > >> > > > > > > > > > > > > metric>,
> > > > >> > > > > > > > > > > > > > > > > > > > where we keep the existing
> > > sensor
> > > > >> name
> > > > >> > > > > > > > > "BytesInPerSec"
> > > > >> > > > > > > > > > > and
> > > > >> > > > > > > > > > > > > add
> > > > >> > > > > > > > > > > > > > a
> > > > >> > > > > > > > > > > > > > > > new
> > > > >> > > > > > > > > > > > > > > > > > > sensor
> > > > >> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec",
> > > rather
> > > > >> than
> > > > >> > > > > > replacing
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > > sensor
> > > > >> > > > > > > > > > > > > > > name "
> > > > >> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
> > > > >> > > > > "ClientBytesInPerSec".
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP
> > changes
> > > > the
> > > > >> > > > semantics
> > > > >> > > > > > of
> > > > >> > > > > > > > the
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > > broker
> > > > >> > > > > > > > > > > > > > > config
> > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests"
> because
> > > the
> > > > >> > number
> > > > >> > > of
> > > > >> > > > > > total
> > > > >> > > > > > > > > > > requests
> > > > >> > > > > > > > > > > > > > queued
> > > > >> > > > > > > > > > > > > > > > in
> > > > >> > > > > > > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > > > > > broker will be no longer
> > bounded
> > > > by
> > > > >> > > > > > > > > > > "queued.max.requests".
> > > > >> > > > > > > > > > > > > This
> > > > >> > > > > > > > > > > > > > > > > > probably
> > > > >> > > > > > > > > > > > > > > > > > > > needs to be specified in the
> > > > Public
> > > > >> > > > > Interfaces
> > > > >> > > > > > > > > section
> > > > >> > > > > > > > > > > for
> > > > >> > > > > > > > > > > > > > > > > discussion.
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > > > > > > > > > > > Dong
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
> 12:45
> > > PM,
> > > > >> Lucas
> > > > >> > > > Wang
> > > > >> > > > > <
> > > > >> > > > > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > >> > > > > > > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a
> > > > >> separate
> > > > >> > > queue
> > > > >> > > > > for
> > > > >> > > > > > > > > > > controller
> > > > >> > > > > > > > > > > > > > > > requests:
> > > > >> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > 291%
> > > > >> > > > > > > > > > > > > > > > > > > > >
> 3A+Have+separate+queues+for+
> > > > >> > > > > > > > > > control+requests+and+data+
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > requests
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > Can you please take a look
> > and
> > > > >> let me
> > > > >> > > > know
> > > > >> > > > > > your
> > > > >> > > > > > > > > > > feedback?
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your
> time!
> > > > >> > > > > > > > > > > > > > > > > > > > > Regards,
> > > > >> > > > > > > > > > > > > > > > > > > > > Lucas
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Mayuresh Gharat <gh...@gmail.com>.
Hi Lucas,

Thanks for the KIP.
I am trying to understand why you think "The memory consumption can rise
given the total number of queued requests can go up to 2x" in the impact
section. Normally the requests from the controller to a broker are not high
volume, right?


Thanks,

Mayuresh

On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <be...@gmail.com> wrote:

> Thanks for the KIP, Lucas. Separating the control plane from the data plane
> makes a lot of sense.
>
> In the KIP you mentioned that the controller request queue may have many
> requests in it. Will this be a common case? The controller requests still
> go through the SocketServer. The SocketServer will mute the channel once
> a request is read and put into the request channel. So assuming there is
> only one connection between controller and each broker, on the broker side,
> there should be only one controller request in the controller request queue
> at any given time. If that is the case, do we need a separate controller
> request queue capacity config? The default value of 20 means that we
> expect 20 controller switches to happen in a short period of time. I am
> not sure whether someone should increase the controller request queue
> capacity to handle such a case, as it seems to indicate something very
> wrong has happened.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
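The control plane / data plane separation under discussion can be sketched as follows. This is a toy model only, with made-up queue and request names, not Kafka's actual request channel implementation:

```python
from collections import deque

# Toy sketch of the KIP's core idea: keep controller (control-plane)
# requests in their own queue and always drain it before data-plane
# requests. Queue names and request strings are illustrative only.
control_queue: deque = deque()
data_queue: deque = deque()

def next_request():
    """Pick the next request for a handler thread: control plane first."""
    if control_queue:
        return control_queue.popleft()
    if data_queue:
        return data_queue.popleft()
    return None

data_queue.extend(["produce-1", "produce-2"])
control_queue.append("leader-and-isr")

print(next_request())  # leader-and-isr (jumps ahead of queued produces)
print(next_request())  # produce-1
```

A real implementation would also need to bound each queue and wake blocked handler threads, which is where the queue capacity config discussed above comes in.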
> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Thanks for the update Lucas.
> >
> > I think the motivation section is intuitive. It will be good to learn
> > more about the comments from other reviewers.
> >
> > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lu...@gmail.com> wrote:
> >
> > > Hi Dong,
> > >
> > > I've updated the motivation section of the KIP by explaining the
> > > cases that would have user impacts.
> > > Please take a look and let me know your comments.
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lu...@gmail.com> wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > The simulation of the disk being slow is merely for me to easily
> > > > construct a testing scenario with a backlog of produce requests. In
> > > > production, other than the disk being slow, a backlog of produce
> > > > requests may also be caused by high produce QPS. In that case, we
> > > > may not want to kill the broker, and that's when this KIP can be
> > > > useful, both for JBOD and non-JBOD setups.
> > > >
> > > > Going back to your previous question about each ProduceRequest
> > > > covering 20 partitions that are randomly distributed, let's say a
> > > > LeaderAndIsr request is enqueued that tries to switch the current
> > > > broker, say broker0, from leader to follower *for one of the
> > > > partitions*, say *test-0*. For the sake of argument, let's also
> > > > assume the other brokers, say broker1, have *stopped* fetching from
> > > > the current broker, i.e. broker0.
> > > > 1. If the enqueued produce requests have acks = -1 (ALL)
> > > >   1.1 Without this KIP, the ProduceRequests ahead of the
> > > >       LeaderAndISR will be put into the purgatory, and since
> > > >       they'll never be replicated to other brokers (because of the
> > > >       assumption made above), they will be completed either when
> > > >       the LeaderAndISR request is processed or when the timeout
> > > >       happens.
> > > >   1.2 With this KIP, broker0 will immediately transition the
> > > >       partition test-0 to become a follower; after the current
> > > >       broker sees the replication of the remaining 19 partitions,
> > > >       it can send a response indicating that it's no longer the
> > > >       leader for "test-0".
> > > >   To see the latency difference between 1.1 and 1.2, let's say
> > > >   there are 24K produce requests ahead of the LeaderAndISR, and
> > > >   there are 8 io threads, so each io thread will process
> > > >   approximately 3000 produce requests. Now let's investigate the io
> > > >   thread that finally processed the LeaderAndISR.
> > > >   For the 3000 produce requests, if we model the times when their
> > > >   remaining 19 partitions catch up as t0, t1, ..., t2999, and say
> > > >   the LeaderAndISR request is processed at time t3000:
> > > >   Without this KIP, the 1st produce request would have waited an
> > > >   extra t3000 - t0 time in the purgatory, the 2nd an extra time of
> > > >   t3000 - t1, etc.
> > > >   Roughly speaking, the latency difference is bigger for the
> > > >   earlier produce requests than for the later ones. For the same
> > > >   reason, the more ProduceRequests queued before the LeaderAndISR,
> > > >   the bigger benefit we get (capped by the produce timeout).
> > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > >   There will be no latency differences in this case, but
> > > >   2.1 without this KIP, the records of partition test-0 in the
> > > >       ProduceRequests ahead of the LeaderAndISR will be appended to
> > > >       the local log, and eventually be truncated after processing
> > > >       the LeaderAndISR. This is what's referred to as "some
> > > >       unofficial definition of data loss in terms of messages
> > > >       beyond the high watermark".
> > > >   2.2 with this KIP, we can mitigate the effect since, if the
> > > >       LeaderAndISR is immediately processed, the response to
> > > >       producers will have the NotLeaderForPartition error, causing
> > > >       producers to retry.
> > > >
> > > > The explanation above covers the benefit of reducing the latency of
> > > > a broker becoming the follower; closely related is reducing the
> > > > latency of a broker becoming the leader. In this case, the benefit
> > > > is even more obvious: if other brokers have resigned leadership and
> > > > the current broker should take leadership, any delay in processing
> > > > the LeaderAndISR will be perceived by clients as unavailability. In
> > > > extreme cases, this can cause failed produce requests if the
> > > > retries are exhausted.
> > > >
> > > > Another two types of controller requests are UpdateMetadata and
> > > > StopReplica, which I'll briefly discuss as follows:
> > > > For UpdateMetadata requests, delayed processing means clients
> > > > receiving stale metadata, e.g. with the wrong leadership info for
> > > > certain partitions, and the effect is more retries or even fatal
> > > > failure if the retries are exhausted.
> > > >
> > > > For StopReplica requests, a long queuing time may degrade the
> > > > performance of topic deletion.
> > > >
> > > > Regarding your last question about the delay for
> > > > DescribeLogDirsRequest, you are right that this KIP cannot help
> > > > with the latency in getting the log dirs info, and it's only
> > > > relevant when controller requests are involved.
> > > >
> > > > Regards,
> > > > Lucas
> > > >
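The acks=-1 purgatory argument in case 1 above can be sketched numerically. This is a toy model with made-up catch-up times, not measured data:

```python
# Toy model of the acks=-1 purgatory argument: request k's remaining
# partitions catch up at time t_k, but without the KIP it is only
# completed when the LeaderAndISR request is processed at time t_n.
# The catch-up times below are made up for illustration.
def extra_purgatory_waits(catch_up_times, leader_and_isr_time):
    """Extra wait of each queued produce request, in the same time unit."""
    return [leader_and_isr_time - t for t in catch_up_times]

waits = extra_purgatory_waits([1.0, 2.0, 3.0], 5.0)
print(waits)  # [4.0, 3.0, 2.0] -- earlier requests wait longer
```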
> > > >
> > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com> wrote:
> > > >
> > > >> Hey Jun,
> > > >>
> > > >> Thanks much for the comments. It is a good point. So the feature
> > > >> may be useful for the JBOD use-case. I have one question below.
> > > >>
> > > >> Hey Lucas,
> > > >>
> > > >> Do you think this feature is also useful for a non-JBOD setup, or
> > > >> is it only useful for the JBOD setup? It may be useful to
> > > >> understand this.
> > > >>
> > > >> When the broker is set up using JBOD, in order to move leaders on
> > > >> the failed disk to other disks, the system operator first needs to
> > > >> get the list of partitions on the failed disk. This is currently
> > > >> achieved using AdminClient.describeLogDirs(), which sends a
> > > >> DescribeLogDirsRequest to the broker. If we only prioritize the
> > > >> controller requests, then the DescribeLogDirsRequest may still
> > > >> take a long time to be processed by the broker. So the overall
> > > >> time to move leaders away from the failed disk may still be long
> > > >> even with this KIP. What do you think?
> > > >>
> > > >> Thanks,
> > > >> Dong
> > > >>
> > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lu...@gmail.com> wrote:
> > > >>
> > > >> > Thanks for the insightful comment, Jun.
> > > >> >
> > > >> > @Dong,
> > > >> > Since both of the two comments in your previous email are about
> > > >> > the benefits of this KIP and whether it's useful, in light of
> > > >> > Jun's last comment, do you agree that this KIP can be beneficial
> > > >> > in the case mentioned by Jun?
> > > >> > Please let me know, thanks!
> > > >> >
> > > >> > Regards,
> > > >> > Lucas
> > > >> >
> > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
> > > >> >
> > > >> > > Hi, Lucas, Dong,
> > > >> > >
> > > >> > > If all disks on a broker are slow, one probably should just
> > > >> > > kill the broker. In that case, this KIP may not help. If only
> > > >> > > one of the disks on a broker is slow, one may want to fail
> > > >> > > that disk and move the leaders on that disk to other brokers.
> > > >> > > In that case, being able to process the LeaderAndIsr requests
> > > >> > > faster will potentially help the producers recover more
> > > >> > > quickly.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Jun
> > > >> > >
> > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com> wrote:
> > > >> > >
> > > >> > > > Hey Lucas,
> > > >> > > >
> > > >> > > > Thanks for the reply. Some follow up questions below.
> > > >> > > >
> > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions
> > > >> > > > that are randomly distributed across all partitions, then
> > > >> > > > each ProduceRequest will likely cover some partitions for
> > > >> > > > which the broker is still the leader after it quickly
> > > >> > > > processes the LeaderAndIsrRequest. Then the broker will
> > > >> > > > still be slow in processing these ProduceRequests, and the
> > > >> > > > request latency will still be very high with this KIP. It
> > > >> > > > seems that most ProduceRequests will still time out after
> > > >> > > > 30 seconds. Is this understanding correct?
> > > >> > > >
> > > >> > > > Regarding 2, if most ProduceRequests will still time out
> > > >> > > > after 30 seconds, then it is less clear how this KIP reduces
> > > >> > > > average produce latency. Can you clarify what metrics can be
> > > >> > > > improved by this KIP?
> > > >> > > >
> > > >> > > > Not sure why a system operator directly cares about the
> > > >> > > > number of truncated messages. Do you mean this KIP can
> > > >> > > > improve average throughput or reduce message duplication?
> > > >> > > > It will be good to understand this.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Dong
> > > >> > > >
> > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > Hi Dong,
> > > >> > > > >
> > > >> > > > > Thanks for your valuable comments. Please see my reply
> below.
> > > >> > > > >
> > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
> consider
> > a
> > > >> more
> > > >> > > > common
> > > >> > > > > scenario
> > > >> > > > > where broker0 is the leader of many partitions. And let's
> say
> > > for
> > > >> > some
> > > >> > > > > reason its IO becomes slow.
> > > >> > > > > The number of leader partitions on broker0 is so large, say
> > 10K,
> > > >> that
> > > >> > > the
> > > >> > > > > cluster is skewed,
> > > >> > > > > and the operator would like to shift the leadership for a
> lot
> > of
> > > >> > > > > partitions, say 9K, to other brokers,
> > > >> > > > > either manually or through some service like cruise control.
> > > >> > > > > With this KIP, not only will the leadership transitions
> finish
> > > >> more
> > > >> > > > > quickly, helping the cluster itself becoming more balanced,
> > > >> > > > > but all existing producers corresponding to the 9K
> partitions
> > > will
> > > >> > get
> > > >> > > > the
> > > >> > > > > errors relatively quickly
> > > >> > > > > rather than relying on their timeout, thanks to the batched
> > > async
> > > >> ZK
> > > >> > > > > operations.
> > > >> > > > > To me it's a useful feature to have during such troublesome
> > > times.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > 2. The experiments in the Google Doc have shown that with
> this
> > > KIP
> > > >> > many
> > > >> > > > > producers
> > > >> > > > > receive an explicit error NotLeaderForPartition, based on
> > which
> > > >> they
> > > >> > > > retry
> > > >> > > > > immediately.
> > > >> > > > > Therefore the latency (~14 seconds+quick retry) for their
> > single
> > > >> > > message
> > > >> > > > is
> > > >> > > > > much smaller
> > > >> > > > > compared with the case of timing out without the KIP (30
> > seconds
> > > >> for
> > > >> > > > timing
> > > >> > > > > out + quick retry).
> > > >> > > > > One might argue that reducing the timing out on the producer
> > > side
> > > >> can
> > > >> > > > > achieve the same result,
> > > >> > > > > yet reducing the timeout has its own drawbacks[1].
> > > >> > > > >
> > > >> > > > > Also *IF* there were a metric to show the number of
> truncated
> > > >> > messages
> > > >> > > on
> > > >> > > > > brokers,
> > > >> > > > > with the experiments done in the Google Doc, it should be
> easy
> > > to
> > > >> see
> > > >> > > > that
> > > >> > > > > a lot fewer messages need
> > > >> > > > > to be truncated on broker0 since the up-to-date metadata
> > avoids
> > > >> > > appending
> > > >> > > > > of messages
> > > >> > > > > in subsequent PRODUCE requests. If we talk to a system
> > operator
> > > >> and
> > > >> > ask
> > > >> > > > > whether
> > > >> > > > > they prefer fewer wasteful IOs, I bet most likely the answer
> > is
> > > >> yes.
> > > >> > > > >
> > > >> > > > > 3. To answer your question, I think it might be helpful to
> > > >> construct
> > > >> > > some
> > > >> > > > > formulas.
> > > >> > > > > To simplify the modeling, I'm going back to the case where
> > there
> > > >> is
> > > >> > > only
> > > >> > > > > ONE partition involved.
> > > >> > > > > Following the experiments in the Google Doc, let's say
> broker0
> > > >> > becomes
> > > >> > > > the
> > > >> > > > > follower at time t0,
> > > >> > > > > and after t0 there were still N produce requests in its
> > request
> > > >> > queue.
> > > >> > > > > With the up-to-date metadata brought by this KIP, broker0
> can
> > > >> reply
> > > >> > > with
> > > >> > > > an
> > > >> > > > > NotLeaderForPartition exception,
> > > >> > > > > let's use M1 to denote the average processing time of
> replying
> > > >> with
> > > >> > > such
> > > >> > > > an
> > > >> > > > > error message.
> > > >> > > > > Without this KIP, the broker will need to append messages to
> > > >> > segments,
> > > >> > > > > which may trigger a flush to disk,
> > > >> > > > > let's use M2 to denote the average processing time for such
> > > logic.
> > > >> > > > > Then the average extra latency incurred without this KIP is
> N
> > *
> > > >> (M2 -
> > > >> > > > M1) /
> > > >> > > > > 2.
> > > >> > > > >
> > > >> > > > > In practice, M2 should always be larger than M1, which means
> > as
> > > >> long
> > > >> > > as N
> > > >> > > > > is positive,
> > > >> > > > > we would see improvements on the average latency.
> > > >> > > > > There does not need to be a significant backlog of requests in
> > the
> > > >> > > request
> > > >> > > > > queue,
> > > >> > > > > or severe degradation of disk performance to have the
> > > improvement.
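To make the model concrete, the formula above can be sketched as follows (the values of N, M1, and M2 here are purely illustrative assumptions, not measurements from the Google doc):

```java
// Back-of-the-envelope model of the argument above. Assumptions (illustrative
// only): N produce requests are queued when broker0 becomes a follower;
// M1 = average time to reply with NotLeaderForPartition (with the KIP);
// M2 = average time to append to the log, possibly flushing (without the KIP).
// The i-th queued request waits behind i slower requests, so the average
// extra latency without the KIP is roughly N * (M2 - M1) / 2.
public class ExtraLatencyModel {
    static double avgExtraLatencyMs(int queuedRequests, double m1Ms, double m2Ms) {
        return queuedRequests * (m2Ms - m1Ms) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical values: 1000 queued requests, 0.1 ms error reply, 2 ms append.
        System.out.printf("avg extra latency = %.1f ms%n",
                avgExtraLatencyMs(1000, 0.1, 2.0)); // = 950.0 ms with these numbers
    }
}
```

As the formula shows, the benefit scales linearly with the backlog size and with the gap between the append path and the error-reply path.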
> > > >> > > > >
> > > >> > > > > Regards,
> > > >> > > > > Lucas
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > [1] For instance, reducing the timeout on the producer side
> > can
> > > >> > trigger
> > > >> > > > > unnecessary duplicate requests
> > > >> > > > > when the corresponding leader broker is overloaded,
> > exacerbating
> > > >> the
> > > >> > > > > situation.
> > > >> > > > >
> > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> lindong28@gmail.com
> > >
> > > >> > wrote:
> > > >> > > > >
> > > >> > > > > > Hey Lucas,
> > > >> > > > > >
> > > >> > > > > > Thanks much for the detailed documentation of the
> > experiment.
> > > >> > > > > >
> > > >> > > > > > Initially I also think having a separate queue for
> > controller
> > > >> > > requests
> > > >> > > > is
> > > >> > > > > > useful because, as you mentioned in the summary section of
> > the
> > > >> > Google
> > > >> > > > > doc,
> > > >> > > > > > controller requests are generally more important than data
> > > >> requests
> > > >> > > and
> > > >> > > > > we
> > > >> > > > > > probably want controller requests to be processed sooner.
> > But
> > > >> then
> > > >> > > Eno
> > > >> > > > > has
> > > >> > > > > > two very good questions which I am not sure the Google doc
> > has
> > > >> > > answered
> > > >> > > > > > explicitly. Could you help with the following questions?
> > > >> > > > > >
> > > >> > > > > > 1) It is not very clear what is the actual benefit of
> > KIP-291
> > > to
> > > >> > > users.
> > > >> > > > > The
> > > >> > > > > > experiment setup in the Google doc simulates the scenario
> > that
> > > >> > broker
> > > >> > > > is
> > > >> > > > > > very slow handling ProduceRequest due to e.g. slow disk.
> It
> > > >> > currently
> > > >> > > > > > assumes that there is only 1 partition. But in the common
> > > >> scenario,
> > > >> > > it
> > > >> > > > is
> > > >> > > > > > probably reasonable to assume that there are many other
> > > >> partitions
> > > >> > > that
> > > >> > > > > are
> > > >> > > > > > also actively produced to and ProduceRequest to these
> > > partition
> > > >> > also
> > > >> > > > > takes
> > > >> > > > > > e.g. 2 seconds to be processed. So even if broker0 can
> > become
> > > >> > > follower
> > > >> > > > > for
> > > >> > > > > > the partition 0 soon, it probably still needs to process
> the
> > > >> > > > > ProduceRequest
> > > >> > > > > > slowly in the queue because these ProduceRequests cover
> > > other
> > > >> > > > > partitions.
> > > >> > > > > > Thus most ProduceRequests will still time out after 30
> seconds
> > > and
> > > >> > most
> > > >> > > > > > clients will still likely time out after 30 seconds. Then
> it
> > is
> > > >> not
> > > >> > > > > > obvious what the benefit to the client is, since the client will
> > > >> timeout
> > > >> > > after
> > > >> > > > > 30
> > > >> > > > > > seconds before possibly re-connecting to broker1, with or
> > > >> without
> > > >> > > > > KIP-291.
> > > >> > > > > > Did I miss something here?
> > > >> > > > > >
> > > >> > > > > > 2) I guess Eno's is asking for the specific benefits of
> this
> > > >> KIP to
> > > >> > > > user
> > > >> > > > > or
> > > >> > > > > > system administrator, e.g. whether this KIP decreases
> > average
> > > >> > > latency,
> > > >> > > > > > 999th percentile latency, probability of exceptions exposed to
> > > >> client
> > > >> > > etc.
> > > >> > > > It
> > > >> > > > > > is probably useful to clarify this.
> > > >> > > > > >
> > > >> > > > > > 3) Does this KIP help improve user experience only when
> > there
> > > is
> > > >> > > issue
> > > >> > > > > with
> > > >> > > > > > broker, e.g. significant backlog in the request queue due
> to
> > > >> slow
> > > >> > > disk
> > > >> > > > as
> > > >> > > > > > described in the Google doc? Or is this KIP also useful
> when
> > > >> there
> > > >> > is
> > > >> > > > no
> > > >> > > > > > ongoing issue in the cluster? It might be helpful to
> clarify
> > > >> this
> > > >> > to
> > > >> > > > > > understand the benefit of this KIP.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > Thanks much,
> > > >> > > > > > Dong
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > > >> lucasatucla@gmail.com
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Eno,
> > > >> > > > > > >
> > > >> > > > > > > Sorry for the delay in getting the experiment results.
> > > >> > > > > > > Here is a link to the positive impact achieved by
> > > implementing
> > > >> > the
> > > >> > > > > > proposed
> > > >> > > > > > > change:
> > > >> > > > > > > https://docs.google.com/document/d/
> > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > >> > > > > > > Please take a look when you have time and let me know
> your
> > > >> > > feedback.
> > > >> > > > > > >
> > > >> > > > > > > Regards,
> > > >> > > > > > > Lucas
> > > >> > > > > > >
> > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> kafka@harsha.io>
> > > >> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Thanks for the pointer. Will take a look might suit
> our
> > > >> > > > requirements
> > > >> > > > > > > > better.
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > > Harsha
> > > >> > > > > > > >
> > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > > >> > > > lucasatucla@gmail.com
> > > >> > > > > >
> > > >> > > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > Hi Harsha,
> > > >> > > > > > > > >
> > > >> > > > > > > > > If I understand correctly, the replication quota
> > > mechanism
> > > >> > > > proposed
> > > >> > > > > > in
> > > >> > > > > > > > > KIP-73 can be helpful in that scenario.
> > > >> > > > > > > > > Have you tried it out?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks,
> > > >> > > > > > > > > Lucas
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> > > kafka@harsha.io
> > > >> >
> > > >> > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Hi Lucas,
> > > >> > > > > > > > > > One more question, any thoughts on making this
> > > >> configurable
> > > >> > > > > > > > > > and also allowing subset of data requests to be
> > > >> > prioritized.
> > > >> > > > For
> > > >> > > > > > > > example
> > > >> > > > > > > > >
> > > >> > > > > > > > > > ,we notice in our cluster when we take out a
> broker
> > > and
> > > >> > bring
> > > >> > > > new
> > > >> > > > > > one
> > > >> > > > > > > > it
> > > >> > > > > > > > >
> > > >> > > > > > > > > > will try to become a follower and have a lot of fetch
> > > >> requests
> > > >> > to
> > > >> > > > > other
> > > >> > > > > > > > > leaders
> > > >> > > > > > > > > > in clusters. This will negatively affect the
> > > >> > > application/client
> > > >> > > > > > > > > requests.
> > > >> > > > > > > > > > We are also exploring the similar solution to
> > > >> de-prioritize
> > > >> > > if
> > > >> > > > a
> > > >> > > > > > new
> > > >> > > > > > > > > > replica comes in for fetch requests, we are ok
> with
> > > the
> > > >> > > replica
> > > >> > > > > to
> > > >> > > > > > be
> > > >> > > > > > > > > > taking time but the leaders should prioritize the
> > > client
> > > >> > > > > requests.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Thanks,
> > > >> > > > > > > > > > Harsha
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang
> wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Hi Eno,
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Sorry for the delayed response.
> > > >> > > > > > > > > > > - I haven't implemented the feature yet, so no
> > > >> > experimental
> > > >> > > > > > results
> > > >> > > > > > > > so
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > far.
> > > >> > > > > > > > > > > And I plan to test it out in the following days.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > - You are absolutely right that the priority
> queue
> > > >> does
> > > >> > not
> > > >> > > > > > > > completely
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > prevent
> > > >> > > > > > > > > > > data requests being processed ahead of
> controller
> > > >> > requests.
> > > >> > > > > > > > > > > That being said, I expect it to greatly mitigate
> > the
> > > >> > effect
> > > >> > > > of
> > > >> > > > > > > stale
> > > >> > > > > > > > > > > metadata.
> > > >> > > > > > > > > > > In any case, I'll try it out and post the
> results
> > > >> when I
> > > >> > > have
> > > >> > > > > it.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Regards,
> > > >> > > > > > > > > > > Lucas
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > >> > > > > > > > eno.thereska@gmail.com
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > Hi Lucas,
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Sorry for the delay, just had a look at this.
> A
> > > >> couple
> > > >> > of
> > > >> > > > > > > > questions:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > > - did you notice any positive change after
> > > >> implementing
> > > >> > > > this
> > > >> > > > > > KIP?
> > > >> > > > > > > > > I'm
> > > >> > > > > > > > > > > > wondering if you have any experimental results
> > > that
> > > >> > show
> > > >> > > > the
> > > >> > > > > > > > benefit
> > > >> > > > > > > > > of
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > two queues.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > - priority is usually not sufficient in
> > addressing
> > > >> the
> > > >> > > > > problem
> > > >> > > > > > > the
> > > >> > > > > > > > > KIP
> > > >> > > > > > > > > > > > identifies. Even with priority queues, you
> will
> > > >> > sometimes
> > > >> > > > > > > (often?)
> > > >> > > > > > > > > have
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > case that data plane requests will be ahead of
> > the
> > > >> > > control
> > > >> > > > > > plane
> > > >> > > > > > > > > > > requests.
> > > >> > > > > > > > > > > > This happens because the system might have
> > already
> > > >> > > started
> > > >> > > > > > > > > processing
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > data plane requests before the control plane
> > ones
> > > >> > > arrived.
> > > >> > > > So
> > > >> > > > > > it
> > > >> > > > > > > > > would
> > > >> > > > > > > > > > > be
> > > >> > > > > > > > > > > > good to know what % of the problem this KIP
> > > >> addresses.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > Eno
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > > >> > > > > yuzhihong@gmail.com
> > > >> > > > > > >
> > > >> > > > > > > > > wrote:
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Change looks good.
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang
> <
> > > >> > > > > > > > lucasatucla@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Hi Ted,
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated
> the
> > > KIP.
> > > >> > > Please
> > > >> > > > > > take
> > > >> > > > > > > > > > another
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > look.
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> > > >> > > > > > > yuzhihong@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > It would be good if you can include the
> > > >> default
> > > >> > > value
> > > >> > > > > for
> > > >> > > > > > > > this
> > > >> > > > > > > > >
> > > >> > > > > > > > > > new
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > config
> > > >> > > > > > > > > > > > > > > in the KIP.
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas
> > Wang
> > > <
> > > >> > > > > > > > > > lucasatucla@gmail.com
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a new
> > > config,
> > > >> > > > instead
> > > >> > > > > of
> > > >> > > > > > > > > reusing
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > existing one.
> > > >> > > > > > > > > > > > > > > > Please take another look when you have
> > > time.
> > > >> > > > Thanks a
> > > >> > > > > > > lot!
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted
> Yu
> > <
> > > >> > > > > > > > yuzhihong@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource if
> > > control
> > > >> > > request
> > > >> > > > > > rate
> > > >> > > > > > > is
> > > >> > > > > > > > > low
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > I don't know if control request rate
> > can
> > > >> get
> > > >> > to
> > > >> > > > > > > 100,000,
> > > >> > > > > > > > > > > likely
> > > >> > > > > > > > > > > > > not.
> > > >> > > > > > > > > > > > > > > Then
> > > >> > > > > > > > > > > > > > > > > using the same bound as that for
> data
> > > >> > requests
> > > >> > > > > seems
> > > >> > > > > > > > high.
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM,
> > Lucas
> > > >> Wang
> > > >> > <
> > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > >> > > > > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at this
> > KIP.
> > > >> > > > > > > > > > > > > > > > > > Let's say today the setting of
> > > >> > > > > > "queued.max.requests"
> > > >> > > > > > > in
> > > >> > > > > > > > > > > > cluster A
> > > >> > > > > > > > > > > > > > is
> > > >> > > > > > > > > > > > > > > > > 1000,
> > > >> > > > > > > > > > > > > > > > > > while the setting in cluster B is
> > > >> 100,000.
> > > >> > > > > > > > > > > > > > > > > > The 100 times difference might
> have
> > > >> > indicated
> > > >> > > > > that
> > > >> > > > > > > > > machines
> > > >> > > > > > > > > > > in
> > > >> > > > > > > > > > > > > > > cluster
> > > >> > > > > > > > > > > > > > > > B
> > > >> > > > > > > > > > > > > > > > > > have larger memory.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > By reusing the
> > "queued.max.requests",
> > > >> the
> > > >> > > > > > > > > > > controlRequestQueue
> > > >> > > > > > > > > > > > in
> > > >> > > > > > > > > > > > > > > > cluster
> > > >> > > > > > > > > > > > > > > > > B
> > > >> > > > > > > > > > > > > > > > > > automatically
> > > >> > > > > > > > > > > > > > > > > > gets a 100x capacity without
> > > explicitly
> > > >> > > > bothering
> > > >> > > > > > the
> > > >> > > > > > > > > > > > operators.
> > > >> > > > > > > > > > > > > > > > > > I understand the counter argument
> > can
> > > be
> > > >> > that
> > > >> > > > > maybe
> > > >> > > > > > > > > that's
> > > >> > > > > > > > > > a
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > waste
> > > >> > > > > > > > > > > > > > of
> > > >> > > > > > > > > > > > > > > > > > resource if control request
> > > >> > > > > > > > > > > > > > > > > > rate is low and operators may want
> > to
> > > >> fine
> > > >> > > tune
> > > >> > > > > the
> > > >> > > > > > > > > > capacity
> > > >> > > > > > > > > > > of
> > > >> > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and
> can
> > > >> change
> > > >> > > it
> > > >> > > > if
> > > >> > > > > > you
> > > >> > > > > > > > or
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > anyone
> > > >> > > > > > > > > > > > > > else
> > > >> > > > > > > > > > > > > > > > > feels
> > > >> > > > > > > > > > > > > > > > > > strong about adding the extra
> > config.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > >> > > > > > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM,
> Ted
> > > Yu
> > > >> <
> > > >> > > > > > > > > > yuzhihong@gmail.com
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2,
> > can
> > > >> you
> > > >> > > > > > elaborate
> > > >> > > > > > > a
> > > >> > > > > > > > > bit
> > > >> > > > > > > > > > > more
> > > >> > > > > > > > > > > > > on
> > > >> > > > > > > > > > > > > > > why
> > > >> > > > > > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > > > separate config has bigger
> impact
> > ?
> > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM,
> > > Dong
> > > >> > Lin <
> > > >> > > > > > > > > > > > lindong28@gmail.com
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good
> > > >> overall.
> > > >> > > > Some
> > > >> > > > > > > > > comments
> > > >> > > > > > > > > > > > below:
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the full
> > > mbean
> > > >> for
> > > >> > > the
> > > >> > > > > new
> > > >> > > > > > > > > metrics
> > > >> > > > > > > > > > > in
> > > >> > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > KIP.
> > > >> > > > > > > > > > > > > > > > > Can
> > > >> > > > > > > > > > > > > > > > > > > you
> > > >> > > > > > > > > > > > > > > > > > > > specify it in the Public
> > Interface
> > > >> > > section
> > > >> > > > > > > similar
> > > >> > > > > > > > > to
> > > >> > > > > > > > > > > > KIP-237
> > > >> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > 237%3A+More+Controller+Health+
> > > >> Metrics>
> > > >> > > > > > > > > > > > > > > > > > > > ?
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the
> same
> > > >> > pattern
> > > >> > > as
> > > >> > > > > > > KIP-153
> > > >> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > >
> 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > >> > > > > > > > > > > > > metric>,
> > > >> > > > > > > > > > > > > > > > > > > > where we keep the existing
> > sensor
> > > >> name
> > > >> > > > > > > > > "BytesInPerSec"
> > > >> > > > > > > > > > > and
> > > >> > > > > > > > > > > > > add
> > > >> > > > > > > > > > > > > > a
> > > >> > > > > > > > > > > > > > > > new
> > > >> > > > > > > > > > > > > > > > > > > sensor
> > > >> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec",
> > rather
> > > >> than
> > > >> > > > > > replacing
> > > >> > > > > > > > > the
> > > >> > > > > > > > > > > > sensor
> > > >> > > > > > > > > > > > > > > name "
> > > >> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
> > > >> > > > > "ClientBytesInPerSec".
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP
> changes
> > > the
> > > >> > > > semantics
> > > >> > > > > > of
> > > >> > > > > > > > the
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > broker
> > > >> > > > > > > > > > > > > > > config
> > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests" because
> > the
> > > >> > number
> > > >> > > of
> > > >> > > > > > total
> > > >> > > > > > > > > > > requests
> > > >> > > > > > > > > > > > > > queued
> > > >> > > > > > > > > > > > > > > > in
> > > >> > > > > > > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > > > > broker will be no longer
> bounded
> > > by
> > > >> > > > > > > > > > > "queued.max.requests".
> > > >> > > > > > > > > > > > > This
> > > >> > > > > > > > > > > > > > > > > > probably
> > > >> > > > > > > > > > > > > > > > > > > > needs to be specified in the
> > > Public
> > > >> > > > > Interfaces
> > > >> > > > > > > > > section
> > > >> > > > > > > > > > > for
> > > >> > > > > > > > > > > > > > > > > discussion.
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > Thanks,
> > > >> > > > > > > > > > > > > > > > > > > > Dong
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45
> > PM,
> > > >> Lucas
> > > >> > > > Wang
> > > >> > > > > <
> > > >> > > > > > > > > > > > > > > > lucasatucla@gmail.com >
> > > >> > > > > > > > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a
> > > >> separate
> > > >> > > queue
> > > >> > > > > for
> > > >> > > > > > > > > > > controller
> > > >> > > > > > > > > > > > > > > > requests:
> > > >> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > >> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > 291%
> > > >> > > > > > > > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > >> > > > > > > > > > control+requests+and+data+
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > > requests
> > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > > Can you please take a look
> and
> > > >> let me
> > > >> > > > know
> > > >> > > > > > your
> > > >> > > > > > > > > > > feedback?
> > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > >> > > > > > > > > > > > > > > > > > > > > Regards,
> > > >> > > > > > > > > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Joel Koshy <jj...@gmail.com>.
Hey Becket - good point. Lucas and I were talking about this offline last
week. It is true that there is only one request in flight for processing.
However, there may be more during a controller failover, but the count should not
be very high - basically the maximum number of controller failovers that can
occur whilst handling any controller request.

This is in fact the more significant issue that may not have been fully
captured in the KIP motivation. i.e., right now, the server processes one
request at a time. Assuming there is moderate to heavy load on the broker,
while it is handling a controller request, it will accumulate a deluge of
regular client requests that will enter the request queue. After the sole
controller request is handled it will read the next controller request into
the request queue. So we end up with a single controller request, then a
potentially large number of regular requests, then a single controller
request, then a large number of regular requests and so on. This can become
especially problematic when you have many small controller requests (say if
you are steadily moving a few partitions at a time, spread over a short
span of time). With the prioritized queue this changes to: handle a
controller request, handle a vector of regular requests (where the vector
size is the number of request handler threads), handle the next controller
request, and so on. The maximum time between handling adjacent controller
requests will be within (*min(local time) of the vector of regular requests*).
So it helps significantly. We also considered the possibility of NOT
muting/unmuting the controller socket to help address this. This would also
mean we would need to pin the handling of all controller requests to one
specific request handler thread in order to ensure order. That change is
probably not worth the effort and we expect the current proposal to be
adequate.
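As a rough illustration of the two-queue idea above, here is a minimal sketch. This is not Kafka's actual RequestChannel; the class and method names are made up, and it uses non-blocking polls where the real broker blocks with timeouts:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy sketch of a prioritized request channel: controller requests and data
// requests go into separate bounded queues, and request handler threads always
// check the controller queue first, so a controller request is never stuck
// behind a backlog of produce/fetch requests.
public class PrioritizedRequestChannel {
    private final BlockingQueue<String> controllerQueue;
    private final BlockingQueue<String> dataQueue;

    public PrioritizedRequestChannel(int controllerCapacity, int dataCapacity) {
        this.controllerQueue = new ArrayBlockingQueue<>(controllerCapacity);
        this.dataQueue = new ArrayBlockingQueue<>(dataCapacity);
    }

    // Returns false if the queue is full (a real broker would block instead).
    public boolean sendControllerRequest(String request) {
        return controllerQueue.offer(request);
    }

    public boolean sendDataRequest(String request) {
        return dataQueue.offer(request);
    }

    // A pending controller request always wins over any queued data request;
    // returns null when both queues are empty.
    public String receiveRequest() {
        String request = controllerQueue.poll();
        return request != null ? request : dataQueue.poll();
    }
}
```

Note that the sketch only shows the prioritization. In the actual proposal the SocketServer still mutes/unmutes the controller channel, which is what preserves ordering and keeps the controller queue nearly empty at any given time.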

Thanks,

Joel

On Tue, Jul 17, 2018 at 5:06 AM, Becket Qin <be...@gmail.com> wrote:

> Thanks for the KIP, Lucas. Separating the control plane from the data plane
> makes a lot of sense.
>
> In the KIP you mentioned that the controller request queue may have many
> requests in it. Will this be a common case? The controller requests still
> go through the SocketServer. The SocketServer will mute the channel once
> a request is read and put into the request channel. So assuming there is
> only one connection between controller and each broker, on the broker side,
> there should be only one controller request in the controller request queue
> at any given time. If that is the case, do we need a separate controller
> request queue capacity config? The default value 20 means that we expect
> there are 20 controller switches to happen in a short period of time. I am
> not sure whether someone should increase the controller request queue
> capacity to handle such a case, as it seems to indicate something very wrong
> has happened.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Thanks for the update Lucas.
> >
> > I think the motivation section is intuitive. It will be good to learn
> more
> > about the comments from other reviewers.
> >
> > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Hi Dong,
> > >
> > > I've updated the motivation section of the KIP by explaining the cases
> > that
> > > would have user impacts.
> > > Please take a look at let me know your comments.
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > The simulation of disk being slow is merely for me to easily
> construct
> > a
> > > > testing scenario
> > > > with a backlog of produce requests. In production, other than the
> disk
> > > > being slow, a backlog of
> > > > produce requests may also be caused by high produce QPS.
> > > > In that case, we may not want to kill the broker and that's when this
> > KIP
> > > > can be useful, both for JBOD
> > > > and non-JBOD setup.
> > > >
> > > > Going back to your previous question about each ProduceRequest
> covering
> > > 20
> > > > partitions that are randomly
> > > > distributed, let's say a LeaderAndIsr request is enqueued that tries
> to
> > > > switch the current broker, say broker0, from leader to follower
> > > > *for one of the partitions*, say *test-0*. For the sake of argument,
> > > > let's also assume the other brokers, say broker1, have *stopped*
> > fetching
> > > > from
> > > > the current broker, i.e. broker0.
> > > > 1. If the enqueued produce requests have acks =  -1 (ALL)
> > > >   1.1 without this KIP, the ProduceRequests ahead of LeaderAndISR
> will
> > be
> > > > put into the purgatory,
> > > >         and since they'll never be replicated to other brokers
> (because
> > > of
> > > > the assumption made above), they will
> > > >         be completed either when the LeaderAndISR request is
> processed
> > or
> > > > when the timeout happens.
> > > >   1.2 With this KIP, broker0 will immediately transition the
> partition
> > > > test-0 to become a follower,
> > > >         after the current broker sees the replication of the
> remaining
> > 19
> > > > partitions, it can send a response indicating that
> > > >         it's no longer the leader for the "test-0".
> > > >   To see the latency difference between 1.1 and 1.2, let's say there
> > are
> > > > 24K produce requests ahead of the LeaderAndISR, and there are 8 io
> > > threads,
> > > >   so each io thread will process approximately 3000 produce requests.
> > Now
> > > > let's investigate the io thread that finally processed the
> > LeaderAndISR.
> > > >   For the 3000 produce requests, if we model the time when their
> > > remaining
> > > > 19 partitions catch up as t0, t1, ...t2999, and the LeaderAndISR
> > request
> > > is
> > > > processed at time t3000.
> > > >   Without this KIP, the 1st produce request would have waited an
> extra
> > > > t3000 - t0 time in the purgatory, the 2nd an extra time of t3000 -
> t1,
> > > etc.
> > > >   Roughly speaking, the latency difference is bigger for the earlier
> > > > produce requests than for the later ones. For the same reason, the
> more
> > > > ProduceRequests queued
> > > >   before the LeaderAndISR, the bigger benefit we get (capped by the
> > > > produce timeout).
> > > > 2. If the enqueued produce requests have acks=0 or acks=1
> > > >   There will be no latency differences in this case, but
> > > >   2.1 without this KIP, the records of partition test-0 in the
> > > > ProduceRequests ahead of the LeaderAndISR will be appended to the
> local
> > > log,
> > > >         and eventually be truncated after processing the
> LeaderAndISR.
> > > > This is what's referred to as
> > > >         "some unofficial definition of data loss in terms of messages
> > > > beyond the high watermark".
> > > >   2.2 with this KIP, we can mitigate the effect since if the
> > LeaderAndISR
> > > > is immediately processed, the response to producers will have
> > > >         the NotLeaderForPartition error, causing producers to retry
> > > >
> > > > This explanation above is the benefit for reducing the latency of a
> > > broker
> > > > becoming the follower,
> > > > closely related is reducing the latency of a broker becoming the
> > leader.
> > > > In this case, the benefit is even more obvious, if other brokers have
> > > > resigned leadership, and the
> > > > current broker should take leadership. Any delay in processing the
> > > > LeaderAndISR will be perceived
> > > > by clients as unavailability. In extreme cases, this can cause failed
> > > > produce requests if the retries are
> > > > exhausted.
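
The timing argument above can be made concrete with a small numerical sketch (the catch-up times below are made-up, evenly spaced values, purely for illustration):

```python
# Model: 3000 queued produce requests whose remaining partitions catch up
# at times t[0] .. t[2999], and the LeaderAndISR request processed at t[3000].
# Hypothetical catch-up times in milliseconds:
t = [i * 5 for i in range(3001)]

# Without the KIP, the i-th produce request sits in the purgatory until the
# LeaderAndISR request is processed, i.e. an extra t[3000] - t[i] ms.
extra_wait = [t[3000] - t[i] for i in range(3000)]

print(extra_wait[0], extra_wait[-1])      # 15000 5
print(sum(extra_wait) / len(extra_wait))  # 7502.5
```

The earlier a produce request sits in the queue, the larger its extra wait, which matches the observation that the benefit is bigger for earlier requests and grows with the number of queued ProduceRequests, until the produce timeout caps it.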
> > > >
> > > > Another two types of controller requests are UpdateMetadata and
> > > > StopReplica, which I'll briefly discuss as follows:
> > > > For UpdateMetadata requests, delayed processing means clients
> receiving
> > > > stale metadata, e.g. with the wrong leadership info
> > > > for certain partitions, and the effect is more retries or even fatal
> > > > failure if the retries are exhausted.
> > > >
> > > > For StopReplica requests, a long queuing time may degrade the
> > performance
> > > > of topic deletion.
> > > >
> > > > Regarding your last question of the delay for DescribeLogDirsRequest,
> > you
> > > > are right
> > > > that this KIP cannot help with the latency in getting the log dirs
> > info,
> > > > and it's only relevant
> > > > when controller requests are involved.
> > > >
> > > > Regards,
> > > > Lucas
> > > >
> > > >
> > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com>
> wrote:
> > > >
> > > >> Hey Jun,
> > > >>
> > > >> Thanks much for the comments. It is good point. So the feature may
> be
> > > >> useful for JBOD use-case. I have one question below.
> > > >>
> > > >> Hey Lucas,
> > > >>
> > > >> Do you think this feature is also useful for a non-JBOD setup, or is it
> > > only
> > > >> useful for the JBOD setup? It may be useful to understand this.
> > > >>
> > > >> When the broker is setup using JBOD, in order to move leaders on the
> > > >> failed
> > > >> disk to other disks, the system operator first needs to get the list
> > of
> > > >> partitions on the failed disk. This is currently achieved using
> > > >> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to
> > the
> > > >> broker. If we only prioritize the controller requests, then the
> > > >> DescribeLogDirsRequest
> > > >> may still take a long time to be processed by the broker. So the
> > overall
> > > >> time to move leaders away from the failed disk may still be long
> even
> > > with
> > > >> this KIP. What do you think?
> > > >>
> > > >> Thanks,
> > > >> Dong
> > > >>
> > > >>
> > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >>
> > > >> > Thanks for the insightful comment, Jun.
> > > >> >
> > > >> > @Dong,
> > > >> > Since both of the two comments in your previous email are about
> the
> > > >> > benefits of this KIP and whether it's useful,
> > > >> > in light of Jun's last comment, do you agree that this KIP can be
> > > >> > beneficial in the case mentioned by Jun?
> > > >> > Please let me know, thanks!
> > > >> >
> > > >> > Regards,
> > > >> > Lucas
> > > >> >
> > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
> > > >> >
> > > >> > > Hi, Lucas, Dong,
> > > >> > >
> > > >> > > If all disks on a broker are slow, one probably should just kill
> > the
> > > >> > > broker. In that case, this KIP may not help. If only one of the
> > > disks
> > > >> on
> > > >> > a
> > > >> > > broker is slow, one may want to fail that disk and move the
> > leaders
> > > on
> > > >> > that
> > > >> > > disk to other brokers. In that case, being able to process the
> > > >> > LeaderAndIsr
> > > >> > > requests faster will potentially help the producers recover
> > quicker.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Jun
> > > >> > >
> > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com>
> > > wrote:
> > > >> > >
> > > >> > > > Hey Lucas,
> > > >> > > >
> > > >> > > > Thanks for the reply. Some follow up questions below.
> > > >> > > >
> > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that
> > are
> > > >> > > randomly
> > > >> > > > distributed across all partitions, then each ProduceRequest
> will
> > > >> likely
> > > >> > > > cover some partitions for which the broker is still leader
> after
> > > it
> > > >> > > quickly
> > > >> > > > processes the
> > > >> > > > > LeaderAndIsrRequest. Then the broker will still be slow in
> > > >> > > > > processing these ProduceRequests, and request latency will still
> > > >> > > > > be very high with this KIP. It seems that most ProduceRequests
> > > >> > > > > will still time out after 30 seconds. Is this understanding
> > > >> > > > > correct?
> > > >> > > >
> > > >> > > > Regarding 2, if most ProduceRequest will still timeout after
> 30
> > > >> > seconds,
> > > >> > > > then it is less clear how this KIP reduces average produce
> > > latency.
> > > >> Can
> > > >> > > you
> > > >> > > > clarify what metrics can be improved by this KIP?
> > > >> > > >
> > > >> > > > > Not sure why the system operator directly cares about the number of
> truncated
> > > >> > messages.
> > > >> > > > Do you mean this KIP can improve average throughput or reduce
> > > >> message
> > > >> > > > duplication? It will be good to understand this.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Dong
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
> lucasatucla@gmail.com
> > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > Hi Dong,
> > > >> > > > >
> > > >> > > > > Thanks for your valuable comments. Please see my reply
> below.
> > > >> > > > >
> > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
> consider
> > a
> > > >> more
> > > >> > > > common
> > > >> > > > > scenario
> > > >> > > > > where broker0 is the leader of many partitions. And let's
> say
> > > for
> > > >> > some
> > > >> > > > > reason its IO becomes slow.
> > > >> > > > > The number of leader partitions on broker0 is so large, say
> > 10K,
> > > >> that
> > > >> > > the
> > > >> > > > > cluster is skewed,
> > > >> > > > > and the operator would like to shift the leadership for a
> lot
> > of
> > > >> > > > > partitions, say 9K, to other brokers,
> > > >> > > > > either manually or through some service like cruise control.
> > > >> > > > > With this KIP, not only will the leadership transitions
> finish
> > > >> more
> > > >> > > > > quickly, helping the cluster itself becoming more balanced,
> > > >> > > > > but all existing producers corresponding to the 9K
> partitions
> > > will
> > > >> > get
> > > >> > > > the
> > > >> > > > > errors relatively quickly
> > > >> > > > > rather than relying on their timeout, thanks to the batched
> > > async
> > > >> ZK
> > > >> > > > > operations.
> > > >> > > > > To me it's a useful feature to have during such troublesome
> > > times.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > 2. The experiments in the Google Doc have shown that with
> this
> > > KIP
> > > >> > many
> > > >> > > > > producers
> > > >> > > > > receive an explicit error NotLeaderForPartition, based on
> > which
> > > >> they
> > > >> > > > retry
> > > >> > > > > immediately.
> > > >> > > > > Therefore the latency (~14 seconds+quick retry) for their
> > single
> > > >> > > message
> > > >> > > > is
> > > >> > > > > much smaller
> > > >> > > > > compared with the case of timing out without the KIP (30
> > seconds
> > > >> for
> > > >> > > > timing
> > > >> > > > > out + quick retry).
> > > >> > > > > One might argue that reducing the timing out on the producer
> > > side
> > > >> can
> > > >> > > > > achieve the same result,
> > > >> > > > > yet reducing the timeout has its own drawbacks[1].
> > > >> > > > >
> > > >> > > > > Also *IF* there were a metric to show the number of
> truncated
> > > >> > messages
> > > >> > > on
> > > >> > > > > brokers,
> > > >> > > > > with the experiments done in the Google Doc, it should be
> easy
> > > to
> > > >> see
> > > >> > > > that
> > > >> > > > > a lot fewer messages need
> > > >> > > > > to be truncated on broker0 since the up-to-date metadata
> > avoids
> > > >> > > appending
> > > >> > > > > of messages
> > > >> > > > > in subsequent PRODUCE requests. If we talk to a system
> > operator
> > > >> and
> > > >> > ask
> > > >> > > > > whether
> > > >> > > > > they prefer fewer wasteful IOs, I bet most likely the answer
> > is
> > > >> yes.
> > > >> > > > >
> > > >> > > > > 3. To answer your question, I think it might be helpful to
> > > >> construct
> > > >> > > some
> > > >> > > > > formulas.
> > > >> > > > > To simplify the modeling, I'm going back to the case where
> > there
> > > >> is
> > > >> > > only
> > > >> > > > > ONE partition involved.
> > > >> > > > > Following the experiments in the Google Doc, let's say
> broker0
> > > >> > becomes
> > > >> > > > the
> > > >> > > > > follower at time t0,
> > > >> > > > > and after t0 there were still N produce requests in its
> > request
> > > >> > queue.
> > > >> > > > > With the up-to-date metadata brought by this KIP, broker0
> can
> > > >> reply
> > > >> > > with
> > > >> > > > an
> > > >> > > > > NotLeaderForPartition exception,
> > > >> > > > > let's use M1 to denote the average processing time of
> replying
> > > >> with
> > > >> > > such
> > > >> > > > an
> > > >> > > > > error message.
> > > >> > > > > Without this KIP, the broker will need to append messages to
> > > >> > segments,
> > > >> > > > > which may trigger a flush to disk,
> > > >> > > > > let's use M2 to denote the average processing time for such
> > > logic.
> > > >> > > > > Then the average extra latency incurred without this KIP is
> N
> > *
> > > >> (M2 -
> > > >> > > > M1) /
> > > >> > > > > 2.
> > > >> > > > >
> > > >> > > > > In practice, M2 should always be larger than M1, which means
> > as
> > > >> long
> > > >> > > as N
> > > >> > > > > is positive,
> > > >> > > > > we would see improvements on the average latency.
> > > >> > > > > There does not need to be significant backlog of requests in
> > the
> > > >> > > request
> > > >> > > > > queue,
> > > >> > > > > or severe degradation of disk performance to have the
> > > improvement.
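
The formula can be sanity-checked with a few lines of code (N, M1, and M2 below are illustrative values, not measurements from the Google doc):

```python
def avg_extra_latency_ms(n, m1_ms, m2_ms):
    # The k-th queued request waits behind k requests, each taking
    # m2 instead of m1 to process, so its extra wait is k * (m2 - m1).
    extra = [k * (m2_ms - m1_ms) for k in range(1, n + 1)]
    return sum(extra) / n

# Illustrative values: N = 3000 queued requests, M1 = 0.1 ms to reply with
# NotLeaderForPartition, M2 = 2.0 ms to append messages to the local log.
avg = avg_extra_latency_ms(3000, 0.1, 2.0)
print(avg)  # close to N * (M2 - M1) / 2 = 2850 ms
```

Any positive N and any gap M2 - M1 > 0 yields a positive average improvement, which is the point made above: no severe backlog or disk degradation is required.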
> > > >> > > > >
> > > >> > > > > Regards,
> > > >> > > > > Lucas
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > [1] For instance, reducing the timeout on the producer side
> > can
> > > >> > trigger
> > > >> > > > > unnecessary duplicate requests
> > > >> > > > > when the corresponding leader broker is overloaded,
> > exacerbating
> > > >> the
> > > >> > > > > situation.
> > > >> > > > >
> > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
> lindong28@gmail.com
> > >
> > > >> > wrote:
> > > >> > > > >
> > > >> > > > > > Hey Lucas,
> > > >> > > > > >
> > > >> > > > > > Thanks much for the detailed documentation of the
> > experiment.
> > > >> > > > > >
> > > >> > > > > > Initially I also think having a separate queue for
> > controller
> > > >> > > requests
> > > >> > > > is
> > > >> > > > > > useful because, as you mentioned in the summary section of
> > the
> > > >> > Google
> > > >> > > > > doc,
> > > >> > > > > > controller requests are generally more important than data
> > > >> requests
> > > >> > > and
> > > >> > > > > we
> > > >> > > > > > probably want controller requests to be processed sooner.
> > But
> > > >> then
> > > >> > > Eno
> > > >> > > > > has
> > > >> > > > > > two very good questions which I am not sure the Google doc
> > has
> > > >> > > answered
> > > >> > > > > > explicitly. Could you help with the following questions?
> > > >> > > > > >
> > > >> > > > > > 1) It is not very clear what is the actual benefit of
> > KIP-291
> > > to
> > > >> > > users.
> > > >> > > > > The
> > > >> > > > > > experiment setup in the Google doc simulates the scenario
> > that
> > > >> > broker
> > > >> > > > is
> > > >> > > > > > very slow handling ProduceRequest due to e.g. slow disk.
> It
> > > >> > currently
> > > >> > > > > > assumes that there is only 1 partition. But in the common
> > > >> scenario,
> > > >> > > it
> > > >> > > > is
> > > >> > > > > > probably reasonable to assume that there are many other
> > > >> partitions
> > > >> > > that
> > > >> > > > > are
> > > >> > > > > > also actively produced to and ProduceRequest to these
> > > partition
> > > >> > also
> > > >> > > > > takes
> > > >> > > > > > e.g. 2 seconds to be processed. So even if broker0 can
> > become
> > > >> > > follower
> > > >> > > > > for
> > > >> > > > > > the partition 0 soon, it probably still needs to process
> the
> > > >> > > > > ProduceRequest
> > > >> > > > > > slowly in the queue because these ProduceRequests cover
> > > other
> > > >> > > > > partitions.
> > > >> > > > > > Thus most ProduceRequest will still timeout after 30
> seconds
> > > and
> > > >> > most
> > > >> > > > > > clients will still likely timeout after 30 seconds. Then
> it
> > is
> > > >> not
> > > >> > > > > > obvious what the benefit to the client is, since the client will
> > > >> timeout
> > > >> > > after
> > > >> > > > > 30
> > > >> > > > > > seconds before possibly re-connecting to broker1, with or
> > > >> without
> > > >> > > > > KIP-291.
> > > >> > > > > > Did I miss something here?
> > > >> > > > > >
> > > >> > > > > > 2) I guess Eno is asking for the specific benefits of
> this
> > > >> KIP to
> > > >> > > > user
> > > >> > > > > or
> > > >> > > > > > system administrator, e.g. whether this KIP decreases
> > average
> > > >> > > latency,
> > > >> > > > > > 999th percentile latency, probability of exceptions exposed to
> > > >> client
> > > >> > > etc.
> > > >> > > > It
> > > >> > > > > > is probably useful to clarify this.
> > > >> > > > > >
> > > >> > > > > > 3) Does this KIP help improve user experience only when
> > there
> > > is
> > > >> > > issue
> > > >> > > > > with
> > > >> > > > > > broker, e.g. significant backlog in the request queue due
> to
> > > >> slow
> > > >> > > disk
> > > >> > > > as
> > > >> > > > > > described in the Google doc? Or is this KIP also useful
> when
> > > >> there
> > > >> > is
> > > >> > > > no
> > > >> > > > > > ongoing issue in the cluster? It might be helpful to
> clarify
> > > >> this
> > > >> > to
> > > >> > > > > > understand the benefit of this KIP.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > Thanks much,
> > > >> > > > > > Dong
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > > >> lucasatucla@gmail.com
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Eno,
> > > >> > > > > > >
> > > >> > > > > > > Sorry for the delay in getting the experiment results.
> > > >> > > > > > > Here is a link to the positive impact achieved by
> > > implementing
> > > >> > the
> > > >> > > > > > proposed
> > > >> > > > > > > change:
> > > >> > > > > > > https://docs.google.com/document/d/
> > > 1ge2jjp5aPTBber6zaIT9AdhW
> > > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > >> > > > > > > Please take a look when you have time and let me know
> your
> > > >> > > feedback.
> > > >> > > > > > >
> > > >> > > > > > > Regards,
> > > >> > > > > > > Lucas
> > > >> > > > > > >
> > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
> kafka@harsha.io>
> > > >> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Thanks for the pointer. Will take a look might suit
> our
> > > >> > > > requirements
> > > >> > > > > > > > better.
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > > Harsha
> > > >> > > > > > > >
> > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > > >> > > > lucasatucla@gmail.com
> > > >> > > > > >
> > > >> > > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > Hi Harsha,
> > > >> > > > > > > > >
> > > >> > > > > > > > > If I understand correctly, the replication quota
> > > mechanism
> > > >> > > > proposed
> > > >> > > > > > in
> > > >> > > > > > > > > KIP-73 can be helpful in that scenario.
> > > >> > > > > > > > > Have you tried it out?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks,
> > > >> > > > > > > > > Lucas
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> > > kafka@harsha.io
> > > >> >
> > > >> > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Hi Lucas,
> > > >> > > > > > > > > > One more question, any thoughts on making this
> > > >> configurable
> > > >> > > > > > > > > > and also allowing subset of data requests to be
> > > >> > prioritized.
> > > >> > > > For
> > > >> > > > > > > > example
> > > >> > > > > > > > >
> > > >> > > > > > > > > > ,we notice in our cluster when we take out a
> broker
> > > and
> > > >> > bring
> > > >> > > > new
> > > >> > > > > > one
> > > >> > > > > > > > it
> > > >> > > > > > > > >
> > > >> > > > > > > > > > will try to become follower and have a lot of fetch
> > > >> requests
> > > >> > to
> > > >> > > > > other
> > > >> > > > > > > > > leaders
> > > >> > > > > > > > > > in clusters. This will negatively affect the
> > > >> > > application/client
> > > >> > > > > > > > > requests.
> > > >> > > > > > > > > > We are also exploring a similar solution to
> > > >> de-prioritize
> > > >> > > if
> > > >> > > > a
> > > >> > > > > > new
> > > >> > > > > > > > > > replica comes in for fetch requests, we are ok
> with
> > > the
> > > >> > > replica
> > > >> > > > > to
> > > >> > > > > > be
> > > >> > > > > > > > > > taking time but the leaders should prioritize the
> > > client
> > > >> > > > > requests.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Thanks,
> > > >> > > > > > > > > > Harsha
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang
> wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Hi Eno,
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Sorry for the delayed response.
> > > >> > > > > > > > > > > - I haven't implemented the feature yet, so no
> > > >> > experimental
> > > >> > > > > > results
> > > >> > > > > > > > so
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > far.
> > > >> > > > > > > > > > > And I plan to test in out in the following days.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > - You are absolutely right that the priority
> queue
> > > >> does
> > > >> > not
> > > >> > > > > > > > completely
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > prevent
> > > >> > > > > > > > > > > data requests being processed ahead of
> controller
> > > >> > requests.
> > > >> > > > > > > > > > > That being said, I expect it to greatly mitigate
> > the
> > > >> > effect
> > > >> > > > of
> > > >> > > > > > > stale
> > > >> > > > > > > > > > > metadata.
> > > >> > > > > > > > > > > In any case, I'll try it out and post the
> results
> > > >> when I
> > > >> > > have
> > > >> > > > > it.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Regards,
> > > >> > > > > > > > > > > Lucas
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > >> > > > > > > > eno.thereska@gmail.com
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > Hi Lucas,
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Sorry for the delay, just had a look at this.
> A
> > > >> couple
> > > >> > of
> > > >> > > > > > > > questions:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > > - did you notice any positive change after
> > > >> implementing
> > > >> > > > this
> > > >> > > > > > KIP?
> > > >> > > > > > > > > I'm
> > > >> > > > > > > > > > > > wondering if you have any experimental results
> > > that
> > > >> > show
> > > >> > > > the
> > > >> > > > > > > > benefit
> > > >> > > > > > > > > of
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > two queues.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > - priority is usually not sufficient in
> > addressing
> > > >> the
> > > >> > > > > problem
> > > >> > > > > > > the
> > > >> > > > > > > > > KIP
> > > >> > > > > > > > > > > > identifies. Even with priority queues, you
> will
> > > >> > sometimes
> > > >> > > > > > > (often?)
> > > >> > > > > > > > > have
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > case that data plane requests will be ahead of
> > the
> > > >> > > control
> > > >> > > > > > plane
> > > >> > > > > > > > > > > requests.
> > > >> > > > > > > > > > > > This happens because the system might have
> > already
> > > >> > > started
> > > >> > > > > > > > > processing
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > data plane requests before the control plane
> > ones
> > > >> > > arrived.
> > > >> > > > So
> > > >> > > > > > it
> > > >> > > > > > > > > would
> > > >> > > > > > > > > > > be
> > > >> > > > > > > > > > > > good to know what % of the problem this KIP
> > > >> addresses.
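
Eno's caveat can be illustrated with a toy two-queue model (a sketch only, not the broker's actual request handling): control requests only jump ahead of data requests that are still queued, while a data request that a handler thread has already picked up still completes first.

```python
from collections import deque

control_q, data_q = deque(), deque()

def next_request():
    # A handler thread always drains the control queue first,
    # falling back to the data queue only when it is empty.
    if control_q:
        return control_q.popleft()
    return data_q.popleft() if data_q else None

data_q.extend(["produce-1", "produce-2"])
order = [next_request()]            # picked up before any control request arrived
control_q.append("leaderAndIsr")    # a control plane request arrives late
order += [next_request(), next_request()]
print(order)  # ['produce-1', 'leaderAndIsr', 'produce-2']
```

"produce-1" is served before "leaderAndIsr" because it was dequeued before the control request existed, so prioritization mitigates, but cannot fully eliminate, data requests running ahead of control requests.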
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > Eno
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > > >> > > > > yuzhihong@gmail.com
> > > >> > > > > > >
> > > >> > > > > > > > > wrote:
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Change looks good.
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang
> <
> > > >> > > > > > > > lucasatucla@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Hi Ted,
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated
> the
> > > KIP.
> > > >> > > Please
> > > >> > > > > > take
> > > >> > > > > > > > > > another
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > look.
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> > > >> > > > > > > yuzhihong@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > It would be good if you can include the
> > > >> default
> > > >> > > value
> > > >> > > > > for
> > > >> > > > > > > > this
> > > >> > > > > > > > >
> > > >> > > > > > > > > > new
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > config
> > > >> > > > > > > > > > > > > > > in the KIP.
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas
> > Wang
> > > <
> > > >> > > > > > > > > > lucasatucla@gmail.com
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a new
> > > config,
> > > >> > > > instead
> > > >> > > > > of
> > > >> > > > > > > > > reusing
> > > >> > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > existing one.
> > > >> > > > > > > > > > > > > > > > Please take another look when you have
> > > time.
> > > >> > > > Thanks a
> > > >> > > > > > > lot!
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted
> Yu
> > <
> > > >> > > > > > > > yuzhihong@gmail.com
> > > >> > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource if
> > > control
> > > >> > > request
> > > >> > > > > > rate
> > > >> > > > > > > is
> > > >> > > > > > > > > low
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > I don't know if control request rate
> > can
> > > >> get
> > > >> > to
> > > >> > > > > > > 100,000,
> > > >> > > > > > > > > > > likely
> > > >> > > > > > > > > > > > > not.
> > > >> > > > > > > > > > > > > > > Then
> > > >> > > > > > > > > > > > > > > > > using the same bound as that for
> data
> > > >> > requests
> > > >> > > > > seems
> > > >> > > > > > > > high.
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM,
> > Lucas
> > > >> Wang
> > > >> > <
> > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > >> > > > > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Hi Ted,
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at this
> > KIP.
> > > >> > > > > > > > > > > > > > > > > > Let's say today the setting of
> > > >> > > > > > "queued.max.requests"
> > > >> > > > > > > in
> > > >> > > > > > > > > > > > cluster A
> > > >> > > > > > > > > > > > > > is
> > > >> > > > > > > > > > > > > > > > > 1000,
> > > >> > > > > > > > > > > > > > > > > > while the setting in cluster B is
> > > >> 100,000.
> > > >> > > > > > > > > > > > > > > > > > The 100 times difference might
> have
> > > >> > indicated
> > > >> > > > > that
> > > >> > > > > > > > > machines
> > > >> > > > > > > > > > > in
> > > >> > > > > > > > > > > > > > > cluster
> > > >> > > > > > > > > > > > > > > > B
> > > >> > > > > > > > > > > > > > > > > > have larger memory.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > By reusing the
> > "queued.max.requests",
> > > >> the
> > > >> > > > > > > > > > > controlRequestQueue
> > > >> > > > > > > > > > > > in
> > > >> > > > > > > > > > > > > > > > cluster
> > > >> > > > > > > > > > > > > > > > > B
> > > >> > > > > > > > > > > > > > > > > > automatically
> > > >> > > > > > > > > > > > > > > > > > gets a 100x capacity without
> > > explicitly
> > > >> > > > bothering
> > > >> > > > > > the
> > > >> > > > > > > > > > > > operators.
> > > >> > > > > > > > > > > > > > > > > > I understand the counter argument
> > can
> > > be
> > > >> > that
> > > >> > > > > maybe
> > > >> > > > > > > > > that's
> > > >> > > > > > > > > > a
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > waste
> > > >> > > > > > > > > > > > > > of
> > > >> > > > > > > > > > > > > > > > > > resource if control request
> > > >> > > > > > > > > > > > > > > > > > rate is low and operators may want
> > to
> > > >> fine
> > > >> > > tune
> > > >> > > > > the
> > > >> > > > > > > > > > capacity
> > > >> > > > > > > > > > > of
> > > >> > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and
> can
> > > >> change
> > > >> > > it
> > > >> > > > if
> > > >> > > > > > you
> > > >> > > > > > > > or
> > > >> > > > > > > > >
> > > >> > > > > > > > > > > anyone
> > > >> > > > > > > > > > > > > > else
> > > >> > > > > > > > > > > > > > > > > feels
> > > >> > > > > > > > > > > > > > > > > > strong about adding the extra
> > config.
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > Thanks,
> > > >> > > > > > > > > > > > > > > > > > Lucas
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM,
> Ted
> > > Yu
> > > >> <
> > > >> > > > > > > > > > yuzhihong@gmail.com
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > > > Lucas:
> > > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2,
> > can
> > > >> you
> > > >> > > > > > elaborate
> > > >> > > > > > > a
> > > >> > > > > > > > > bit
> > > >> > > > > > > > > > > more
> > > >> > > > > > > > > > > > > on
> > > >> > > > > > > > > > > > > > > why
> > > >> > > > > > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > > > separate config has bigger
> impact
> > ?
> > > >> > > > > > > > > > > > > > > > > > >

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Becket Qin <be...@gmail.com>.
Thanks for the KIP, Lucas. Separating the control plane from the data plane
makes a lot of sense.

In the KIP you mentioned that the controller request queue may have many
requests in it. Will this be a common case? The controller requests still
go through the SocketServer. The SocketServer will mute the channel once
a request is read and put into the request channel. So assuming there is
only one connection between controller and each broker, on the broker side,
there should be only one controller request in the controller request queue
at any given time. If that is the case, do we need a separate controller
request queue capacity config? The default value of 20 means that we expect
up to 20 controller switches to happen in a short period of time. I am
not sure whether someone should increase the controller request queue
capacity to handle such a case, as it seems to indicate that something very
wrong has happened.
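The muting behavior described here implies the controller queue should stay nearly empty in steady state: with a single connection from the controller, and the channel muted after each read, at most one controller request per connection sits in the queue at a time. A minimal sketch of that invariant (plain Java, not Kafka's actual classes; the capacity of 20 mirrors the proposed default from the KIP):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ControllerQueueSketch {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical bounded queue mirroring the proposed default capacity of 20.
        BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(20);

        // With the channel muted after each read, at most one request per
        // controller connection is in the queue at any given time.
        controllerQueue.put("LeaderAndIsrRequest");
        System.out.println("queued=" + controllerQueue.size());

        // The request handler drains it before the channel is unmuted again.
        String req = controllerQueue.take();
        System.out.println("processed=" + req);
    }
}
```

Under this model, a queue depth anywhere near 20 would indeed signal something unusual, e.g. many controller failovers in quick succession.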

Thanks,

Jiangjie (Becket) Qin


On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <li...@gmail.com> wrote:

> Thanks for the update Lucas.
>
> I think the motivation section is intuitive. It will be good to learn more
> about the comments from other reviewers.
>
> On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Hi Dong,
> >
> > I've updated the motivation section of the KIP by explaining the cases
> that
> > would have user impacts.
> > Please take a look and let me know your comments.
> >
> > Thanks,
> > Lucas
> >
> > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Hi Dong,
> > >
> > > The simulation of disk being slow is merely for me to easily construct
> a
> > > testing scenario
> > > with a backlog of produce requests. In production, other than the disk
> > > being slow, a backlog of
> > > produce requests may also be caused by high produce QPS.
> > > In that case, we may not want to kill the broker and that's when this
> KIP
> > > can be useful, both for JBOD
> > > and non-JBOD setup.
> > >
> > > Going back to your previous question about each ProduceRequest covering
> > 20
> > > partitions that are randomly
> > > distributed, let's say a LeaderAndIsr request is enqueued that tries to
> > > switch the current broker, say broker0, from leader to follower
> > > *for one of the partitions*, say *test-0*. For the sake of argument,
> > > let's also assume the other brokers, say broker1, have *stopped*
> fetching
> > > from
> > > the current broker, i.e. broker0.
> > > 1. If the enqueued produce requests have acks =  -1 (ALL)
> > >   1.1 without this KIP, the ProduceRequests ahead of LeaderAndISR will
> be
> > > put into the purgatory,
> > >         and since they'll never be replicated to other brokers (because
> > of
> > > the assumption made above), they will
> > >         be completed either when the LeaderAndISR request is processed
> or
> > > when the timeout happens.
> > >   1.2 With this KIP, broker0 will immediately transition the partition
> > > test-0 to become a follower,
> > >         after the current broker sees the replication of the remaining
> 19
> > > partitions, it can send a response indicating that
> > >         it's no longer the leader for the "test-0".
> > >   To see the latency difference between 1.1 and 1.2, let's say there
> are
> > > 24K produce requests ahead of the LeaderAndISR, and there are 8 io
> > threads,
> > >   so each io thread will process approximately 3000 produce requests.
> Now
> > > let's investigate the io thread that finally processed the
> LeaderAndISR.
> > >   For the 3000 produce requests, if we model the time when their
> > remaining
> > > 19 partitions catch up as t0, t1, ...t2999, and the LeaderAndISR
> request
> > is
> > > processed at time t3000.
> > >   Without this KIP, the 1st produce request would have waited an extra
> > > t3000 - t0 time in the purgatory, the 2nd an extra time of t3000 - t1,
> > etc.
> > >   Roughly speaking, the latency difference is bigger for the earlier
> > > produce requests than for the later ones. For the same reason, the more
> > > ProduceRequests queued
> > >   before the LeaderAndISR, the bigger benefit we get (capped by the
> > > produce timeout).
> > > 2. If the enqueued produce requests have acks=0 or acks=1
> > >   There will be no latency differences in this case, but
> > >   2.1 without this KIP, the records of partition test-0 in the
> > > ProduceRequests ahead of the LeaderAndISR will be appended to the local
> > log,
> > >         and eventually be truncated after processing the LeaderAndISR.
> > > This is what's referred to as
> > >         "some unofficial definition of data loss in terms of messages
> > > beyond the high watermark".
> > >   2.2 with this KIP, we can mitigate the effect since if the
> LeaderAndISR
> > > is immediately processed, the response to producers will have
> > >         the NotLeaderForPartition error, causing producers to retry
> > >
> > > This explanation above is the benefit for reducing the latency of a
> > broker
> > > becoming the follower,
> > > closely related is reducing the latency of a broker becoming the
> leader.
> > > In this case, the benefit is even more obvious, if other brokers have
> > > resigned leadership, and the
> > > current broker should take leadership. Any delay in processing the
> > > LeaderAndISR will be perceived
> > > by clients as unavailability. In extreme cases, this can cause failed
> > > produce requests if the retries are
> > > exhausted.
> > >
> > > Another two types of controller requests are UpdateMetadata and
> > > StopReplica, which I'll briefly discuss as follows:
> > > For UpdateMetadata requests, delayed processing means clients receiving
> > > stale metadata, e.g. with the wrong leadership info
> > > for certain partitions, and the effect is more retries or even fatal
> > > failure if the retries are exhausted.
> > >
> > > For StopReplica requests, a long queuing time may degrade the
> performance
> > > of topic deletion.
> > >
> > > Regarding your last question of the delay for DescribeLogDirsRequest,
> you
> > > are right
> > > that this KIP cannot help with the latency in getting the log dirs
> info,
> > > and it's only relevant
> > > when controller requests are involved.
> > >
> > > Regards,
> > > Lucas
> > >
> > >
> > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com> wrote:
> > >
> > >> Hey Jun,
> > >>
> > > >> Thanks much for the comments. It is a good point. So the feature may be
> > >> useful for JBOD use-case. I have one question below.
> > >>
> > >> Hey Lucas,
> > >>
> > > >> Do you think this feature is also useful for a non-JBOD setup, or is it
> > > >> only useful for the JBOD setup? It may be useful to understand this.
> > >>
> > >> When the broker is setup using JBOD, in order to move leaders on the
> > >> failed
> > >> disk to other disks, the system operator first needs to get the list
> of
> > >> partitions on the failed disk. This is currently achieved using
> > >> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to
> the
> > >> broker. If we only prioritize the controller requests, then the
> > >> DescribeLogDirsRequest
> > >> may still take a long time to be processed by the broker. So the
> overall
> > >> time to move leaders away from the failed disk may still be long even
> > with
> > >> this KIP. What do you think?
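Dong's point can be made concrete with a toy model of the proposed two-queue scheme: a DescribeLogDirsRequest sits in the ordinary data queue and gains nothing from controller prioritization. This is an illustrative sketch only, not the actual request handler:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TwoQueueSketch {
    public static void main(String[] args) {
        // Hypothetical two-queue model from the KIP: controller requests in
        // one queue, everything else (including DescribeLogDirs) in the other.
        Deque<String> controllerQueue = new ArrayDeque<>();
        Deque<String> dataQueue = new ArrayDeque<>();

        dataQueue.add("ProduceRequest");
        dataQueue.add("DescribeLogDirsRequest"); // still waits behind data requests
        controllerQueue.add("LeaderAndIsrRequest");

        // A request handler always drains the controller queue first.
        StringBuilder order = new StringBuilder();
        while (!controllerQueue.isEmpty() || !dataQueue.isEmpty()) {
            String next = !controllerQueue.isEmpty()
                    ? controllerQueue.poll() : dataQueue.poll();
            order.append(next).append(' ');
        }
        System.out.println(order.toString().trim());
    }
}
```

The LeaderAndIsrRequest jumps ahead, but the DescribeLogDirsRequest is still processed after the backlog of produce requests, which is exactly the concern raised above.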
> > >>
> > >> Thanks,
> > >> Dong
> > >>
> > >>
> > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >>
> > >> > Thanks for the insightful comment, Jun.
> > >> >
> > >> > @Dong,
> > >> > Since both of the two comments in your previous email are about the
> > >> > benefits of this KIP and whether it's useful,
> > >> > in light of Jun's last comment, do you agree that this KIP can be
> > >> > beneficial in the case mentioned by Jun?
> > >> > Please let me know, thanks!
> > >> >
> > >> > Regards,
> > >> > Lucas
> > >> >
> > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
> > >> >
> > >> > > Hi, Lucas, Dong,
> > >> > >
> > >> > > If all disks on a broker are slow, one probably should just kill
> the
> > >> > > broker. In that case, this KIP may not help. If only one of the
> > disks
> > >> on
> > >> > a
> > >> > > broker is slow, one may want to fail that disk and move the
> leaders
> > on
> > >> > that
> > >> > > disk to other brokers. In that case, being able to process the
> > >> > LeaderAndIsr
> > >> > > requests faster will potentially help the producers recover
> quicker.
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Jun
> > >> > >
> > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > Hey Lucas,
> > >> > > >
> > >> > > > Thanks for the reply. Some follow up questions below.
> > >> > > >
> > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that
> are
> > >> > > randomly
> > >> > > > distributed across all partitions, then each ProduceRequest will
> > >> likely
> > >> > > > cover some partitions for which the broker is still leader after
> > it
> > >> > > quickly
> > >> > > > processes the
> > >> > > > LeaderAndIsrRequest. Then broker will still be slow in
> processing
> > >> these
> > >> > > > ProduceRequest and request will still be very high with this
> KIP.
> > It
> > >> > > seems
> > >> > > > that most ProduceRequest will still timeout after 30 seconds. Is
> > >> this
> > >> > > > understanding correct?
> > >> > > >
> > >> > > > Regarding 2, if most ProduceRequest will still timeout after 30
> > >> > seconds,
> > >> > > > then it is less clear how this KIP reduces average produce
> > latency.
> > >> Can
> > >> > > you
> > >> > > > clarify what metrics can be improved by this KIP?
> > >> > > >
> > >> > > > Not sure why system operator directly cares number of truncated
> > >> > messages.
> > >> > > > Do you mean this KIP can improve average throughput or reduce
> > >> message
> > >> > > > duplication? It will be good to understand this.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Dong
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com
> >
> > >> > wrote:
> > >> > > >
> > >> > > > > Hi Dong,
> > >> > > > >
> > >> > > > > Thanks for your valuable comments. Please see my reply below.
> > >> > > > >
> > >> > > > > 1. The Google doc showed only 1 partition. Now let's consider
> a
> > >> more
> > >> > > > common
> > >> > > > > scenario
> > >> > > > > where broker0 is the leader of many partitions. And let's say
> > for
> > >> > some
> > >> > > > > reason its IO becomes slow.
> > >> > > > > The number of leader partitions on broker0 is so large, say
> 10K,
> > >> that
> > >> > > the
> > >> > > > > cluster is skewed,
> > >> > > > > and the operator would like to shift the leadership for a lot
> of
> > >> > > > > partitions, say 9K, to other brokers,
> > >> > > > > either manually or through some service like cruise control.
> > >> > > > > With this KIP, not only will the leadership transitions finish
> > >> more
> > >> > > > > quickly, helping the cluster itself becoming more balanced,
> > >> > > > > but all existing producers corresponding to the 9K partitions
> > will
> > >> > get
> > >> > > > the
> > >> > > > > errors relatively quickly
> > >> > > > > rather than relying on their timeout, thanks to the batched
> > async
> > >> ZK
> > >> > > > > operations.
> > >> > > > > To me it's a useful feature to have during such troublesome
> > times.
> > >> > > > >
> > >> > > > >
> > >> > > > > 2. The experiments in the Google Doc have shown that with this
> > KIP
> > >> > many
> > >> > > > > producers
> > >> > > > > receive an explicit error NotLeaderForPartition, based on
> which
> > >> they
> > >> > > > retry
> > >> > > > > immediately.
> > >> > > > > Therefore the latency (~14 seconds+quick retry) for their
> single
> > >> > > message
> > >> > > > is
> > >> > > > > much smaller
> > >> > > > > compared with the case of timing out without the KIP (30
> seconds
> > >> for
> > >> > > > timing
> > >> > > > > out + quick retry).
> > >> > > > > One might argue that reducing the timing out on the producer
> > side
> > >> can
> > >> > > > > achieve the same result,
> > >> > > > > yet reducing the timeout has its own drawbacks[1].
> > >> > > > >
> > >> > > > > Also *IF* there were a metric to show the number of truncated
> > >> > messages
> > >> > > on
> > >> > > > > brokers,
> > >> > > > > with the experiments done in the Google Doc, it should be easy
> > to
> > >> see
> > >> > > > that
> > >> > > > > a lot fewer messages need
> > >> > > > > to be truncated on broker0 since the up-to-date metadata
> avoids
> > >> > > appending
> > >> > > > > of messages
> > >> > > > > in subsequent PRODUCE requests. If we talk to a system
> operator
> > >> and
> > >> > ask
> > >> > > > > whether
> > >> > > > > they prefer fewer wasteful IOs, I bet most likely the answer
> is
> > >> yes.
> > >> > > > >
> > >> > > > > 3. To answer your question, I think it might be helpful to
> > >> construct
> > >> > > some
> > >> > > > > formulas.
> > >> > > > > To simplify the modeling, I'm going back to the case where
> there
> > >> is
> > >> > > only
> > >> > > > > ONE partition involved.
> > >> > > > > Following the experiments in the Google Doc, let's say broker0
> > >> > becomes
> > >> > > > the
> > >> > > > > follower at time t0,
> > >> > > > > and after t0 there were still N produce requests in its
> request
> > >> > queue.
> > >> > > > > With the up-to-date metadata brought by this KIP, broker0 can
> > >> reply
> > >> > > with
> > >> > > > an
> > >> > > > > NotLeaderForPartition exception,
> > >> > > > > let's use M1 to denote the average processing time of replying
> > >> with
> > >> > > such
> > >> > > > an
> > >> > > > > error message.
> > >> > > > > Without this KIP, the broker will need to append messages to
> > >> > segments,
> > >> > > > > which may trigger a flush to disk,
> > >> > > > > let's use M2 to denote the average processing time for such
> > logic.
> > >> > > > > Then the average extra latency incurred without this KIP is N
> *
> > >> (M2 -
> > >> > > > M1) /
> > >> > > > > 2.
> > >> > > > >
> > >> > > > > In practice, M2 should always be larger than M1, which means
> as
> > >> long
> > >> > > as N
> > >> > > > > is positive,
> > >> > > > > we would see improvements on the average latency.
> > >> > > > > There does not need to be significant backlog of requests in
> the
> > >> > > request
> > >> > > > > queue,
> > >> > > > > or severe degradation of disk performance to have the
> > improvement.
> > >> > > > >
> > >> > > > > Regards,
> > >> > > > > Lucas
> > >> > > > >
> > >> > > > >
> > >> > > > > [1] For instance, reducing the timeout on the producer side
> can
> > >> > trigger
> > >> > > > > unnecessary duplicate requests
> > >> > > > > when the corresponding leader broker is overloaded,
> exacerbating
> > >> the
> > >> > > > > situation.
> > >> > > > >
> > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com
> >
> > >> > wrote:
> > >> > > > >
> > >> > > > > > Hey Lucas,
> > >> > > > > >
> > >> > > > > > Thanks much for the detailed documentation of the
> experiment.
> > >> > > > > >
> > >> > > > > > Initially I also think having a separate queue for
> controller
> > >> > > requests
> > >> > > > is
> > >> > > > > > useful because, as you mentioned in the summary section of
> the
> > >> > Google
> > >> > > > > doc,
> > >> > > > > > controller requests are generally more important than data
> > >> requests
> > >> > > and
> > >> > > > > we
> > >> > > > > > probably want controller requests to be processed sooner.
> But
> > >> then
> > >> > > Eno
> > >> > > > > has
> > >> > > > > > two very good questions which I am not sure the Google doc
> has
> > >> > > answered
> > >> > > > > > explicitly. Could you help with the following questions?
> > >> > > > > >
> > >> > > > > > 1) It is not very clear what is the actual benefit of
> KIP-291
> > to
> > >> > > users.
> > >> > > > > The
> > >> > > > > > experiment setup in the Google doc simulates the scenario
> that
> > >> > broker
> > >> > > > is
> > >> > > > > > very slow handling ProduceRequest due to e.g. slow disk. It
> > >> > currently
> > >> > > > > > assumes that there is only 1 partition. But in the common
> > >> scenario,
> > >> > > it
> > >> > > > is
> > >> > > > > > probably reasonable to assume that there are many other
> > >> partitions
> > >> > > that
> > >> > > > > are
> > >> > > > > > also actively produced to and ProduceRequest to these
> > partition
> > >> > also
> > >> > > > > takes
> > >> > > > > > e.g. 2 seconds to be processed. So even if broker0 can
> become
> > >> > > follower
> > >> > > > > for
> > >> > > > > > the partition 0 soon, it probably still needs to process the
> > >> > > > > ProduceRequest
> > >> > > > > > slowly t in the queue because these ProduceRequests cover
> > other
> > >> > > > > partitions.
> > >> > > > > > Thus most ProduceRequest will still timeout after 30 seconds
> > and
> > >> > most
> > >> > > > > > clients will still likely timeout after 30 seconds. Then it
> is
> > >> not
> > >> > > > > > obviously what is the benefit to client since client will
> > >> timeout
> > >> > > after
> > >> > > > > 30
> > >> > > > > > seconds before possibly re-connecting to broker1, with or
> > >> without
> > >> > > > > KIP-291.
> > >> > > > > > Did I miss something here?
> > >> > > > > >
> > >> > > > > > 2) I guess Eno's is asking for the specific benefits of this
> > >> KIP to
> > >> > > > user
> > >> > > > > or
> > >> > > > > > system administrator, e.g. whether this KIP decreases
> average
> > >> > > latency,
> > >> > > > > > 999th percentile latency, probably of exception exposed to
> > >> client
> > >> > > etc.
> > >> > > > It
> > >> > > > > > is probably useful to clarify this.
> > >> > > > > >
> > >> > > > > > 3) Does this KIP help improve user experience only when
> there
> > is
> > >> > > issue
> > >> > > > > with
> > >> > > > > > broker, e.g. significant backlog in the request queue due to
> > >> slow
> > >> > > disk
> > >> > > > as
> > >> > > > > > described in the Google doc? Or is this KIP also useful when
> > >> there
> > >> > is
> > >> > > > no
> > >> > > > > > ongoing issue in the cluster? It might be helpful to clarify
> > >> this
> > >> > to
> > >> > > > > > understand the benefit of this KIP.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Thanks much,
> > >> > > > > > Dong
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> > >> lucasatucla@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hi Eno,
> > >> > > > > > >
> > >> > > > > > > Sorry for the delay in getting the experiment results.
> > >> > > > > > > Here is a link to the positive impact achieved by
> > implementing
> > >> > the
> > >> > > > > > proposed
> > >> > > > > > > change:
> > >> > > > > > > https://docs.google.com/document/d/
> > 1ge2jjp5aPTBber6zaIT9AdhW
> > >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > >> > > > > > > Please take a look when you have time and let me know your
> > >> > > feedback.
> > >> > > > > > >
> > >> > > > > > > Regards,
> > >> > > > > > > Lucas
> > >> > > > > > >
> > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io>
> > >> wrote:
> > >> > > > > > >
> > >> > > > > > > > Thanks for the pointer. Will take a look might suit our
> > >> > > > requirements
> > >> > > > > > > > better.
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > > Harsha
> > >> > > > > > > >
> > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > >> > > > lucasatucla@gmail.com
> > >> > > > > >
> > >> > > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > Hi Harsha,
> > >> > > > > > > > >
> > >> > > > > > > > > If I understand correctly, the replication quota
> > mechanism
> > >> > > > proposed
> > >> > > > > > in
> > >> > > > > > > > > KIP-73 can be helpful in that scenario.
> > >> > > > > > > > > Have you tried it out?
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks,
> > >> > > > > > > > > Lucas
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> > kafka@harsha.io
> > >> >
> > >> > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Hi Lucas,
> > >> > > > > > > > > > One more question, any thoughts on making this
> > >> configurable
> > >> > > > > > > > > > and also allowing subset of data requests to be
> > >> > prioritized.
> > >> > > > For
> > >> > > > > > > > example
> > >> > > > > > > > >
> > >> > > > > > > > > > ,we notice in our cluster when we take out a broker
> > and
> > >> > bring
> > >> > > > new
> > >> > > > > > one
> > >> > > > > > > > it
> > >> > > > > > > > >
> > >> > > > > > > > > > will try to become follower and have lot of fetch
> > >> requests
> > >> > to
> > >> > > > > other
> > >> > > > > > > > > leaders
> > >> > > > > > > > > > in clusters. This will negatively effect the
> > >> > > application/client
> > >> > > > > > > > > requests.
> > >> > > > > > > > > > We are also exploring the similar solution to
> > >> de-prioritize
> > >> > > if
> > >> > > > a
> > >> > > > > > new
> > >> > > > > > > > > > replica comes in for fetch requests, we are ok with
> > the
> > >> > > replica
> > >> > > > > to
> > >> > > > > > be
> > >> > > > > > > > > > taking time but the leaders should prioritize the
> > client
> > >> > > > > requests.
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thanks,
> > >> > > > > > > > > > Harsha
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Hi Eno,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Sorry for the delayed response.
> > >> > > > > > > > > > > - I haven't implemented the feature yet, so no
> > >> > experimental
> > >> > > > > > results
> > >> > > > > > > > so
> > >> > > > > > > > >
> > >> > > > > > > > > > > far.
> > >> > > > > > > > > > > And I plan to test in out in the following days.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > - You are absolutely right that the priority queue
> > >> does
> > >> > not
> > >> > > > > > > > completely
> > >> > > > > > > > >
> > >> > > > > > > > > > > prevent
> > >> > > > > > > > > > > data requests being processed ahead of controller
> > >> > requests.
> > >> > > > > > > > > > > That being said, I expect it to greatly mitigate
> the
> > >> > effect
> > >> > > > of
> > >> > > > > > > stable
> > >> > > > > > > > > > > metadata.
> > >> > > > > > > > > > > In any case, I'll try it out and post the results
> > >> when I
> > >> > > have
> > >> > > > > it.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Regards,
> > >> > > > > > > > > > > Lucas
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > >> > > > > > > > eno.thereska@gmail.com
> > >> > > > > > > > > >
> > >> > > > > > > > > > > wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > Hi Lucas,
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Sorry for the delay, just had a look at this. A
> > >> couple
> > >> > of
> > >> > > > > > > > questions:
> > >> > > > > > > > >
> > >> > > > > > > > > > > > - did you notice any positive change after
> > >> implementing
> > >> > > > this
> > >> > > > > > KIP?
> > >> > > > > > > > > I'm
> > >> > > > > > > > > > > > wondering if you have any experimental results
> > that
> > >> > show
> > >> > > > the
> > >> > > > > > > > benefit
> > >> > > > > > > > > of
> > >> > > > > > > > > > > the
> > >> > > > > > > > > > > > two queues.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > - priority is usually not sufficient in
> addressing
> > >> the
> > >> > > > > problem
> > >> > > > > > > the
> > >> > > > > > > > > KIP
> > >> > > > > > > > > > > > identifies. Even with priority queues, you will
> > >> > sometimes
> > >> > > > > > > (often?)
> > >> > > > > > > > > have
> > >> > > > > > > > > > > the
> > >> > > > > > > > > > > > case that data plane requests will be ahead of
> the
> > >> > > control
> > >> > > > > > plane
> > >> > > > > > > > > > > requests.
> > >> > > > > > > > > > > > This happens because the system might have
> already
> > >> > > started
> > >> > > > > > > > > processing
> > >> > > > > > > > > > > the
> > >> > > > > > > > > > > > data plane requests before the control plane
> ones
> > >> > > arrived.
> > >> > > > So
> > >> > > > > > it
> > >> > > > > > > > > would
> > >> > > > > > > > > > > be
> > >> > > > > > > > > > > > good to know what % of the problem this KIP
> > >> addresses.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Thanks
> > >> > > > > > > > > > > > Eno
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > >> > > > > yuzhihong@gmail.com
> > >> > > > > > >
> > >> > > > > > > > > wrote:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > > Change looks good.
> Thanks
>
> On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Thanks for the suggestion. I've updated the KIP. Please take another look.
> >
> > Lucas
> >
> > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Currently in KafkaConfig.scala:
> > >
> > > val QueuedMaxRequests = 500
> > >
> > > It would be good if you can include the default value for this new
> > > config in the KIP.
> > >
> > > Thanks
> > >
> > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > >
> > > > Hi Ted, Dong
> > > >
> > > > I've updated the KIP by adding a new config, instead of reusing the
> > > > existing one.
> > > > Please take another look when you have time. Thanks a lot!
> > > >
> > > > Lucas
> > > >
> > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > bq. that's a waste of resource if control request rate is low
> > > > >
> > > > > I don't know if control request rate can get to 100,000, likely not.
> > > > > Then using the same bound as that for data requests seems high.
> > > > >
> > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ted,
> > > > > >
> > > > > > Thanks for taking a look at this KIP.
> > > > > > Let's say today the setting of "queued.max.requests" in cluster A
> > > > > > is 1000, while the setting in cluster B is 100,000. The 100 times
> > > > > > difference might have indicated that machines in cluster B have
> > > > > > larger memory.
> > > > > >
> > > > > > By reusing the "queued.max.requests", the controlRequestQueue in
> > > > > > cluster B automatically gets a 100x capacity without explicitly
> > > > > > bothering the operators.
> > > > > > I understand the counter argument can be that maybe that's a waste
> > > > > > of resource if control request rate is low and operators may want
> > > > > > to fine tune the capacity of the controlRequestQueue.
> > > > > >
> > > > > > I'm ok with either approach, and can change it if you or anyone
> > > > > > else feels strong about adding the extra config.
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >
> > > > > > > Lucas:
> > > > > > > Under Rejected Alternatives, #2, can you elaborate a bit more on
> > > > > > > why the separate config has bigger impact?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <lindong28@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Luca,
> > > > > > > >
> > > > > > > > Thanks for the KIP. Looks good overall. Some comments below:
> > > > > > > >
> > > > > > > > - We usually specify the full mbean for the new metrics in the
> > > > > > > > KIP. Can you specify it in the Public Interface section similar
> > > > > > > > to KIP-237
> > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>
> > > > > > > > ?
> > > > > > > >
> > > > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > > > > where we keep the existing sensor name "BytesInPerSec" and add
> > > > > > > > a new sensor "ReplicationBytesInPerSec", rather than replacing
> > > > > > > > the sensor name "BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > > >
> > > > > > > > - It seems that the KIP changes the semantics of the broker
> > > > > > > > config "queued.max.requests" because the number of total
> > > > > > > > requests queued in the broker will be no longer bounded by
> > > > > > > > "queued.max.requests". This probably needs to be specified in
> > > > > > > > the Public Interfaces section for discussion.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dong
> > > > > > > >
> > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang
> > > > > > > > <lucasatucla@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Kafka experts,
> > > > > > > > >
> > > > > > > > > I created KIP-291 to add a separate queue for controller
> > > > > > > > > requests:
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > > > > >
> > > > > > > > > Can you please take a look and let me know your feedback?
> > > > > > > > >
> > > > > > > > > Thanks a lot for your time!
> > > > > > > > > Regards,
> > > > > > > > > Lucas
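Dong's third comment above (the total number of queued requests no longer being bounded by "queued.max.requests" once a second queue exists) can be illustrated with a toy model; the queue sizes below are hypothetical and this is not Kafka code:

```python
import queue

queued_max_requests = 500           # existing bound, now data requests only
queued_max_control_requests = 20    # hypothetical bound for the new queue

data_queue = queue.Queue(maxsize=queued_max_requests)
control_queue = queue.Queue(maxsize=queued_max_control_requests)

# Fill both queues to capacity: the broker now holds 520 queued requests,
# exceeding the old single-queue bound of queued.max.requests = 500.
for i in range(queued_max_requests):
    data_queue.put_nowait(f"produce-{i}")
for i in range(queued_max_control_requests):
    control_queue.put_nowait(f"leader-and-isr-{i}")

print(data_queue.qsize() + control_queue.qsize())  # → 520
```

With two independently bounded queues, the effective bound on total queued requests becomes the sum of the two capacities, which is why the KIP calls this change of semantics out for discussion.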

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Thanks for the update Lucas.

I think the motivation section is intuitive. It will be good to learn more
about the comments from other reviewers.

On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Dong,
>
> I've updated the motivation section of the KIP by explaining the cases that
> would have user impacts.
> Please take a look and let me know your comments.
>
> Thanks,
> Lucas
>
> On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Hi Dong,
> >
> > The simulation of disk being slow is merely for me to easily construct a
> > testing scenario
> > with a backlog of produce requests. In production, other than the disk
> > being slow, a backlog of
> > produce requests may also be caused by high produce QPS.
> > In that case, we may not want to kill the broker and that's when this KIP
> > can be useful, both for JBOD
> > and non-JBOD setup.
> >
> > Going back to your previous question about each ProduceRequest covering
> 20
> > partitions that are randomly
> > distributed, let's say a LeaderAndIsr request is enqueued that tries to
> > switch the current broker, say broker0, from leader to follower
> > *for one of the partitions*, say *test-0*. For the sake of argument,
> > let's also assume the other brokers, say broker1, have *stopped* fetching
> > from
> > the current broker, i.e. broker0.
> > 1. If the enqueued produce requests have acks =  -1 (ALL)
> >   1.1 without this KIP, the ProduceRequests ahead of LeaderAndISR will be
> > put into the purgatory,
> >         and since they'll never be replicated to other brokers (because
> of
> > the assumption made above), they will
> >         be completed either when the LeaderAndISR request is processed or
> > when the timeout happens.
> >   1.2 With this KIP, broker0 will immediately transition the partition
> > test-0 to become a follower,
> >         after the current broker sees the replication of the remaining 19
> > partitions, it can send a response indicating that
> >         it's no longer the leader for the "test-0".
> >   To see the latency difference between 1.1 and 1.2, let's say there are
> > 24K produce requests ahead of the LeaderAndISR, and there are 8 io
> threads,
> >   so each io thread will process approximately 3000 produce requests. Now
> > let's investigate the io thread that finally processed the LeaderAndISR.
> >   For the 3000 produce requests, if we model the time when their
> remaining
> > 19 partitions catch up as t0, t1, ...t2999, and the LeaderAndISR request
> is
> > processed at time t3000.
> >   Without this KIP, the 1st produce request would have waited an extra
> > t3000 - t0 time in the purgatory, the 2nd an extra time of t3000 - t1,
> etc.
> >   Roughly speaking, the latency difference is bigger for the earlier
> > produce requests than for the later ones. For the same reason, the more
> > ProduceRequests queued
> >   before the LeaderAndISR, the bigger benefit we get (capped by the
> > produce timeout).
> > 2. If the enqueued produce requests have acks=0 or acks=1
> >   There will be no latency differences in this case, but
> >   2.1 without this KIP, the records of partition test-0 in the
> > ProduceRequests ahead of the LeaderAndISR will be appended to the local
> log,
> >         and eventually be truncated after processing the LeaderAndISR.
> > This is what's referred to as
> >         "some unofficial definition of data loss in terms of messages
> > beyond the high watermark".
> >   2.2 with this KIP, we can mitigate the effect since if the LeaderAndISR
> > is immediately processed, the response to producers will have
> >         the NotLeaderForPartition error, causing producers to retry
> >
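A quick numeric sketch of the purgatory-wait argument in case 1 above (the catch-up times, request count, and 30-second timeout are illustrative values, not measurements from the KIP's tests):

```python
# Without the KIP, an acks=-1 ProduceRequest for a partition that has switched
# leadership sits in the purgatory until the LeaderAndISR request is processed
# at t_laisr, capped by the produce timeout. With the KIP it completes near
# t[i], when its remaining replicas catch up, so the extra wait is the gap.
t = [0.01 * i for i in range(3000)]   # t0..t2999: per-request catch-up times
t_laisr = 0.01 * 3000                 # LeaderAndISR processed last, at "t3000"
produce_timeout = 30.0

extra_wait = [min(t_laisr, t[i] + produce_timeout) - t[i] for i in range(3000)]

# Earlier requests wait longest: the 1st is capped at the full 30 s timeout,
# while the last one waits only ~0.01 s.
print(round(extra_wait[0], 2), round(extra_wait[-1], 2))  # → 30.0 0.01
```

Consistent with the point above, the latency difference shrinks for requests queued closer to the LeaderAndISR request.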
> > This explanation above is the benefit for reducing the latency of a
> broker
> > becoming the follower,
> > closely related is reducing the latency of a broker becoming the leader.
> > In this case, the benefit is even more obvious, if other brokers have
> > resigned leadership, and the
> > current broker should take leadership. Any delay in processing the
> > LeaderAndISR will be perceived
> > by clients as unavailability. In extreme cases, this can cause failed
> > produce requests if the retries are
> > exhausted.
> >
> > Another two types of controller requests are UpdateMetadata and
> > StopReplica, which I'll briefly discuss as follows:
> > For UpdateMetadata requests, delayed processing means clients receiving
> > stale metadata, e.g. with the wrong leadership info
> > for certain partitions, and the effect is more retries or even fatal
> > failure if the retries are exhausted.
> >
> > For StopReplica requests, a long queuing time may degrade the performance
> > of topic deletion.
> >
> > Regarding your last question of the delay for DescribeLogDirsRequest, you
> > are right
> > that this KIP cannot help with the latency in getting the log dirs info,
> > and it's only relevant
> > when controller requests are involved.
> >
> > Regards,
> > Lucas
> >
> >
> > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com> wrote:
> >
> >> Hey Jun,
> >>
> >> Thanks much for the comments. It is good point. So the feature may be
> >> useful for JBOD use-case. I have one question below.
> >>
> >> Hey Lucas,
> >>
> >> Do you think this feature is also useful for non-JBOD setup or it is
> only
> >> useful for the JBOD setup? It may be useful to understand this.
> >>
> >> When the broker is setup using JBOD, in order to move leaders on the
> >> failed
> >> disk to other disks, the system operator first needs to get the list of
> >> partitions on the failed disk. This is currently achieved using
> >> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the
> >> broker. If we only prioritize the controller requests, then the
> >> DescribeLogDirsRequest
> >> may still take a long time to be processed by the broker. So the overall
> >> time to move leaders away from the failed disk may still be long even
> with
> >> this KIP. What do you think?
> >>
> >> Thanks,
> >> Dong
> >>
> >>
> >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >>
> >> > Thanks for the insightful comment, Jun.
> >> >
> >> > @Dong,
> >> > Since both of the two comments in your previous email are about the
> >> > benefits of this KIP and whether it's useful,
> >> > in light of Jun's last comment, do you agree that this KIP can be
> >> > beneficial in the case mentioned by Jun?
> >> > Please let me know, thanks!
> >> >
> >> > Regards,
> >> > Lucas
> >> >
> >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
> >> >
> >> > > Hi, Lucas, Dong,
> >> > >
> >> > > If all disks on a broker are slow, one probably should just kill the
> >> > > broker. In that case, this KIP may not help. If only one of the
> disks
> >> on
> >> > a
> >> > > broker is slow, one may want to fail that disk and move the leaders
> on
> >> > that
> >> > > disk to other brokers. In that case, being able to process the
> >> > LeaderAndIsr
> >> > > requests faster will potentially help the producers recover quicker.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jun
> >> > >
> >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com>
> wrote:
> >> > >
> >> > > > Hey Lucas,
> >> > > >
> >> > > > Thanks for the reply. Some follow up questions below.
> >> > > >
> >> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> >> > > > randomly distributed across all partitions, then each ProduceRequest
> >> > > > will likely cover some partitions for which the broker is still
> >> > > > leader after it quickly processes the LeaderAndIsrRequest. Then the
> >> > > > broker will still be slow in processing these ProduceRequests, and
> >> > > > request latency will still be very high with this KIP. It seems that
> >> > > > most ProduceRequests will still timeout after 30 seconds. Is this
> >> > > > understanding correct?
> >> > > >
> >> > > > Regarding 2, if most ProduceRequest will still timeout after 30
> >> > seconds,
> >> > > > then it is less clear how this KIP reduces average produce
> latency.
> >> Can
> >> > > you
> >> > > > clarify what metrics can be improved by this KIP?
> >> > > >
> >> > > > Not sure why the system operator directly cares about the number of
> >> > > > truncated messages.
> >> > > > Do you mean this KIP can improve average throughput or reduce
> >> message
> >> > > > duplication? It will be good to understand this.
> >> > > >
> >> > > > Thanks,
> >> > > > Dong
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lu...@gmail.com>
> >> > wrote:
> >> > > >
> >> > > > > Hi Dong,
> >> > > > >
> >> > > > > Thanks for your valuable comments. Please see my reply below.
> >> > > > >
> >> > > > > 1. The Google doc showed only 1 partition. Now let's consider a
> >> more
> >> > > > common
> >> > > > > scenario
> >> > > > > where broker0 is the leader of many partitions. And let's say
> for
> >> > some
> >> > > > > reason its IO becomes slow.
> >> > > > > The number of leader partitions on broker0 is so large, say 10K,
> >> that
> >> > > the
> >> > > > > cluster is skewed,
> >> > > > > and the operator would like to shift the leadership for a lot of
> >> > > > > partitions, say 9K, to other brokers,
> >> > > > > either manually or through some service like cruise control.
> >> > > > > With this KIP, not only will the leadership transitions finish
> >> more
> >> > > > > quickly, helping the cluster itself becoming more balanced,
> >> > > > > but all existing producers corresponding to the 9K partitions
> will
> >> > get
> >> > > > the
> >> > > > > errors relatively quickly
> >> > > > > rather than relying on their timeout, thanks to the batched
> async
> >> ZK
> >> > > > > operations.
> >> > > > > To me it's a useful feature to have during such troublesome
> times.
> >> > > > >
> >> > > > >
> >> > > > > 2. The experiments in the Google Doc have shown that with this
> KIP
> >> > many
> >> > > > > producers
> >> > > > > receive an explicit error NotLeaderForPartition, based on which
> >> they
> >> > > > retry
> >> > > > > immediately.
> >> > > > > Therefore the latency (~14 seconds+quick retry) for their single
> >> > > message
> >> > > > is
> >> > > > > much smaller
> >> > > > > compared with the case of timing out without the KIP (30 seconds
> >> for
> >> > > > timing
> >> > > > > out + quick retry).
> >> > > > > One might argue that reducing the timeout on the producer side can
> >> > > > > achieve the same result,
> >> > > > > yet reducing the timeout has its own drawbacks[1].
> >> > > > >
> >> > > > > Also *IF* there were a metric to show the number of truncated
> >> > messages
> >> > > on
> >> > > > > brokers,
> >> > > > > with the experiments done in the Google Doc, it should be easy
> to
> >> see
> >> > > > that
> >> > > > > a lot fewer messages need
> >> > > > > to be truncated on broker0 since the up-to-date metadata avoids
> >> > > appending
> >> > > > > of messages
> >> > > > > in subsequent PRODUCE requests. If we talk to a system operator
> >> and
> >> > ask
> >> > > > > whether
> >> > > > > they prefer fewer wasteful IOs, I bet most likely the answer is
> >> yes.
> >> > > > >
> >> > > > > 3. To answer your question, I think it might be helpful to
> >> construct
> >> > > some
> >> > > > > formulas.
> >> > > > > To simplify the modeling, I'm going back to the case where there
> >> is
> >> > > only
> >> > > > > ONE partition involved.
> >> > > > > Following the experiments in the Google Doc, let's say broker0
> >> > becomes
> >> > > > the
> >> > > > > follower at time t0,
> >> > > > > and after t0 there were still N produce requests in its request
> >> > queue.
> >> > > > > With the up-to-date metadata brought by this KIP, broker0 can
> >> reply
> >> > > with
> >> > > > an
> >> > > > > NotLeaderForPartition exception,
> >> > > > > let's use M1 to denote the average processing time of replying
> >> with
> >> > > such
> >> > > > an
> >> > > > > error message.
> >> > > > > Without this KIP, the broker will need to append messages to
> >> > segments,
> >> > > > > which may trigger a flush to disk,
> >> > > > > let's use M2 to denote the average processing time for such
> logic.
> >> > > > > Then the average extra latency incurred without this KIP is N *
> >> (M2 -
> >> > > > M1) /
> >> > > > > 2.
> >> > > > >
> >> > > > > In practice, M2 should always be larger than M1, which means as
> >> long
> >> > > as N
> >> > > > > is positive,
> >> > > > > we would see improvements on the average latency.
> >> > > > > There does not need to be significant backlog of requests in the
> >> > > request
> >> > > > > queue,
> >> > > > > or severe degradation of disk performance to have the
> improvement.
> >> > > > >
> >> > > > > Regards,
> >> > > > > Lucas
> >> > > > >
> >> > > > >
> >> > > > > [1] For instance, reducing the timeout on the producer side can
> >> > trigger
> >> > > > > unnecessary duplicate requests
> >> > > > > when the corresponding leader broker is overloaded, exacerbating
> >> the
> >> > > > > situation.
> >> > > > >
> >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <li...@gmail.com>
> >> > wrote:
> >> > > > >
> >> > > > > > Hey Lucas,
> >> > > > > >
> >> > > > > > Thanks much for the detailed documentation of the experiment.
> >> > > > > >
> >> > > > > > Initially I also think having a separate queue for controller
> >> > > requests
> >> > > > is
> >> > > > > > useful because, as you mentioned in the summary section of the
> >> > Google
> >> > > > > doc,
> >> > > > > > controller requests are generally more important than data
> >> requests
> >> > > and
> >> > > > > we
> >> > > > > > probably want controller requests to be processed sooner. But
> >> then
> >> > > Eno
> >> > > > > has
> >> > > > > > two very good questions which I am not sure the Google doc has
> >> > > answered
> >> > > > > > explicitly. Could you help with the following questions?
> >> > > > > >
> >> > > > > > 1) It is not very clear what is the actual benefit of KIP-291 to
> >> > > > > > users. The experiment setup in the Google doc simulates the
> >> > > > > > scenario that broker is very slow handling ProduceRequest due to
> >> > > > > > e.g. slow disk. It currently assumes that there is only 1
> >> > > > > > partition. But in the common scenario, it is probably reasonable
> >> > > > > > to assume that there are many other partitions that are also
> >> > > > > > actively produced to, and ProduceRequests to these partitions
> >> > > > > > also take e.g. 2 seconds to be processed. So even if broker0 can
> >> > > > > > become follower for the partition 0 soon, it probably still
> >> > > > > > needs to slowly process the ProduceRequests in the queue because
> >> > > > > > these ProduceRequests cover other partitions. Thus most
> >> > > > > > ProduceRequests will still timeout after 30 seconds, and most
> >> > > > > > clients will still likely timeout after 30 seconds. Then it is
> >> > > > > > not obvious what the benefit is to the client, since the client
> >> > > > > > will timeout after 30 seconds before possibly re-connecting to
> >> > > > > > broker1, with or without KIP-291. Did I miss something here?
> >> > > > > >
> >> > > > > > 2) I guess Eno is asking for the specific benefits of this KIP
> >> > > > > > to the user or system administrator, e.g. whether this KIP
> >> > > > > > decreases average latency, 999th percentile latency, the
> >> > > > > > probability of exceptions exposed to the client, etc. It is
> >> > > > > > probably useful to clarify this.
> >> > > > > >
> >> > > > > > 3) Does this KIP help improve user experience only when there
> >> > > > > > is an issue with the broker, e.g. significant backlog in the
> >> > > > > > request queue due to a slow disk as described in the Google doc?
> >> > > > > > Or is this KIP also useful when there is no ongoing issue in the
> >> > > > > > cluster? It might be helpful to clarify this to understand the
> >> > > > > > benefit of this KIP.
> >> > > > > >
> >> > > > > >
> >> > > > > > Thanks much,
> >> > > > > > Dong
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> >> lucasatucla@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi Eno,
> >> > > > > > >
> >> > > > > > > Sorry for the delay in getting the experiment results.
> >> > > > > > > Here is a link to the positive impact achieved by
> implementing
> >> > the
> >> > > > > > proposed
> >> > > > > > > change:
> >> > > > > > > https://docs.google.com/document/d/
> 1ge2jjp5aPTBber6zaIT9AdhW
> >> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> >> > > > > > > Please take a look when you have time and let me know your
> >> > > feedback.
> >> > > > > > >
> >> > > > > > > Regards,
> >> > > > > > > Lucas
> >> > > > > > >
> >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io>
> >> wrote:
> >> > > > > > >
> >> > > > > > > Thanks for the pointer. Will take a look; it might suit our
> >> > > > > > > requirements better.
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > > Harsha
> >> > > > > > > >
> >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> >> > > > lucasatucla@gmail.com
> >> > > > > >
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Hi Harsha,
> >> > > > > > > > >
> >> > > > > > > > > If I understand correctly, the replication quota
> mechanism
> >> > > > proposed
> >> > > > > > in
> >> > > > > > > > > KIP-73 can be helpful in that scenario.
> >> > > > > > > > > Have you tried it out?
> >> > > > > > > > >
> >> > > > > > > > > Thanks,
> >> > > > > > > > > Lucas
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
> kafka@harsha.io
> >> >
> >> > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi Lucas,
> >> > > > > > > > > > One more question: any thoughts on making this
> >> > > > > > > > > > configurable, and also on allowing a subset of data
> >> > > > > > > > > > requests to be prioritized? For example, we notice in
> >> > > > > > > > > > our cluster that when we take out a broker and bring a
> >> > > > > > > > > > new one in, it will try to become a follower and send a
> >> > > > > > > > > > lot of fetch requests to other leaders in the cluster.
> >> > > > > > > > > > This will negatively affect the application/client
> >> > > > > > > > > > requests. We are also exploring a similar solution to
> >> > > > > > > > > > de-prioritize fetch requests when a new replica comes
> >> > > > > > > > > > in: we are ok with the replica taking time, but the
> >> > > > > > > > > > leaders should prioritize the client requests.
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks,
> >> > > > > > > > > > Harsha
> >> > > > > > > > > >
> >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Hi Eno,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Sorry for the delayed response.
> >> > > > > > > > > > > - I haven't implemented the feature yet, so no
> >> > experimental
> >> > > > > > results
> >> > > > > > > > so
> >> > > > > > > > >
> >> > > > > > > > > > > far.
> >> > > > > > > > > > > And I plan to test it out in the following days.
> >> > > > > > > > > > >
> >> > > > > > > > > > > - You are absolutely right that the priority queue
> >> does
> >> > not
> >> > > > > > > > completely
> >> > > > > > > > >
> >> > > > > > > > > > > prevent
> >> > > > > > > > > > > data requests being processed ahead of controller
> >> > requests.
> >> > > > > > > > > > > That being said, I expect it to greatly mitigate the
> >> > effect
> >> > > > of
> >> > > > > > > stable
> >> > > > > > > > > > > metadata.
> >> > > > > > > > > > > In any case, I'll try it out and post the results
> >> when I
> >> > > have
> >> > > > > it.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Regards,
> >> > > > > > > > > > > Lucas
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> >> > > > > > > > eno.thereska@gmail.com
> >> > > > > > > > > >
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi Lucas,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Sorry for the delay, just had a look at this. A
> >> couple
> >> > of
> >> > > > > > > > questions:
> >> > > > > > > > >
> >> > > > > > > > > > > > - did you notice any positive change after
> >> implementing
> >> > > > this
> >> > > > > > KIP?
> >> > > > > > > > > I'm
> >> > > > > > > > > > > > wondering if you have any experimental results
> that
> >> > show
> >> > > > the
> >> > > > > > > > benefit
> >> > > > > > > > > of
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > two queues.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > - priority is usually not sufficient in addressing
> >> the
> >> > > > > problem
> >> > > > > > > the
> >> > > > > > > > > KIP
> >> > > > > > > > > > > > identifies. Even with priority queues, you will
> >> > sometimes
> >> > > > > > > (often?)
> >> > > > > > > > > have
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > case that data plane requests will be ahead of the
> >> > > control
> >> > > > > > plane
> >> > > > > > > > > > > requests.
> >> > > > > > > > > > > > This happens because the system might have already
> >> > > started
> >> > > > > > > > > processing
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > data plane requests before the control plane ones
> >> > > arrived.
> >> > > > So
> >> > > > > > it
> >> > > > > > > > > would
> >> > > > > > > > > > > be
> >> > > > > > > > > > > > good to know what % of the problem this KIP
> >> addresses.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks
> >> > > > > > > > > > > > Eno
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> >> > > > > yuzhihong@gmail.com
> >> > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > Change looks good.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Thanks
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
> >> > > > > > > > lucasatucla@gmail.com
> >> > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hi Ted,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks for the suggestion. I've updated the
> KIP.
> >> > > Please
> >> > > > > > take
> >> > > > > > > > > > another
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > look.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Lucas
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> >> > > > > > > yuzhihong@gmail.com
> >> > > > > > > > >
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > It would be good if you can include the
> >> default
> >> > > value
> >> > > > > for
> >> > > > > > > > this
> >> > > > > > > > >
> >> > > > > > > > > > new
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > config
> >> > > > > > > > > > > > > > > in the KIP.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Thanks
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang
> <
> >> > > > > > > > > > lucasatucla@gmail.com
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hi Ted, Dong
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > I've updated the KIP by adding a new
> config,
> >> > > > instead
> >> > > > > of
> >> > > > > > > > > reusing
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > existing one.
> >> > > > > > > > > > > > > > > > Please take another look when you have
> time.
> >> > > > Thanks a
> >> > > > > > > lot!
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Lucas
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
> >> > > > > > > > yuzhihong@gmail.com
> >> > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > bq. that's a waste of resource if
> control
> >> > > request
> >> > > > > > rate
> >> > > > > > > is
> >> > > > > > > > > low
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > I don't know if control request rate can
> >> get
> >> > to
> >> > > > > > > 100,000,
> >> > > > > > > > > > > likely
> >> > > > > > > > > > > > > not.
> >> > > > > > > > > > > > > > > Then
> >> > > > > > > > > > > > > > > > > using the same bound as that for data
> >> > requests
> >> > > > > seems
> >> > > > > > > > high.
> >> > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas
> >> Wang
> >> > <
> >> > > > > > > > > > > > > lucasatucla@gmail.com >
> >> > > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Hi Ted,
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Thanks for taking a look at this KIP.
> >> > > > > > > > > > > > > > > > > > Let's say today the setting of
> >> > > > > > "queued.max.requests"
> >> > > > > > > in
> >> > > > > > > > > > > > cluster A
> >> > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > > > 1000,
> >> > > > > > > > > > > > > > > > > > while the setting in cluster B is
> >> 100,000.
> >> > > > > > > > > > > > > > > > > > The 100 times difference might have
> >> > indicated
> >> > > > > that
> >> > > > > > > > > machines
> >> > > > > > > > > > > in
> >> > > > > > > > > > > > > > > cluster
> >> > > > > > > > > > > > > > > > B
> >> > > > > > > > > > > > > > > > > > have larger memory.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > By reusing the "queued.max.requests",
> >> the
> >> > > > > > > > > > > controlRequestQueue
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > > > > cluster
> >> > > > > > > > > > > > > > > > > B
> >> > > > > > > > > > > > > > > > > > automatically
> >> > > > > > > > > > > > > > > > > > gets a 100x capacity without
> explicitly
> >> > > > bothering
> >> > > > > > the
> >> > > > > > > > > > > > operators.
> >> > > > > > > > > > > > > > > > > > I understand the counter argument can
> be
> >> > that
> >> > > > > maybe
> >> > > > > > > > > that's
> >> > > > > > > > > > a
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > waste
> >> > > > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > > > > resource if control request
> >> > > > > > > > > > > > > > > > > > rate is low and operators may want to
> >> fine
> >> > > tune
> >> > > > > the
> >> > > > > > > > > > capacity
> >> > > > > > > > > > > of
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > > controlRequestQueue.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and can
> >> change
> >> > > it
> >> > > > if
> >> > > > > > you
> >> > > > > > > > or
> >> > > > > > > > >
> >> > > > > > > > > > > anyone
> >> > > > > > > > > > > > > > else
> >> > > > > > > > > > > > > > > > > feels
> >> > > > > > > > > > > > > > > > > > strong about adding the extra config.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > > > > > > > > Lucas
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted
> Yu
> >> <
> >> > > > > > > > > > yuzhihong@gmail.com
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Lucas:
> >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can
> >> you
> >> > > > > > elaborate
> >> > > > > > > a
> >> > > > > > > > > bit
> >> > > > > > > > > > > more
> >> > > > > > > > > > > > > on
> >> > > > > > > > > > > > > > > why
> >> > > > > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > > > separate config has bigger impact ?
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Thanks
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM,
> Dong
> >> > Lin <
> >> > > > > > > > > > > > lindong28@gmail.com
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > Hey Luca,
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good
> >> overall.
> >> > > > Some
> >> > > > > > > > > comments
> >> > > > > > > > > > > > below:
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > - We usually specify the full
> mbean
> >> for
> >> > > the
> >> > > > > new
> >> > > > > > > > > metrics
> >> > > > > > > > > > > in
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > KIP.
> >> > > > > > > > > > > > > > > > > Can
> >> > > > > > > > > > > > > > > > > > > you
> >> > > > > > > > > > > > > > > > > > > > specify it in the Public Interface
> >> > > section
> >> > > > > > > similar
> >> > > > > > > > > to
> >> > > > > > > > > > > > KIP-237
>> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>
>> > > > > > > > > > > > > > > > > > > > ?
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the same
> >> > pattern
> >> > > as
> >> > > > > > > KIP-153
>> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> >> > > > > > > > > > > > > > > > > > > > where we keep the existing sensor
> >> name
> >> > > > > > > > > "BytesInPerSec"
> >> > > > > > > > > > > and
> >> > > > > > > > > > > > > add
> >> > > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > > new
> >> > > > > > > > > > > > > > > > > > > sensor
> >> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather
> >> than
> >> > > > > > replacing
> >> > > > > > > > > the
> >> > > > > > > > > > > > sensor
> >> > > > > > > > > > > > > > > name "
> >> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
> >> > > > > "ClientBytesInPerSec".
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP changes
> the
> >> > > > semantics
> >> > > > > > of
> >> > > > > > > > the
> >> > > > > > > > >
> >> > > > > > > > > > > broker
> >> > > > > > > > > > > > > > > config
> >> > > > > > > > > > > > > > > > > > > > "queued.max.requests" because the
> >> > number
> >> > > of
> >> > > > > > total
> >> > > > > > > > > > > requests
> >> > > > > > > > > > > > > > queued
> >> > > > > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > > > > broker will be no longer bounded
> by
> >> > > > > > > > > > > "queued.max.requests".
> >> > > > > > > > > > > > > This
> >> > > > > > > > > > > > > > > > > > probably
> >> > > > > > > > > > > > > > > > > > > > needs to be specified in the
> Public
> >> > > > > Interfaces
> >> > > > > > > > > section
> >> > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > > discussion.
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > > > > > > > > > > Dong
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM,
> >> Lucas
> >> > > > Wang
> >> > > > > <
> >> > > > > > > > > > > > > > > > lucasatucla@gmail.com >
> >> > > > > > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a
> >> separate
> >> > > queue
> >> > > > > for
> >> > > > > > > > > > > controller
> >> > > > > > > > > > > > > > > > requests:
>> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > > Can you please take a look and
> >> let me
> >> > > > know
> >> > > > > > your
> >> > > > > > > > > > > feedback?
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> >> > > > > > > > > > > > > > > > > > > > > Regards,
> >> > > > > > > > > > > > > > > > > > > > > Lucas
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Dong,

I've updated the motivation section of the KIP by explaining the cases that
would have user impacts.
Please take a look and let me know your comments.
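For reviewers skimming the thread, the core mechanism the KIP proposes can be sketched roughly as follows. This is only an illustrative Python sketch, not the actual broker implementation; the names (e.g. CONTROL_APIS) are mine:

```python
# Illustrative sketch of KIP-291's idea (not Kafka's actual implementation):
# controller requests get their own queue, and request handler threads always
# drain that queue before taking anything from the data-request queue.
from collections import deque
from typing import Deque, Optional

# The three controller-to-broker request types named in the discussion.
CONTROL_APIS = {"LeaderAndIsr", "UpdateMetadata", "StopReplica"}

control_queue: Deque[str] = deque()
data_queue: Deque[str] = deque()

def enqueue(request: str) -> None:
    # Route by API key so a LeaderAndIsr never waits behind a produce backlog.
    (control_queue if request in CONTROL_APIS else data_queue).append(request)

def next_request() -> Optional[str]:
    # Handler threads prefer the control queue whenever it is non-empty.
    if control_queue:
        return control_queue.popleft()
    return data_queue.popleft() if data_queue else None

for r in ("Produce", "Produce", "LeaderAndIsr", "Fetch"):
    enqueue(r)
order = [next_request() for _ in range(4)]
print(order)  # the LeaderAndIsr jumps ahead of the queued Produce requests
```

The point of the separation is visible in the drain order: the controller request is handled before the data requests that arrived earlier.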

Thanks,
Lucas

On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Dong,
>
> The simulation of disk being slow is merely for me to easily construct a
> testing scenario
> with a backlog of produce requests. In production, other than the disk
> being slow, a backlog of
> produce requests may also be caused by high produce QPS.
> In that case, we may not want to kill the broker and that's when this KIP
> can be useful, both for JBOD
> and non-JBOD setup.
>
> Going back to your previous question about each ProduceRequest covering 20
> partitions that are randomly
> distributed, let's say a LeaderAndIsr request is enqueued that tries to
> switch the current broker, say broker0, from leader to follower
> *for one of the partitions*, say *test-0*. For the sake of argument,
> let's also assume the other brokers, say broker1, have *stopped* fetching
> from
> the current broker, i.e. broker0.
> 1. If the enqueued produce requests have acks =  -1 (ALL)
>   1.1 without this KIP, the ProduceRequests ahead of LeaderAndISR will be
> put into the purgatory,
>         and since they'll never be replicated to other brokers (because of
> the assumption made above), they will
>         be completed either when the LeaderAndISR request is processed or
> when the timeout happens.
>   1.2 With this KIP, broker0 will immediately transition the partition
> test-0 to become a follower,
>         after the current broker sees the replication of the remaining 19
> partitions, it can send a response indicating that
>         it's no longer the leader for the "test-0".
>   To see the latency difference between 1.1 and 1.2, let's say there are
> 24K produce requests ahead of the LeaderAndISR, and there are 8 io threads,
>   so each io thread will process approximately 3000 produce requests. Now
> let's investigate the io thread that finally processed the LeaderAndISR.
>   For the 3000 produce requests, if we model the time when their remaining
> 19 partitions catch up as t0, t1, ...t2999, and the LeaderAndISR request is
> processed at time t3000.
>   Without this KIP, the 1st produce request would have waited an extra
> t3000 - t0 time in the purgatory, the 2nd an extra time of t3000 - t1, etc.
>   Roughly speaking, the latency difference is bigger for the earlier
> produce requests than for the later ones. For the same reason, the more
> ProduceRequests queued
>   before the LeaderAndISR, the bigger benefit we get (capped by the
> produce timeout).
> 2. If the enqueued produce requests have acks=0 or acks=1
>   There will be no latency differences in this case, but
>   2.1 without this KIP, the records of partition test-0 in the
> ProduceRequests ahead of the LeaderAndISR will be appended to the local log,
>         and eventually be truncated after processing the LeaderAndISR.
> This is what's referred to as
>         "some unofficial definition of data loss in terms of messages
> beyond the high watermark".
>   2.2 with this KIP, we can mitigate the effect since if the LeaderAndISR
> is immediately processed, the response to producers will have
>         the NotLeaderForPartition error, causing producers to retry
>
> This explanation above is the benefit for reducing the latency of a broker
> becoming the follower,
> closely related is reducing the latency of a broker becoming the leader.
> In this case, the benefit is even more obvious, if other brokers have
> resigned leadership, and the
> current broker should take leadership. Any delay in processing the
> LeaderAndISR will be perceived
> by clients as unavailability. In extreme cases, this can cause failed
> produce requests if the retries are
> exhausted.
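The waiting-time arithmetic in case 1 above can be made concrete with a toy model. The numbers below, and the uniform spacing d between catch-up times, are assumptions for illustration only, not measurements:

```python
# Toy model of the purgatory wait described in case 1: on one io thread,
# N acks=-1 ProduceRequests sit ahead of a LeaderAndIsr request. Assume the
# remaining replicas of request i catch up at time t_i = i * d, and the
# LeaderAndIsr is processed at t_N. Without the KIP, request i only completes
# at t_N, so it waits an extra t_N - t_i in the purgatory.
N = 3000      # produce requests ahead of the LeaderAndIsr on this io thread
d = 0.001     # assumed spacing between catch-up times, in seconds

t = [i * d for i in range(N + 1)]              # t[N] is the LeaderAndIsr time
extra_waits = [t[N] - t[i] for i in range(N)]  # per-request extra wait

print(f"earliest request: extra {extra_waits[0]:.1f}s")
print(f"average extra wait: {sum(extra_waits) / N:.2f}s")
```

Under these assumptions the earliest-queued request waits an extra N * d seconds and the average extra wait is roughly N * d / 2, mirroring the observation above that the latency difference is bigger for the earlier produce requests.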
>
> Another two types of controller requests are UpdateMetadata and
> StopReplica, which I'll briefly discuss as follows:
> For UpdateMetadata requests, delayed processing means clients receiving
> stale metadata, e.g. with the wrong leadership info
> for certain partitions, and the effect is more retries or even fatal
> failure if the retries are exhausted.
>
> For StopReplica requests, a long queuing time may degrade the performance
> of topic deletion.
>
> Regarding your last question of the delay for DescribeLogDirsRequest, you
> are right
> that this KIP cannot help with the latency in getting the log dirs info,
> and it's only relevant
> when controller requests are involved.
>
> Regards,
> Lucas
>
>
> On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com> wrote:
>
>> Hey Jun,
>>
>> Thanks much for the comments. It is good point. So the feature may be
>> useful for JBOD use-case. I have one question below.
>>
>> Hey Lucas,
>>
>> Do you think this feature is also useful for non-JBOD setup or it is only
>> useful for the JBOD setup? It may be useful to understand this.
>>
>> When the broker is setup using JBOD, in order to move leaders on the
>> failed
>> disk to other disks, the system operator first needs to get the list of
>> partitions on the failed disk. This is currently achieved using
>> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the
>> broker. If we only prioritize the controller requests, then the
>> DescribeLogDirsRequest
>> may still take a long time to be processed by the broker. So the overall
>> time to move leaders away from the failed disk may still be long even with
>> this KIP. What do you think?
>>
>> Thanks,
>> Dong
>>
>>
>> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lu...@gmail.com> wrote:
>>
>> > Thanks for the insightful comment, Jun.
>> >
>> > @Dong,
>> > Since both of the two comments in your previous email are about the
>> > benefits of this KIP and whether it's useful,
>> > in light of Jun's last comment, do you agree that this KIP can be
>> > beneficial in the case mentioned by Jun?
>> > Please let me know, thanks!
>> >
>> > Regards,
>> > Lucas
>> >
>> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
>> >
>> > > Hi, Lucas, Dong,
>> > >
>> > > If all disks on a broker are slow, one probably should just kill the
>> > > broker. In that case, this KIP may not help. If only one of the disks
>> on
>> > a
>> > > broker is slow, one may want to fail that disk and move the leaders on
>> > that
>> > > disk to other brokers. In that case, being able to process the
>> > LeaderAndIsr
>> > > requests faster will potentially help the producers recover quicker.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com> wrote:
>> > >
>> > > > Hey Lucas,
>> > > >
>> > > > Thanks for the reply. Some follow up questions below.
>> > > >
>> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
>> > > randomly
>> > > > distributed across all partitions, then each ProduceRequest will
>> likely
>> > > > cover some partitions for which the broker is still leader after it
>> > > quickly
>> > > > processes the
>> > > > LeaderAndIsrRequest. Then the broker will still be slow in processing
>> these
>> > > > ProduceRequests, and request latency will still be very high with this KIP. It
>> > > seems
>> > > > that most ProduceRequest will still timeout after 30 seconds. Is
>> this
>> > > > understanding correct?
>> > > >
>> > > > Regarding 2, if most ProduceRequest will still timeout after 30
>> > seconds,
>> > > > then it is less clear how this KIP reduces average produce latency.
>> Can
>> > > you
>> > > > clarify what metrics can be improved by this KIP?
>> > > >
>> > > > Not sure why system operator directly cares about the number of truncated
>> > messages.
>> > > > Do you mean this KIP can improve average throughput or reduce
>> message
>> > > > duplication? It will be good to understand this.
>> > > >
>> > > > Thanks,
>> > > > Dong
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lu...@gmail.com>
>> > wrote:
>> > > >
>> > > > > Hi Dong,
>> > > > >
>> > > > > Thanks for your valuable comments. Please see my reply below.
>> > > > >
>> > > > > 1. The Google doc showed only 1 partition. Now let's consider a
>> more
>> > > > common
>> > > > > scenario
>> > > > > where broker0 is the leader of many partitions. And let's say for
>> > some
>> > > > > reason its IO becomes slow.
>> > > > > The number of leader partitions on broker0 is so large, say 10K,
>> that
>> > > the
>> > > > > cluster is skewed,
>> > > > > and the operator would like to shift the leadership for a lot of
>> > > > > partitions, say 9K, to other brokers,
>> > > > > either manually or through some service like cruise control.
>> > > > > With this KIP, not only will the leadership transitions finish
>> more
>> > > > > quickly, helping the cluster itself becoming more balanced,
>> > > > > but all existing producers corresponding to the 9K partitions will
>> > get
>> > > > the
>> > > > > errors relatively quickly
>> > > > > rather than relying on their timeout, thanks to the batched async
>> ZK
>> > > > > operations.
>> > > > > To me it's a useful feature to have during such troublesome times.
>> > > > >
>> > > > >
>> > > > > 2. The experiments in the Google Doc have shown that with this KIP
>> > many
>> > > > > producers
>> > > > > receive an explicit error NotLeaderForPartition, based on which
>> they
>> > > > retry
>> > > > > immediately.
>> > > > > Therefore the latency (~14 seconds+quick retry) for their single
>> > > message
>> > > > is
>> > > > > much smaller
>> > > > > compared with the case of timing out without the KIP (30 seconds
>> for
>> > > > timing
>> > > > > out + quick retry).
>> > > > > One might argue that reducing the timing out on the producer side
>> can
>> > > > > achieve the same result,
>> > > > > yet reducing the timeout has its own drawbacks[1].
>> > > > >
>> > > > > Also *IF* there were a metric to show the number of truncated
>> > messages
>> > > on
>> > > > > brokers,
>> > > > > with the experiments done in the Google Doc, it should be easy to
>> see
>> > > > that
>> > > > > a lot fewer messages need
>> > > > > to be truncated on broker0 since the up-to-date metadata avoids
>> > > appending
>> > > > > of messages
>> > > > > in subsequent PRODUCE requests. If we talk to a system operator
>> and
>> > ask
>> > > > > whether
>> > > > > they prefer fewer wasteful IOs, I bet most likely the answer is
>> yes.
>> > > > >
>> > > > > 3. To answer your question, I think it might be helpful to
>> construct
>> > > some
>> > > > > formulas.
>> > > > > To simplify the modeling, I'm going back to the case where there
>> is
>> > > only
>> > > > > ONE partition involved.
>> > > > > Following the experiments in the Google Doc, let's say broker0
>> > becomes
>> > > > the
>> > > > > follower at time t0,
>> > > > > and after t0 there were still N produce requests in its request
>> > queue.
>> > > > > With the up-to-date metadata brought by this KIP, broker0 can
>> reply
>> > > with
>> > > > an
>> > > > > NotLeaderForPartition exception,
>> > > > > let's use M1 to denote the average processing time of replying
>> with
>> > > such
>> > > > an
>> > > > > error message.
>> > > > > Without this KIP, the broker will need to append messages to
>> > segments,
>> > > > > which may trigger a flush to disk,
>> > > > > let's use M2 to denote the average processing time for such logic.
>> > > > > Then the average extra latency incurred without this KIP is
>> > > > > N * (M2 - M1) / 2.
>> > > > >
>> > > > > In practice, M2 should always be larger than M1, which means as
>> long
>> > > as N
>> > > > > is positive,
>> > > > > we would see improvements on the average latency.
>> > > > > There does not need to be significant backlog of requests in the
>> > > request
>> > > > > queue,
>> > > > > or severe degradation of disk performance to have the improvement.
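As a quick numeric sanity check of the N * (M2 - M1) / 2 estimate above, here is an illustrative calculation; M1, M2, and N below are hypothetical values chosen only to exercise the formula:

```python
# FIFO model behind the estimate: N queued requests, each reply costing M1
# seconds with the KIP (immediate NotLeaderForPartition error) vs M2 seconds
# without it (log append + possible flush). Request i completes after i
# service times, so its extra latency without the KIP is i * (M2 - M1).
N, M1, M2 = 1000, 0.0005, 0.005   # hypothetical values

extra = [i * (M2 - M1) for i in range(1, N + 1)]
avg_exact = sum(extra) / N        # equals (N + 1) * (M2 - M1) / 2
estimate = N * (M2 - M1) / 2      # the estimate quoted in the thread

print(avg_exact, estimate)
```

The exact FIFO average works out to (N + 1) * (M2 - M1) / 2, so the quoted estimate is off by only half of one service-time difference and is always positive whenever M2 > M1 and N > 0.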
>> > > > >
>> > > > > Regards,
>> > > > > Lucas
>> > > > >
>> > > > >
>> > > > > [1] For instance, reducing the timeout on the producer side can
>> > trigger
>> > > > > unnecessary duplicate requests
>> > > > > when the corresponding leader broker is overloaded, exacerbating
>> the
>> > > > > situation.
>> > > > >
>> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <li...@gmail.com>
>> > wrote:
>> > > > >
>> > > > > > Hey Lucas,
>> > > > > >
>> > > > > > Thanks much for the detailed documentation of the experiment.
>> > > > > >
>> > > > > > Initially I also think having a separate queue for controller
>> > > requests
>> > > > is
>> > > > > > useful because, as you mentioned in the summary section of the
>> > Google
>> > > > > doc,
>> > > > > > controller requests are generally more important than data
>> requests
>> > > and
>> > > > > we
>> > > > > > probably want controller requests to be processed sooner. But
>> then
>> > > Eno
>> > > > > has
>> > > > > > two very good questions which I am not sure the Google doc has
>> > > answered
>> > > > > > explicitly. Could you help with the following questions?
>> > > > > >
>> > > > > > 1) It is not very clear what is the actual benefit of KIP-291 to
>> > > users.
>> > > > > The
>> > > > > > experiment setup in the Google doc simulates the scenario that
>> > broker
>> > > > is
>> > > > > > very slow handling ProduceRequest due to e.g. slow disk. It
>> > currently
>> > > > > > assumes that there is only 1 partition. But in the common
>> scenario,
>> > > it
>> > > > is
>> > > > > > probably reasonable to assume that there are many other
>> partitions
>> > > that
>> > > > > are
>> > > > > > also actively produced to, and a ProduceRequest to these partitions
>> > also
>> > > > > takes
>> > > > > > e.g. 2 seconds to be processed. So even if broker0 can become
>> > > follower
>> > > > > for
>> > > > > > the partition 0 soon, it probably still needs to process the
>> > > > > ProduceRequest
>> > > > > > slowly in the queue because these ProduceRequests cover other
>> > > > > partitions.
>> > > > > > Thus most ProduceRequest will still timeout after 30 seconds and
>> > most
>> > > > > > clients will still likely timeout after 30 seconds. Then it is
>> not
>> > > > > > obviously what is the benefit to client since client will
>> timeout
>> > > after
>> > > > > 30
>> > > > > > seconds before possibly re-connecting to broker1, with or
>> without
>> > > > > KIP-291.
>> > > > > > Did I miss something here?
>> > > > > >
>> > > > > > 2) I guess Eno's is asking for the specific benefits of this
>> KIP to
>> > > > user
>> > > > > or
>> > > > > > system administrator, e.g. whether this KIP decreases average
>> > > latency,
>> > > > > > 999th percentile latency, probably of exception exposed to
>> client
>> > > etc.
>> > > > It
>> > > > > > is probably useful to clarify this.
>> > > > > >
>> > > > > > 3) Does this KIP help improve user experience only when there is
>> > > issue
>> > > > > with
>> > > > > > broker, e.g. significant backlog in the request queue due to
>> slow
>> > > disk
>> > > > as
>> > > > > > described in the Google doc? Or is this KIP also useful when
>> there
>> > is
>> > > > no
>> > > > > > ongoing issue in the cluster? It might be helpful to clarify
>> this
>> > to
>> > > > > > understand the benefit of this KIP.
>> > > > > >
>> > > > > >
>> > > > > > Thanks much,
>> > > > > > Dong
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
>> lucasatucla@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi Eno,
>> > > > > > >
>> > > > > > > Sorry for the delay in getting the experiment results.
>> > > > > > > Here is a link to the positive impact achieved by implementing
>> > the
>> > > > > > proposed
>> > > > > > > change:
>> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
>> > > > > > > Please take a look when you have time and let me know your
>> > > feedback.
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > > Lucas
>> > > > > > >
>> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io>
>> wrote:
>> > > > > > >
>> > > > > > > > Thanks for the pointer. Will take a look; it might suit our
>> > > > requirements
>> > > > > > > > better.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Harsha
>> > > > > > > >
>> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
>> > > > lucasatucla@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Hi Harsha,
>> > > > > > > > >
>> > > > > > > > > If I understand correctly, the replication quota mechanism
>> > > > proposed
>> > > > > > in
>> > > > > > > > > KIP-73 can be helpful in that scenario.
>> > > > > > > > > Have you tried it out?
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > Lucas
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io
>> >
>> > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Lucas,
>> > > > > > > > > > One more question, any thoughts on making this
>> configurable
>> > > > > > > > > > and also allowing subset of data requests to be
>> > prioritized.
>> > > > For
>> > > > > > > > example
>> > > > > > > > >
>> > > > > > > > > > we notice in our cluster that when we take out a broker and
>> > bring
>> > > > new
>> > > > > > one
>> > > > > > > > it
>> > > > > > > > >
>> > > > > > > > > > will try to become follower and have lot of fetch
>> requests
>> > to
>> > > > > other
>> > > > > > > > > leaders
>> > > > > > > > > > in clusters. This will negatively effect the
>> > > application/client
>> > > > > > > > > requests.
>> > > > > > > > > > We are also exploring the similar solution to
>> de-prioritize
>> > > if
>> > > > a
>> > > > > > new
>> > > > > > > > > > replica comes in for fetch requests, we are ok with the
>> > > replica
>> > > > > to
>> > > > > > be
>> > > > > > > > > > taking time but the leaders should prioritize the client
>> > > > > requests.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Thanks,
>> > > > > > > > > > Harsha
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Hi Eno,
>> > > > > > > > > > >
>> > > > > > > > > > > Sorry for the delayed response.
>> > > > > > > > > > > - I haven't implemented the feature yet, so no
>> > experimental
>> > > > > > results
>> > > > > > > > so
>> > > > > > > > >
>> > > > > > > > > > > far.
>> > > > > > > > > > > And I plan to test it out in the following days.
>> > > > > > > > > > >
>> > > > > > > > > > > - You are absolutely right that the priority queue
>> does
>> > not
>> > > > > > > > completely
>> > > > > > > > >
>> > > > > > > > > > > prevent
>> > > > > > > > > > > data requests being processed ahead of controller
>> > requests.
>> > > > > > > > > > > That being said, I expect it to greatly mitigate the
>> > effect
>> > > > of
>> > > > > > > stable
>> > > > > > > > > > > metadata.
>> > > > > > > > > > > In any case, I'll try it out and post the results
>> when I
>> > > have
>> > > > > it.
>> > > > > > > > > > >
>> > > > > > > > > > > Regards,
>> > > > > > > > > > > Lucas
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
>> > > > > > > > eno.thereska@gmail.com
>> > > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Lucas,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Sorry for the delay, just had a look at this. A
>> couple
>> > of
>> > > > > > > > questions:
>> > > > > > > > >
>> > > > > > > > > > > > - did you notice any positive change after
>> implementing
>> > > > this
>> > > > > > KIP?
>> > > > > > > > > I'm
>> > > > > > > > > > > > wondering if you have any experimental results that
>> > show
>> > > > the
>> > > > > > > > benefit
>> > > > > > > > > of
>> > > > > > > > > > > the
>> > > > > > > > > > > > two queues.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - priority is usually not sufficient in addressing
>> the
>> > > > > problem
>> > > > > > > the
>> > > > > > > > > KIP
>> > > > > > > > > > > > identifies. Even with priority queues, you will
>> > sometimes
>> > > > > > > (often?)
>> > > > > > > > > have
>> > > > > > > > > > > the
>> > > > > > > > > > > > case that data plane requests will be ahead of the
>> > > control
>> > > > > > plane
>> > > > > > > > > > > requests.
>> > > > > > > > > > > > This happens because the system might have already
>> > > started
>> > > > > > > > > processing
>> > > > > > > > > > > the
>> > > > > > > > > > > > data plane requests before the control plane ones
>> > > arrived.
>> > > > So
>> > > > > > it
>> > > > > > > > > would
>> > > > > > > > > > > be
>> > > > > > > > > > > > good to know what % of the problem this KIP
>> addresses.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks
>> > > > > > > > > > > > Eno
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
>> > > > > yuzhihong@gmail.com
>> > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Change looks good.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Thanks
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
>> > > > > > > > lucasatucla@gmail.com
>> > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi Ted,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks for the suggestion. I've updated the KIP.
>> > > Please
>> > > > > > take
>> > > > > > > > > > another
>> > > > > > > > > > >
>> > > > > > > > > > > > > look.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Lucas
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
>> > > > > > > yuzhihong@gmail.com
>> > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > It would be good if you can include the
>> default
>> > > value
>> > > > > for
>> > > > > > > > this
>> > > > > > > > >
>> > > > > > > > > > new
>> > > > > > > > > > >
>> > > > > > > > > > > > > config
>> > > > > > > > > > > > > > > in the KIP.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Thanks
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
>> > > > > > > > > > lucasatucla@gmail.com
>> > > > > > > > > > > >
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hi Ted, Dong
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > I've updated the KIP by adding a new config,
>> > > > instead
>> > > > > of
>> > > > > > > > > reusing
>> > > > > > > > > > > the
>> > > > > > > > > > > > > > > > existing one.
>> > > > > > > > > > > > > > > > Please take another look when you have time.
>> > > > Thanks a
>> > > > > > > lot!
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Lucas
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
>> > > > > > > > yuzhihong@gmail.com
>> > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > bq. that's a waste of resource if control
>> > > request
>> > > > > > rate
>> > > > > > > is
>> > > > > > > > > low
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I don't know if control request rate can
>> get
>> > to
>> > > > > > > 100,000,
>> > > > > > > > > > > likely
>> > > > > > > > > > > > > not.
>> > > > > > > > > > > > > > > Then
>> > > > > > > > > > > > > > > > > using the same bound as that for data
>> > requests
>> > > > > seems
>> > > > > > > > high.
>> > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas
>> Wang
>> > <
>> > > > > > > > > > > > > lucasatucla@gmail.com >
>> > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Hi Ted,
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Thanks for taking a look at this KIP.
>> > > > > > > > > > > > > > > > > > Let's say today the setting of
>> > > > > > "queued.max.requests"
>> > > > > > > in
>> > > > > > > > > > > > cluster A
>> > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > > > 1000,
>> > > > > > > > > > > > > > > > > > while the setting in cluster B is
>> 100,000.
>> > > > > > > > > > > > > > > > > > The 100 times difference might have
>> > indicated
>> > > > > that
>> > > > > > > > > machines
>> > > > > > > > > > > in
>> > > > > > > > > > > > > > > cluster
>> > > > > > > > > > > > > > > > B
>> > > > > > > > > > > > > > > > > > have larger memory.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > By reusing the "queued.max.requests",
>> the
>> > > > > > > > > > > controlRequestQueue
>> > > > > > > > > > > > in
>> > > > > > > > > > > > > > > > cluster
>> > > > > > > > > > > > > > > > > B
>> > > > > > > > > > > > > > > > > > automatically
>> > > > > > > > > > > > > > > > > > gets a 100x capacity without explicitly
>> > > > bothering
>> > > > > > the
>> > > > > > > > > > > > operators.
>> > > > > > > > > > > > > > > > > > I understand the counter argument can be
>> > that
>> > > > > maybe
>> > > > > > > > > that's
>> > > > > > > > > > a
>> > > > > > > > > > >
>> > > > > > > > > > > > > waste
>> > > > > > > > > > > > > > of
>> > > > > > > > > > > > > > > > > > resource if control request
>> > > > > > > > > > > > > > > > > > rate is low and operators may want to
>> fine
>> > > tune
>> > > > > the
>> > > > > > > > > > capacity
>> > > > > > > > > > > of
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > controlRequestQueue.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > I'm ok with either approach, and can
>> change
>> > > it
>> > > > if
>> > > > > > you
>> > > > > > > > or
>> > > > > > > > >
>> > > > > > > > > > > anyone
>> > > > > > > > > > > > > > else
>> > > > > > > > > > > > > > > > > feels
>> > > > > > > > > > > > > > > > > > strong about adding the extra config.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > Lucas
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu
>> <
>> > > > > > > > > > yuzhihong@gmail.com
>> > > > > > > > > > > >
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Lucas:
>> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can
>> you
>> > > > > > elaborate
>> > > > > > > a
>> > > > > > > > > bit
>> > > > > > > > > > > more
>> > > > > > > > > > > > > on
>> > > > > > > > > > > > > > > why
>> > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > separate config has bigger impact ?
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Thanks
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong
>> > Lin <
>> > > > > > > > > > > > lindong28@gmail.com
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Hey Luca,
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good
>> overall.
>> > > > Some
>> > > > > > > > > comments
>> > > > > > > > > > > > below:
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > - We usually specify the full mbean
>> for
>> > > the
>> > > > > new
>> > > > > > > > > metrics
>> > > > > > > > > > > in
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > KIP.
>> > > > > > > > > > > > > > > > > Can
>> > > > > > > > > > > > > > > > > > > you
>> > > > > > > > > > > > > > > > > > > > specify it in the Public Interface
>> > > section
>> > > > > > > similar
>> > > > > > > > > to
>> > > > > > > > > > > > KIP-237
>> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
>> > > > > > > > > > confluence/display/KAFKA/KIP-
>> > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > 237%3A+More+Controller+Health+
>> Metrics>
>> > > > > > > > > > > > > > > > > > > > ?
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the same
>> > pattern
>> > > as
>> > > > > > > KIP-153
>> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
>> > > > > > > > > > confluence/display/KAFKA/KIP-
>> > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
>> > > > > > > > > > > > > metric>,
>> > > > > > > > > > > > > > > > > > > > where we keep the existing sensor
>> name
>> > > > > > > > > "BytesInPerSec"
>> > > > > > > > > > > and
>> > > > > > > > > > > > > add
>> > > > > > > > > > > > > > a
>> > > > > > > > > > > > > > > > new
>> > > > > > > > > > > > > > > > > > > sensor
>> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather
>> than
>> > > > > > replacing
>> > > > > > > > > the
>> > > > > > > > > > > > sensor
>> > > > > > > > > > > > > > > name "
>> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
>> > > > > "ClientBytesInPerSec".
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > - It seems that the KIP changes the
>> > > > semantics
>> > > > > > of
>> > > > > > > > the
>> > > > > > > > >
>> > > > > > > > > > > broker
>> > > > > > > > > > > > > > > config
>> > > > > > > > > > > > > > > > > > > > "queued.max.requests" because the
>> > number
>> > > of
>> > > > > > total
>> > > > > > > > > > > requests
>> > > > > > > > > > > > > > queued
>> > > > > > > > > > > > > > > > in
>> > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > broker will be no longer bounded by
>> > > > > > > > > > > "queued.max.requests".
>> > > > > > > > > > > > > This
>> > > > > > > > > > > > > > > > > > probably
>> > > > > > > > > > > > > > > > > > > > needs to be specified in the Public
>> > > > > Interfaces
>> > > > > > > > > section
>> > > > > > > > > > > for
>> > > > > > > > > > > > > > > > > discussion.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > > > Dong
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM,
>> Lucas
>> > > > Wang
>> > > > > <
>> > > > > > > > > > > > > > > > lucasatucla@gmail.com >
>> > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a
>> separate
>> > > queue
>> > > > > for
>> > > > > > > > > > > controller
>> > > > > > > > > > > > > > > > requests:
>> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/
>> > > > > > > > > > confluence/display/KAFKA/KIP-
>> > > > > > > > > > >
>> > > > > > > > > > > > 291%
>> > > > > > > > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
>> > > > > > > > > > control+requests+and+data+
>> > > > > > > > > > >
>> > > > > > > > > > > > > > requests
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Can you please take a look and
>> let me
>> > > > know
>> > > > > > your
>> > > > > > > > > > > feedback?
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your time!
>> > > > > > > > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > > > > > > > Lucas
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Dong,

The simulation of the disk being slow is merely a way for me to easily
construct a testing scenario
with a backlog of produce requests. In production, besides the disk
being slow, a backlog of
produce requests may also be caused by high produce QPS.
In that case, we may not want to kill the broker, and that's when this KIP
can be useful, for both JBOD
and non-JBOD setups.
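For readers following along, the core mechanism the KIP proposes — routing
controller requests into their own queue, which the request handler threads
drain before the data queue — can be sketched roughly as follows. This is a
simplified toy model in Python, not the broker's actual Scala implementation;
the class and field names are illustrative only.

```python
from collections import deque

class TwoQueueRequestChannel:
    """Toy model of the proposed request channel: controller requests
    (LeaderAndIsr, UpdateMetadata, StopReplica) go to a dedicated queue,
    which handler threads always drain before the data queue."""

    CONTROL_APIS = ("LeaderAndIsr", "UpdateMetadata", "StopReplica")

    def __init__(self):
        self.control_queue = deque()
        self.data_queue = deque()

    def send(self, request):
        # Route by API key: control plane vs. data plane.
        if request["api"] in self.CONTROL_APIS:
            self.control_queue.append(request)
        else:
            self.data_queue.append(request)

    def receive(self):
        # Controller requests are prioritized over data requests.
        if self.control_queue:
            return self.control_queue.popleft()
        if self.data_queue:
            return self.data_queue.popleft()
        return None

channel = TwoQueueRequestChannel()
channel.send({"api": "Produce", "id": 1})
channel.send({"api": "Produce", "id": 2})
channel.send({"api": "LeaderAndIsr", "id": 3})
order = [channel.receive()["id"] for _ in range(3)]
print(order)  # → [3, 1, 2]: the LeaderAndIsr jumps ahead of queued produces
```

Note that, as Eno pointed out later in the thread, this only reorders requests
still sitting in the queue; a data request already picked up by a handler
thread is not preempted.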

Going back to your previous question about each ProduceRequest covering 20
partitions that are randomly
distributed, let's say a LeaderAndIsr request is enqueued that tries to
switch the current broker, say broker0, from leader to follower
*for one of the partitions*, say *test-0*. For the sake of argument, let's
also assume the other brokers, say broker1, have *stopped* fetching from
the current broker, i.e. broker0.
1. If the enqueued produce requests have acks = -1 (ALL)
  1.1 Without this KIP, the ProduceRequests ahead of the LeaderAndISR will be
put into the purgatory,
        and since they'll never be replicated to other brokers (because of
the assumption made above), they will
        be completed either when the LeaderAndISR request is processed or
when the timeout happens.
  1.2 With this KIP, broker0 will immediately transition the partition
test-0 to become a follower;
        after the current broker sees the replication of the remaining 19
partitions, it can send a response indicating that
        it's no longer the leader for "test-0".
  To see the latency difference between 1.1 and 1.2, let's say there are
24K produce requests ahead of the LeaderAndISR, and there are 8 I/O threads,
  so each I/O thread will process approximately 3000 produce requests. Now
let's investigate the I/O thread that finally processes the LeaderAndISR.
  For those 3000 produce requests, suppose the times when their remaining
19 partitions catch up are t0, t1, ..., t2999, and the LeaderAndISR request
is processed at time t3000.
  Without this KIP, the 1st produce request would have waited an extra
t3000 - t0 in the purgatory, the 2nd an extra t3000 - t1, etc.
  Roughly speaking, the latency difference is bigger for the earlier
produce requests than for the later ones. For the same reason, the more
ProduceRequests queued
  before the LeaderAndISR, the bigger benefit we get (capped by the produce
timeout).
2. If the enqueued produce requests have acks=0 or acks=1
  There will be no latency differences in this case, but
  2.1 Without this KIP, the records of partition test-0 in the
ProduceRequests ahead of the LeaderAndISR will be appended to the local log,
        and eventually be truncated after processing the LeaderAndISR. This
is what's referred to as
        "some unofficial definition of data loss in terms of messages
beyond the high watermark".
  2.2 With this KIP, we can mitigate the effect: if the LeaderAndISR is
processed immediately, the response to producers will carry
        the NotLeaderForPartition error, causing producers to retry.
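To make the arithmetic in case 1 concrete, here is a small sketch computing
the extra purgatory wait under made-up timings (assuming, purely for
illustration, that the i-th request's remaining partitions catch up at
t_i = i milliseconds; these numbers are not measurements):

```python
# Toy numbers: the i-th produce request's remaining 19 partitions catch up
# at t_i = i ms, and the LeaderAndISR request is processed at t_3000 = 3000 ms.
N = 3000
catch_up_times = [float(i) for i in range(N)]   # t_0 .. t_2999
leader_and_isr_time = float(N)                  # t_3000

# Without the KIP, the i-th request waits an extra (t_3000 - t_i) in purgatory.
extra_waits = [leader_and_isr_time - t for t in catch_up_times]

avg_extra_ms = sum(extra_waits) / N
print(f"max extra wait: {extra_waits[0]:.0f} ms")  # → max extra wait: 3000 ms
print(f"avg extra wait: {avg_extra_ms:.1f} ms")    # → avg extra wait: 1500.5 ms
```

The earliest request suffers the largest extra wait, matching the observation
above that the benefit is bigger for earlier produce requests.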

The explanation above covers the benefit of reducing the latency of a broker
becoming a follower;
closely related is reducing the latency of a broker becoming the leader.
In this case, the benefit is even more obvious: if other brokers have
resigned leadership and the
current broker should take over, any delay in processing the
LeaderAndISR will be perceived
by clients as unavailability. In extreme cases, this can cause failed
produce requests if the retries are
exhausted.
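For reference, whether those retries get exhausted is governed by the
standard producer client configs; a hedged example follows (the values are
illustrative, not recommendations):

```properties
# Producer settings that bound retry behavior when leadership moves
# (values here are illustrative only).
acks=all
retries=3
retry.backoff.ms=100
request.timeout.ms=30000   # the 30-second timeout referenced in this thread
```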

Two other types of controller requests are UpdateMetadata and
StopReplica, which I'll briefly discuss as follows:
For UpdateMetadata requests, delayed processing means clients receive
stale metadata, e.g. with the wrong leadership info
for certain partitions; the effect is more retries or even fatal
failures if the retries are exhausted.

For StopReplica requests, a long queuing time may degrade the performance
of topic deletion.

Regarding your last question about the delay for DescribeLogDirsRequest, you
are right
that this KIP cannot help with the latency of getting the log dir info,
since it's only relevant
when controller requests are involved.

Regards,
Lucas


On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Jun,
>
> Thanks much for the comments. It is a good point. So the feature may be
> useful for JBOD use-case. I have one question below.
>
> Hey Lucas,
>
> Do you think this feature is also useful for a non-JBOD setup, or is it only
> useful for the JBOD setup? It may be useful to understand this.
>
> When the broker is setup using JBOD, in order to move leaders on the failed
> disk to other disks, the system operator first needs to get the list of
> partitions on the failed disk. This is currently achieved using
> AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the
> broker. If we only prioritize the controller requests, then the
> DescribeLogDirsRequest
> may still take a long time to be processed by the broker. So the overall
> time to move leaders away from the failed disk may still be long even with
> this KIP. What do you think?
>
> Thanks,
> Dong
>
>
> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Thanks for the insightful comment, Jun.
> >
> > @Dong,
> > Since both of the two comments in your previous email are about the
> > benefits of this KIP and whether it's useful,
> > in light of Jun's last comment, do you agree that this KIP can be
> > beneficial in the case mentioned by Jun?
> > Please let me know, thanks!
> >
> > Regards,
> > Lucas
> >
> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Lucas, Dong,
> > >
> > > If all disks on a broker are slow, one probably should just kill the
> > > broker. In that case, this KIP may not help. If only one of the disks
> on
> > a
> > > broker is slow, one may want to fail that disk and move the leaders on
> > that
> > > disk to other brokers. In that case, being able to process the
> > LeaderAndIsr
> > > requests faster will potentially help the producers recover quicker.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com> wrote:
> > >
> > > > Hey Lucas,
> > > >
> > > > Thanks for the reply. Some follow up questions below.
> > > >
> > > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > > randomly
> > > > distributed across all partitions, then each ProduceRequest will
> likely
> > > > cover some partitions for which the broker is still leader after it
> > > quickly
> > > > processes the
> > > > LeaderAndIsrRequest. Then the broker will still be slow in processing
> these
> > > > ProduceRequests, and the request latency will still be very high with
> > > > this KIP. It seems
> > > > that most ProduceRequests will still time out after 30 seconds. Is this
> > > > understanding correct?
> > > >
> > > > Regarding 2, if most ProduceRequests will still time out after 30
> > seconds,
> > > > then it is less clear how this KIP reduces average produce latency.
> Can
> > > you
> > > > clarify what metrics can be improved by this KIP?
> > > >
> > > > Not sure why a system operator directly cares about the number of truncated
> > messages.
> > > > Do you mean this KIP can improve average throughput or reduce message
> > > > duplication? It will be good to understand this.
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lu...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Dong,
> > > > >
> > > > > Thanks for your valuable comments. Please see my reply below.
> > > > >
> > > > > 1. The Google doc showed only 1 partition. Now let's consider a
> more
> > > > common
> > > > > scenario
> > > > > where broker0 is the leader of many partitions. And let's say for
> > some
> > > > > reason its IO becomes slow.
> > > > > The number of leader partitions on broker0 is so large, say 10K,
> that
> > > the
> > > > > cluster is skewed,
> > > > > and the operator would like to shift the leadership for a lot of
> > > > > partitions, say 9K, to other brokers,
> > > > > either manually or through some service like cruise control.
> > > > > With this KIP, not only will the leadership transitions finish more
> > > > > quickly, helping the cluster itself becoming more balanced,
> > > > > but all existing producers corresponding to the 9K partitions will
> > get
> > > > the
> > > > > errors relatively quickly
> > > > > rather than relying on their timeout, thanks to the batched async
> ZK
> > > > > operations.
> > > > > To me it's a useful feature to have during such troublesome times.
> > > > >
> > > > >
> > > > > 2. The experiments in the Google Doc have shown that with this KIP
> > many
> > > > > producers
> > > > > receive an explicit error NotLeaderForPartition, based on which
> they
> > > > retry
> > > > > immediately.
> > > > > Therefore the latency (~14 seconds+quick retry) for their single
> > > message
> > > > is
> > > > > much smaller
> > > > > compared with the case of timing out without the KIP (30 seconds
> for
> > > > timing
> > > > > out + quick retry).
> > > > > One might argue that reducing the timeout on the producer side
> can
> > > > > achieve the same result,
> > > > > yet reducing the timeout has its own drawbacks[1].
> > > > >
> > > > > Also *IF* there were a metric to show the number of truncated
> > messages
> > > on
> > > > > brokers,
> > > > > with the experiments done in the Google Doc, it should be easy to
> see
> > > > that
> > > > > a lot fewer messages need
> > > > > to be truncated on broker0 since the up-to-date metadata avoids
> > > appending
> > > > > of messages
> > > > > in subsequent PRODUCE requests. If we talk to a system operator and
> > ask
> > > > > whether
> > > > > they prefer fewer wasteful IOs, I bet most likely the answer is
> yes.
> > > > >
> > > > > 3. To answer your question, I think it might be helpful to
> construct
> > > some
> > > > > formulas.
> > > > > To simplify the modeling, I'm going back to the case where there is
> > > only
> > > > > ONE partition involved.
> > > > > Following the experiments in the Google Doc, let's say broker0
> > becomes
> > > > the
> > > > > follower at time t0,
> > > > > and after t0 there were still N produce requests in its request
> > queue.
> > > > > With the up-to-date metadata brought by this KIP, broker0 can reply
> > > with
> > > > an
> > > > > NotLeaderForPartition exception,
> > > > > let's use M1 to denote the average processing time of replying with
> > > such
> > > > an
> > > > > error message.
> > > > > Without this KIP, the broker will need to append messages to
> > segments,
> > > > > which may trigger a flush to disk,
> > > > > let's use M2 to denote the average processing time for such logic.
> > > > > Then the average extra latency incurred without this KIP is N *
> (M2 -
> > > > M1) /
> > > > > 2.
> > > > >
> > > > > In practice, M2 should always be larger than M1, which means as
> long
> > > as N
> > > > > is positive,
> > > > > we would see improvements on the average latency.
> > > > > There does not need to be significant backlog of requests in the
> > > request
> > > > > queue,
> > > > > or severe degradation of disk performance to have the improvement.
> > > > >
> > > > > Regards,
> > > > > Lucas
> > > > >
> > > > >
> > > > > [1] For instance, reducing the timeout on the producer side can
> > trigger
> > > > > unnecessary duplicate requests
> > > > > when the corresponding leader broker is overloaded, exacerbating
> the
> > > > > situation.
> > > > >
> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <li...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hey Lucas,
> > > > > >
> > > > > > Thanks much for the detailed documentation of the experiment.
> > > > > >
> > > > > > Initially I also think having a separate queue for controller
> > > requests
> > > > is
> > > > > > useful because, as you mentioned in the summary section of the
> > Google
> > > > > doc,
> > > > > > controller requests are generally more important than data
> requests
> > > and
> > > > > we
> > > > > > probably want controller requests to be processed sooner. But
> then
> > > Eno
> > > > > has
> > > > > > two very good questions which I am not sure the Google doc has
> > > answered
> > > > > > explicitly. Could you help with the following questions?
> > > > > >
> > > > > > 1) It is not very clear what is the actual benefit of KIP-291 to
> > > users.
> > > > > The
> > > > > > experiment setup in the Google doc simulates the scenario that
> > broker
> > > > is
> > > > > > very slow handling ProduceRequest due to e.g. slow disk. It
> > currently
> > > > > > assumes that there is only 1 partition. But in the common
> scenario,
> > > it
> > > > is
> > > > > > probably reasonable to assume that there are many other
> partitions
> > > that
> > > > > are
> > > > > > also actively produced to and ProduceRequest to these partition
> > also
> > > > > takes
> > > > > > e.g. 2 seconds to be processed. So even if broker0 can become
> > > follower
> > > > > for
> > > > > > the partition 0 soon, it probably still needs to process the
> > > > > ProduceRequest
> > > > > > slowly in the queue because these ProduceRequests cover other
> > > > > partitions.
> > > > > > Thus most ProduceRequest will still timeout after 30 seconds and
> > most
> > > > > > clients will still likely timeout after 30 seconds. Then it is
> not
> > > > > > obvious what the benefit to the client is, since the client will time out
> > > after
> > > > > 30
> > > > > > seconds before possibly re-connecting to broker1, with or without
> > > > > KIP-291.
> > > > > > Did I miss something here?
> > > > > >
> > > > > > 2) I guess Eno is asking for the specific benefits of this KIP
> to
> > > > user
> > > > > or
> > > > > > system administrator, e.g. whether this KIP decreases average
> > > latency,
> > > > > > 999th percentile latency, probability of exceptions exposed to clients
> > > etc.
> > > > It
> > > > > > is probably useful to clarify this.
> > > > > >
> > > > > > 3) Does this KIP help improve user experience only when there is
> > > issue
> > > > > with
> > > > > > broker, e.g. significant backlog in the request queue due to slow
> > > disk
> > > > as
> > > > > > described in the Google doc? Or is this KIP also useful when
> there
> > is
> > > > no
> > > > > > ongoing issue in the cluster? It might be helpful to clarify this
> > to
> > > > > > understand the benefit of this KIP.
> > > > > >
> > > > > >
> > > > > > Thanks much,
> > > > > > Dong
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
> lucasatucla@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Eno,
> > > > > > >
> > > > > > > Sorry for the delay in getting the experiment results.
> > > > > > > Here is a link to the positive impact achieved by implementing
> > the
> > > > > > proposed
> > > > > > > change:
> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhW
> > > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > > > Please take a look when you have time and let me know your
> > > feedback.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io>
> wrote:
> > > > > > >
> > > > > > > > Thanks for the pointer. Will take a look might suit our
> > > > requirements
> > > > > > > > better.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Harsha
> > > > > > > >
> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > > > lucasatucla@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Harsha,
> > > > > > > > >
> > > > > > > > > If I understand correctly, the replication quota mechanism
> > > > proposed
> > > > > > in
> > > > > > > > > KIP-73 can be helpful in that scenario.
> > > > > > > > > Have you tried it out?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io
> >
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Lucas,
> > > > > > > > > > One more question, any thoughts on making this
> configurable
> > > > > > > > > > and also allowing a subset of data requests to be
> > prioritized.
> > > > For
> > > > > > > > example
> > > > > > > > >
> > > > > > > > > > ,we notice in our cluster when we take out a broker and
> > bring
> > > > new
> > > > > > one
> > > > > > > > it
> > > > > > > > >
> > > > > > > > > > will try to become follower and have a lot of fetch
> requests
> > to
> > > > > other
> > > > > > > > > leaders
> > > > > > > > > > in clusters. This will negatively affect the
> > > application/client
> > > > > > > > > requests.
> > > > > > > > > > We are also exploring a similar solution to
> de-prioritize
> > > if
> > > > a
> > > > > > new
> > > > > > > > > > replica comes in for fetch requests, we are ok with the
> > > replica
> > > > > to
> > > > > > be
> > > > > > > > > > taking time but the leaders should prioritize the client
> > > > > requests.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Harsha
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Eno,
> > > > > > > > > > >
> > > > > > > > > > > Sorry for the delayed response.
> > > > > > > > > > > - I haven't implemented the feature yet, so no
> > experimental
> > > > > > results
> > > > > > > > so
> > > > > > > > >
> > > > > > > > > > > far.
> > > > > > > > > > > And I plan to test it out in the following days.
> > > > > > > > > > >
> > > > > > > > > > > - You are absolutely right that the priority queue does
> > not
> > > > > > > > completely
> > > > > > > > >
> > > > > > > > > > > prevent
> > > > > > > > > > > data requests being processed ahead of controller
> > requests.
> > > > > > > > > > > That being said, I expect it to greatly mitigate the
> > effect
> > > > of
> > > > > > > stable
> > > > > > > > > > > metadata.
> > > > > > > > > > > In any case, I'll try it out and post the results when
> I
> > > have
> > > > > it.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > > > > > > eno.thereska@gmail.com
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Lucas,
> > > > > > > > > > > >
> > > > > > > > > > > > Sorry for the delay, just had a look at this. A
> couple
> > of
> > > > > > > > questions:
> > > > > > > > >
> > > > > > > > > > > > - did you notice any positive change after
> implementing
> > > > this
> > > > > > KIP?
> > > > > > > > > I'm
> > > > > > > > > > > > wondering if you have any experimental results that
> > show
> > > > the
> > > > > > > > benefit
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > two queues.
> > > > > > > > > > > >
> > > > > > > > > > > > - priority is usually not sufficient in addressing
> the
> > > > > problem
> > > > > > > the
> > > > > > > > > KIP
> > > > > > > > > > > > identifies. Even with priority queues, you will
> > sometimes
> > > > > > > (often?)
> > > > > > > > > have
> > > > > > > > > > > the
> > > > > > > > > > > > case that data plane requests will be ahead of the
> > > control
> > > > > > plane
> > > > > > > > > > > requests.
> > > > > > > > > > > > This happens because the system might have already
> > > started
> > > > > > > > > processing
> > > > > > > > > > > the
> > > > > > > > > > > > data plane requests before the control plane ones
> > > arrived.
> > > > So
> > > > > > it
> > > > > > > > > would
> > > > > > > > > > > be
> > > > > > > > > > > > good to know what % of the problem this KIP
> addresses.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > > Eno
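Eno's point — that a priority queue cannot reorder a request a handler thread has already dequeued — can be made concrete with a small simulation. This is an illustrative Python sketch, not Kafka code; the arrival times and processing costs are made up.

```python
import heapq

CONTROL, DATA = 0, 1  # lower value = higher priority in the heap

def run(events):
    """Simulate one handler thread draining a priority queue.

    events: list of (arrival_time, kind, processing_time).
    Returns the order in which requests finish processing.
    """
    queue, order, clock, i = [], [], 0.0, 0
    events = sorted(events)
    while i < len(events) or queue:
        # enqueue everything that has arrived by `clock`
        while i < len(events) and events[i][0] <= clock:
            t, kind, cost = events[i]
            heapq.heappush(queue, (kind, t, cost))
            i += 1
        if not queue:
            clock = events[i][0]  # idle until the next arrival
            continue
        # the handler commits to this request and runs it to completion
        kind, t, cost = heapq.heappop(queue)
        clock = max(clock, t) + cost
        order.append(kind)
    return order

# A data request arrives first and is picked up; a control request arriving
# one time unit later must wait for it, despite having strict priority.
order = run([(0, DATA, 10), (1, CONTROL, 1), (2, DATA, 10)])
print(order)  # → [1, 0, 1]: DATA, CONTROL, DATA
```

In the example the control request does jump ahead of the second data request, but still waits for the in-flight data request — exactly the residual reordering Eno describes.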
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > > > > yuzhihong@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Change looks good.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
> > > > > > > > lucasatucla@gmail.com
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the suggestion. I've updated the KIP.
> > > Please
> > > > > > take
> > > > > > > > > > another
> > > > > > > > > > >
> > > > > > > > > > > > > look.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> > > > > > > yuzhihong@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It would be good if you can include the default
> > > value
> > > > > for
> > > > > > > > this
> > > > > > > > >
> > > > > > > > > > new
> > > > > > > > > > >
> > > > > > > > > > > > > config
> > > > > > > > > > > > > > > in the KIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> > > > > > > > > > lucasatucla@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Ted, Dong
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I've updated the KIP by adding a new config,
> > > > instead
> > > > > of
> > > > > > > > > reusing
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > existing one.
> > > > > > > > > > > > > > > > Please take another look when you have time.
> > > > Thanks a
> > > > > > > lot!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
> > > > > > > > yuzhihong@gmail.com
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > bq. that's a waste of resource if control
> > > request
> > > > > > rate
> > > > > > > is
> > > > > > > > > low
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I don't know if control request rate can
> get
> > to
> > > > > > > 100,000,
> > > > > > > > > > > likely
> > > > > > > > > > > > > not.
> > > > > > > > > > > > > > > Then
> > > > > > > > > > > > > > > > > using the same bound as that for data
> > requests
> > > > > seems
> > > > > > > > high.
> > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas
> Wang
> > <
> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > > > > > > > > Let's say today the setting of
> > > > > > "queued.max.requests"
> > > > > > > in
> > > > > > > > > > > > cluster A
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > 1000,
> > > > > > > > > > > > > > > > > > while the setting in cluster B is
> 100,000.
> > > > > > > > > > > > > > > > > > The 100 times difference might have
> > indicated
> > > > > that
> > > > > > > > > machines
> > > > > > > > > > > in
> > > > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > > B
> > > > > > > > > > > > > > > > > > have larger memory.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > By reusing the "queued.max.requests", the
> > > > > > > > > > > controlRequestQueue
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > > > B
> > > > > > > > > > > > > > > > > > automatically
> > > > > > > > > > > > > > > > > > gets a 100x capacity without explicitly
> > > > bothering
> > > > > > the
> > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > I understand the counter argument can be
> > that
> > > > > maybe
> > > > > > > > > that's
> > > > > > > > > > a
> > > > > > > > > > >
> > > > > > > > > > > > > waste
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > resource if control request
> > > > > > > > > > > > > > > > > > rate is low and operators may want to
> fine
> > > tune
> > > > > the
> > > > > > > > > > capacity
> > > > > > > > > > > of
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'm ok with either approach, and can
> change
> > > it
> > > > if
> > > > > > you
> > > > > > > > or
> > > > > > > > >
> > > > > > > > > > > anyone
> > > > > > > > > > > > > > else
> > > > > > > > > > > > > > > > > feels
> > > > > > > > > > > > > > > > > > strong about adding the extra config.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> > > > > > > > > > yuzhihong@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Lucas:
> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can
> you
> > > > > > elaborate
> > > > > > > a
> > > > > > > > > bit
> > > > > > > > > > > more
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > why
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong
> > Lin <
> > > > > > > > > > > > lindong28@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hey Luca,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good
> overall.
> > > > Some
> > > > > > > > > comments
> > > > > > > > > > > > below:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > - We usually specify the full mbean
> for
> > > the
> > > > > new
> > > > > > > > > metrics
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > Can
> > > > > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > > specify it in the Public Interface
> > > section
> > > > > > > similar
> > > > > > > > > to
> > > > > > > > > > > > KIP-237
> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 237%3A+More+Controller+Health+
> Metrics>
> > > > > > > > > > > > > > > > > > > > ?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > - Maybe we could follow the same
> > pattern
> > > as
> > > > > > > KIP-153
> > > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > > > > > > > > > metric>,
> > > > > > > > > > > > > > > > > > > > where we keep the existing sensor
> name
> > > > > > > > > "BytesInPerSec"
> > > > > > > > > > > and
> > > > > > > > > > > > > add
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > > > sensor
> > > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather
> than
> > > > > > replacing
> > > > > > > > > the
> > > > > > > > > > > > sensor
> > > > > > > > > > > > > > > name "
> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
> > > > > "ClientBytesInPerSec".
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > - It seems that the KIP changes the
> > > > semantics
> > > > > > of
> > > > > > > > the
> > > > > > > > >
> > > > > > > > > > > broker
> > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > "queued.max.requests" because the
> > number
> > > of
> > > > > > total
> > > > > > > > > > > requests
> > > > > > > > > > > > > > queued
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > broker will be no longer bounded by
> > > > > > > > > > > "queued.max.requests".
> > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > probably
> > > > > > > > > > > > > > > > > > > > needs to be specified in the Public
> > > > > Interfaces
> > > > > > > > > section
> > > > > > > > > > > for
> > > > > > > > > > > > > > > > > discussion.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM,
> Lucas
> > > > Wang
> > > > > <
> > > > > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a separate
> > > queue
> > > > > for
> > > > > > > > > > > controller
> > > > > > > > > > > > > > > > requests:
> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > > > > >
> > > > > > > > > > > > 291%
> > > > > > > > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > > > > > > > > control+requests+and+data+
> > > > > > > > > > >
> > > > > > > > > > > > > > requests
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Can you please take a look and let
> me
> > > > know
> > > > > > your
> > > > > > > > > > > feedback?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Jun,

Thanks much for the comments. That is a good point. So the feature may be
useful for the JBOD use case. I have one question below.

Hey Lucas,

Do you think this feature is also useful for a non-JBOD setup, or is it only
useful for the JBOD setup? It would help to understand this.

When the broker is setup using JBOD, in order to move leaders on the failed
disk to other disks, the system operator first needs to get the list of
partitions on the failed disk. This is currently achieved using
AdminClient.describeLogDirs(), which sends DescribeLogDirsRequest to the
broker. If we only prioritize the controller requests, then the
DescribeLogDirsRequest
may still take a long time to be processed by the broker. So the overall
time to move leaders away from the failed disk may still be long even with
this KIP. What do you think?
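As a back-of-the-envelope illustration of the head-of-line blocking described above (a hypothetical Python sketch with made-up numbers, not Kafka code): a DescribeLogDirsRequest shares the data queue, so prioritizing only controller requests does not shorten its wait.

```python
def wait_in_queue(costs_ahead):
    """Seconds until a request at the back of a FIFO queue is picked up,
    given the processing cost of each request ahead of it (single handler)."""
    return sum(costs_ahead, 0.0)

produce_cost, backlog = 2.0, 100  # assumed: 2 s per ProduceRequest, 100 queued

# LeaderAndIsrRequest with KIP-291: bypasses the data backlog via its own queue.
leader_and_isr_wait = wait_in_queue([])
# DescribeLogDirsRequest: still a data-plane request, still behind the backlog.
describe_log_dirs_wait = wait_in_queue([produce_cost] * backlog)
print(leader_and_isr_wait, describe_log_dirs_wait)  # 0.0 vs 200.0 seconds
```

Under these assumed numbers the admin request still waits minutes, which is the concern raised above.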

Thanks,
Dong


On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <lu...@gmail.com> wrote:

> Thanks for the insightful comment, Jun.
>
> @Dong,
> Since both of the two comments in your previous email are about the
> benefits of this KIP and whether it's useful,
> in light of Jun's last comment, do you agree that this KIP can be
> beneficial in the case mentioned by Jun?
> Please let me know, thanks!
>
> Regards,
> Lucas
>
> On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Lucas, Dong,
> >
> > If all disks on a broker are slow, one probably should just kill the
> > broker. In that case, this KIP may not help. If only one of the disks on
> a
> > broker is slow, one may want to fail that disk and move the leaders on
> that
> > disk to other brokers. In that case, being able to process the
> LeaderAndIsr
> > requests faster will potentially help the producers recover quicker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com> wrote:
> >
> > > Hey Lucas,
> > >
> > > Thanks for the reply. Some follow up questions below.
> > >
> > > Regarding 1, if each ProduceRequest covers 20 partitions that are
> > randomly
> > > distributed across all partitions, then each ProduceRequest will likely
> > > cover some partitions for which the broker is still leader after it
> > quickly
> > > processes the
> > > LeaderAndIsrRequest. Then the broker will still be slow in processing these
> > > ProduceRequests, and request latency will still be very high with this KIP. It
> > seems
> > > that most ProduceRequests will still time out after 30 seconds. Is this
> > > understanding correct?
> > >
> > > Regarding 2, if most ProduceRequests will still time out after 30
> seconds,
> > > then it is less clear how this KIP reduces average produce latency. Can
> > you
> > > clarify what metrics can be improved by this KIP?
> > >
> > > Not sure why the system operator directly cares about the number of
> truncated messages.
> > > Do you mean this KIP can improve average throughput or reduce message
> > > duplication? It will be good to understand this.
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > >
> > >
> > >
> > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lu...@gmail.com>
> wrote:
> > >
> > > > Hi Dong,
> > > >
> > > > Thanks for your valuable comments. Please see my reply below.
> > > >
> > > > 1. The Google doc showed only 1 partition. Now let's consider a more
> > > common
> > > > scenario
> > > > where broker0 is the leader of many partitions. And let's say for
> some
> > > > reason its IO becomes slow.
> > > > The number of leader partitions on broker0 is so large, say 10K, that
> > the
> > > > cluster is skewed,
> > > > and the operator would like to shift the leadership for a lot of
> > > > partitions, say 9K, to other brokers,
> > > > either manually or through some service like cruise control.
> > > > With this KIP, not only will the leadership transitions finish more
> > > > quickly, helping the cluster itself becoming more balanced,
> > > > but all existing producers corresponding to the 9K partitions will
> get
> > > the
> > > > errors relatively quickly
> > > > rather than relying on their timeout, thanks to the batched async ZK
> > > > operations.
> > > > To me it's a useful feature to have during such troublesome times.
> > > >
> > > >
> > > > 2. The experiments in the Google Doc have shown that with this KIP
> many
> > > > producers
> > > > receive an explicit error NotLeaderForPartition, based on which they
> > > retry
> > > > immediately.
> > > > Therefore the latency (~14 seconds + quick retry) for their single
> > message
> > > is
> > > > much smaller
> > > > compared with the case of timing out without the KIP (30 seconds for
> > > timing
> > > > out + quick retry).
> > > > One might argue that reducing the timing out on the producer side can
> > > > achieve the same result,
> > > > yet reducing the timeout has its own drawbacks[1].
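The comparison above is simple arithmetic; here is a sketch with the numbers from the thread (14 s to receive the explicit NotLeaderForPartition error, 30 s request timeout) and an assumed 0.1 s quick retry:

```python
def recovery_time(detect_seconds, retry_seconds=0.1):
    """Time until a producer has resent a message to the new leader:
    time to learn the old leader is gone, plus one quick retry (assumed)."""
    return detect_seconds + retry_seconds

with_kip = recovery_time(14)     # explicit error arrives in ~14 s (per the doc)
without_kip = recovery_time(30)  # must wait out the full 30 s request timeout
print(with_kip, without_kip)
```

The absolute numbers are illustrative; the point is only that the explicit-error path recovers well before the timeout path.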
> > > >
> > > > Also *IF* there were a metric to show the number of truncated
> messages
> > on
> > > > brokers,
> > > > with the experiments done in the Google Doc, it should be easy to see
> > > that
> > > > a lot fewer messages need
> > > > to be truncated on broker0 since the up-to-date metadata avoids
> > appending
> > > > of messages
> > > > in subsequent PRODUCE requests. If we talk to a system operator and
> ask
> > > > whether
> > > > they prefer fewer wasteful IOs, I bet most likely the answer is yes.
> > > >
> > > > 3. To answer your question, I think it might be helpful to construct
> > some
> > > > formulas.
> > > > To simplify the modeling, I'm going back to the case where there is
> > only
> > > > ONE partition involved.
> > > > Following the experiments in the Google Doc, let's say broker0
> becomes
> > > the
> > > > follower at time t0,
> > > > and after t0 there were still N produce requests in its request
> queue.
> > > > With the up-to-date metadata brought by this KIP, broker0 can reply
> > with
> > > a
> > > > NotLeaderForPartition exception,
> > > > let's use M1 to denote the average processing time of replying with
> > such
> > > an
> > > > error message.
> > > > Without this KIP, the broker will need to append messages to
> segments,
> > > > which may trigger a flush to disk,
> > > > let's use M2 to denote the average processing time for such logic.
> > > > Then the average extra latency incurred without this KIP is N * (M2 -
> > > M1) /
> > > > 2.
> > > >
> > > > In practice, M2 should always be larger than M1, which means as long
> > as N
> > > > is positive,
> > > > we would see improvements on the average latency.
> > > > There does not need to be significant backlog of requests in the
> > request
> > > > queue,
> > > > or severe degradation of disk performance to have the improvement.
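The model above can be written down directly (a sketch of the formula in this email; the values for N, M1, and M2 below are illustrative, not measured):

```python
def avg_extra_latency(n, m1, m2):
    """Average extra latency without the KIP, per the model above:
    N queued requests are each handled with cost M2 (append + possible
    flush) instead of M1 (replying with a NotLeaderForPartition error);
    a request in the middle of the queue waits for half the extra work,
    giving N * (M2 - M1) / 2."""
    return n * (m2 - m1) / 2

# e.g. 50 queued produce requests, 1 ms to reply with an error,
# 20 ms to append and flush (illustrative values)
print(avg_extra_latency(50, 0.001, 0.020))  # roughly 0.475 seconds on average
```

As the email notes, the improvement only requires M2 > M1 and a positive N; no severe backlog is needed for the sign of the effect to hold.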
> > > >
> > > > Regards,
> > > > Lucas
> > > >
> > > >
> > > > [1] For instance, reducing the timeout on the producer side can
> trigger
> > > > unnecessary duplicate requests
> > > > when the corresponding leader broker is overloaded, exacerbating the
> > > > situation.
> > > >
> > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <li...@gmail.com>
> wrote:
> > > >
> > > > > Hey Lucas,
> > > > >
> > > > > Thanks much for the detailed documentation of the experiment.
> > > > >
> > > > > Initially I also think having a separate queue for controller
> > requests
> > > is
> > > > > useful because, as you mentioned in the summary section of the
> Google
> > > > doc,
> > > > > controller requests are generally more important than data requests
> > and
> > > > we
> > > > > probably want controller requests to be processed sooner. But then
> > Eno
> > > > has
> > > > > two very good questions which I am not sure the Google doc has
> > answered
> > > > > explicitly. Could you help with the following questions?
> > > > >
> > > > > 1) It is not very clear what the actual benefit of KIP-291 is to
> > users.
> > > > The
> > > > > experiment setup in the Google doc simulates the scenario where the
> broker
> > > is
> > > > > very slow handling ProduceRequest due to e.g. slow disk. It
> currently
> > > > > assumes that there is only 1 partition. But in the common scenario,
> > it
> > > is
> > > > > probably reasonable to assume that there are many other partitions
> > that
> > > > are
> > > > > also actively produced to and a ProduceRequest to these partitions
> also
> > > > takes
> > > > > e.g. 2 seconds to be processed. So even if broker0 can become
> > follower
> > > > for
> > > > > the partition 0 soon, it probably still needs to process the
> > > > ProduceRequest
> > > > > slowly in the queue because these ProduceRequests cover other
> > > > partitions.
> > > > > Thus most ProduceRequests will still time out after 30 seconds and
> most
> > > > > clients will still likely time out after 30 seconds. Then it is not
> > > > > obvious what the benefit to the client is, since the client will time out
> > after
> > > > 30
> > > > > seconds before possibly re-connecting to broker1, with or without
> > > > KIP-291.
> > > > > Did I miss something here?
> > > > >
> > > > > 2) I guess Eno is asking for the specific benefits of this KIP to
> > > user
> > > > or
> > > > > system administrator, e.g. whether this KIP decreases average
> > latency,
> > > > > 999th percentile latency, probability of exceptions exposed to the client,
> > etc.
> > > It
> > > > > is probably useful to clarify this.
> > > > >
> > > > > 3) Does this KIP help improve user experience only when there is an
> > issue
> > > > with
> > > > > the broker, e.g. significant backlog in the request queue due to slow
> > disk
> > > as
> > > > > described in the Google doc? Or is this KIP also useful when there
> is
> > > no
> > > > > ongoing issue in the cluster? It might be helpful to clarify this
> to
> > > > > understand the benefit of this KIP.
> > > > >
> > > > >
> > > > > Thanks much,
> > > > > Dong
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatucla@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi Eno,
> > > > > >
> > > > > > Sorry for the delay in getting the experiment results.
> > > > > > Here is a link to the positive impact achieved by implementing
> the
> > > > > proposed
> > > > > > change:
> > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhW
> > > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > > Please take a look when you have time and let me know your
> > feedback.
> > > > > >
> > > > > > Regards,
> > > > > > Lucas
> > > > > >
> > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:
> > > > > >
> > > > > > > Thanks for the pointer. Will take a look might suit our
> > > requirements
> > > > > > > better.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Harsha
> > > > > > >
> > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > > lucasatucla@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi Harsha,
> > > > > > > >
> > > > > > > > If I understand correctly, the replication quota mechanism
> > > proposed
> > > > > in
> > > > > > > > KIP-73 can be helpful in that scenario.
> > > > > > > > Have you tried it out?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io >
> > > wrote:
> > > > > > > >
> > > > > > > > > Hi Lucas,
> > > > > > > > > One more question, any thoughts on making this configurable
> > > > > > > > > and also allowing a subset of data requests to be
> prioritized.
> > > For
> > > > > > > example
> > > > > > > >
> > > > > > > > > ,we notice in our cluster when we take out a broker and
> bring
> > > new
> > > > > one
> > > > > > > it
> > > > > > > >
> > > > > > > > > will try to become follower and have a lot of fetch requests
> to
> > > > other
> > > > > > > > leaders
> > > > > > > > > in clusters. This will negatively affect the
> > application/client
> > > > > > > > requests.
> > > > > > > > > We are also exploring a similar solution to de-prioritize
> > if
> > > a
> > > > > new
> > > > > > > > > replica comes in for fetch requests, we are ok with the
> > replica
> > > > to
> > > > > be
> > > > > > > > > taking time but the leaders should prioritize the client
> > > > requests.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Harsha
> > > > > > > > >
> > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Eno,
> > > > > > > > > >
> > > > > > > > > > Sorry for the delayed response.
> > > > > > > > > > - I haven't implemented the feature yet, so no
> experimental
> > > > > results
> > > > > > > so
> > > > > > > >
> > > > > > > > > > far.
> > > > > > > > > > And I plan to test it out in the following days.
> > > > > > > > > >
> > > > > > > > > > - You are absolutely right that the priority queue does
> not
> > > > > > > completely
> > > > > > > >
> > > > > > > > > > prevent
> > > > > > > > > > data requests being processed ahead of controller
> requests.
> > > > > > > > > > That being said, I expect it to greatly mitigate the
> effect
> > > of
> > > > > > stable
> > > > > > > > > > metadata.
> > > > > > > > > > In any case, I'll try it out and post the results when I
> > have
> > > > it.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Lucas
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > > > > > eno.thereska@gmail.com
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Lucas,
> > > > > > > > > > >
> > > > > > > > > > > Sorry for the delay, just had a look at this. A couple
> of
> > > > > > > questions:
> > > > > > > >
> > > > > > > > > > > - did you notice any positive change after implementing
> > > this
> > > > > KIP?
> > > > > > > > I'm
> > > > > > > > > > > wondering if you have any experimental results that
> show
> > > the
> > > > > > > benefit
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > two queues.
> > > > > > > > > > >
> > > > > > > > > > > - priority is usually not sufficient in addressing the
> > > > problem
> > > > > > the
> > > > > > > > KIP
> > > > > > > > > > > identifies. Even with priority queues, you will
> sometimes
> > > > > > (often?)
> > > > > > > > have
> > > > > > > > > > the
> > > > > > > > > > > case that data plane requests will be ahead of the
> > control
> > > > > plane
> > > > > > > > > > requests.
> > > > > > > > > > > This happens because the system might have already
> > started
> > > > > > > > processing
> > > > > > > > > > the
> > > > > > > > > > > data plane requests before the control plane ones
> > arrived.
> > > So
> > > > > it
> > > > > > > > would
> > > > > > > > > > be
> > > > > > > > > > > good to know what % of the problem this KIP addresses.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Eno
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > > > yuzhihong@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Change looks good.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the suggestion. I've updated the KIP.
> > Please
> > > > > take
> > > > > > > > > another
> > > > > > > > > >
> > > > > > > > > > > > look.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lucas
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> > > > > > yuzhihong@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > val QueuedMaxRequests = 500
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be good if you can include the default
> > value
> > > > for
> > > > > > > this
> > > > > > > >
> > > > > > > > > new
> > > > > > > > > >
> > > > > > > > > > > > config
> > > > > > > > > > > > > > in the KIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> > > > > > > > > lucasatucla@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Ted, Dong
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I've updated the KIP by adding a new config,
> > > instead
> > > > of
> > > > > > > > reusing
> > > > > > > > > > the
> > > > > > > > > > > > > > > existing one.
> > > > > > > > > > > > > > > Please take another look when you have time.
> > > Thanks a
> > > > > > lot!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
> > > > > > > yuzhihong@gmail.com
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bq. that's a waste of resource if control
> > request
> > > > > rate
> > > > > > is
> > > > > > > > low
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I don't know if control request rate can get
> to
> > > > > > 100,000,
> > > > > > > > > > likely
> > > > > > > > > > > > not.
> > > > > > > > > > > > > > Then
> > > > > > > > > > > > > > > > using the same bound as that for data
> requests
> > > > seems
> > > > > > > high.
> > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang
> <
> > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > > > > > > > Let's say today the setting of
> > > > > "queued.max.requests"
> > > > > > in
> > > > > > > > > > > cluster A
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > 1000,
> > > > > > > > > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > > > > > > > > The 100 times difference might have
> indicated
> > > > that
> > > > > > > > machines
> > > > > > > > > > in
> > > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > B
> > > > > > > > > > > > > > > > > have larger memory.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > By reusing the "queued.max.requests", the
> > > > > > > > > > controlRequestQueue
> > > > > > > > > > > in
> > > > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > > B
> > > > > > > > > > > > > > > > > automatically
> > > > > > > > > > > > > > > > > gets a 100x capacity without explicitly
> > > bothering
> > > > > the
> > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > I understand the counter argument can be
> that
> > > > maybe
> > > > > > > > that's
> > > > > > > > > a
> > > > > > > > > >
> > > > > > > > > > > > waste
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > resource if control request
> > > > > > > > > > > > > > > > > rate is low and operators may want to fine
> > tune
> > > > the
> > > > > > > > > capacity
> > > > > > > > > > of
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm ok with either approach, and can change
> > it
> > > if
> > > > > you
> > > > > > > or
> > > > > > > >
> > > > > > > > > > anyone
> > > > > > > > > > > > > else
> > > > > > > > > > > > > > > > feels
> > > > > > > > > > > > > > > > > strong about adding the extra config.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> > > > > > > > > yuzhihong@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Lucas:
> > > > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can you
> > > > > elaborate
> > > > > > a
> > > > > > > > bit
> > > > > > > > > > more
> > > > > > > > > > > > on
> > > > > > > > > > > > > > why
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong
> Lin <
> > > > > > > > > > > lindong28@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hey Luca,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall.
> > > Some
> > > > > > > > comments
> > > > > > > > > > > below:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - We usually specify the full mbean for
> > the
> > > > new
> > > > > > > > metrics
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > > > Can
> > > > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > specify it in the Public Interface
> > section
> > > > > > similar
> > > > > > > > to
> > > > > > > > > > > KIP-237
> > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > > > > > > > > > > > > ?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Maybe we could follow the same
> pattern
> > as
> > > > > > KIP-153
> > > > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > > > > > > > > metric>,
> > > > > > > > > > > > > > > > > > > where we keep the existing sensor name
> > > > > > > > "BytesInPerSec"
> > > > > > > > > > and
> > > > > > > > > > > > add
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > > sensor
> > > > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather than
> > > > > replacing
> > > > > > > > the
> > > > > > > > > > > sensor
> > > > > > > > > > > > > > name "
> > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
> > > > "ClientBytesInPerSec".
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - It seems that the KIP changes the
> > > semantics
> > > > > of
> > > > > > > the
> > > > > > > >
> > > > > > > > > > broker
> > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > "queued.max.requests" because the
> number
> > of
> > > > > total
> > > > > > > > > > requests
> > > > > > > > > > > > > queued
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > broker will be no longer bounded by
> > > > > > > > > > "queued.max.requests".
> > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > probably
> > > > > > > > > > > > > > > > > > > needs to be specified in the Public
> > > > Interfaces
> > > > > > > > section
> > > > > > > > > > for
> > > > > > > > > > > > > > > > discussion.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas
> > > Wang
> > > > <
> > > > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I created KIP-291 to add a separate
> > queue
> > > > for
> > > > > > > > > > controller
> > > > > > > > > > > > > > > requests:
> > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > > > >
> > > > > > > > > > > 291%
> > > > > > > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > > > > > > > control+requests+and+data+
> > > > > > > > > >
> > > > > > > > > > > > > requests
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can you please take a look and let me
> > > know
> > > > > your
> > > > > > > > > > feedback?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > > > Lucas

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Thanks for the insightful comment, Jun.

@Dong,
Since both comments in your previous email concern the benefits of this KIP
and whether it's useful, and in light of Jun's last comment, do you agree
that this KIP can be beneficial in the case Jun mentioned?
Please let me know, thanks!

Regards,
Lucas
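For readers following the thread, the core mechanism under discussion — a dedicated bounded queue for controller requests that the request handler drains before the data-plane queue — can be sketched roughly as follows. This is an illustrative model only, not Kafka's actual implementation; the class name, queue names, and capacities are made up (the capacities echo the queued.max.requests discussion elsewhere in the thread).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch of the two-queue idea, NOT Kafka's implementation:
// control-plane requests get their own bounded queue, and the handler
// always drains that queue before touching the data-plane queue.
class TwoQueueHandler {
    final BlockingQueue<String> controlQueue = new ArrayBlockingQueue<>(20);
    final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);

    // Returns the next request to process, preferring control requests.
    // Eno's caveat still applies: a data request that was already dequeued
    // is not preempted; priority only affects what comes out next.
    String nextRequest() {
        String req = controlQueue.poll();
        if (req != null) return req;
        return dataQueue.poll(); // may be null if both queues are empty
    }

    public static void main(String[] args) {
        TwoQueueHandler h = new TwoQueueHandler();
        h.dataQueue.offer("produce-1");
        h.dataQueue.offer("produce-2");
        h.controlQueue.offer("leaderAndIsr-1");
        // Although the data requests were enqueued first, the control
        // request is handed out first.
        System.out.println(h.nextRequest()); // prints "leaderAndIsr-1"
        System.out.println(h.nextRequest()); // prints "produce-1"
    }
}
```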

On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <ju...@confluent.io> wrote:

> Hi, Lucas, Dong,
>
> If all disks on a broker are slow, one probably should just kill the
> broker. In that case, this KIP may not help. If only one of the disks on a
> broker is slow, one may want to fail that disk and move the leaders on that
> disk to other brokers. In that case, being able to process the LeaderAndIsr
> requests faster will potentially help the producers recover quicker.
>
> Thanks,
>
> Jun
>
> On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Hey Lucas,
> >
> > Thanks for the reply. Some follow up questions below.
> >
> > Regarding 1, if each ProduceRequest covers 20 partitions that are
> randomly
> > distributed across all partitions, then each ProduceRequest will likely
> > cover some partitions for which the broker is still leader after it
> quickly
> > processes the
> > LeaderAndIsrRequest. Then broker will still be slow in processing these
> > ProduceRequests, and request latency will still be very high with this KIP. It
> seems
> > that most ProduceRequest will still timeout after 30 seconds. Is this
> > understanding correct?
> >
> > Regarding 2, if most ProduceRequest will still timeout after 30 seconds,
> > then it is less clear how this KIP reduces average produce latency. Can
> you
> > clarify what metrics can be improved by this KIP?
> >
> > Not sure why a system operator directly cares about the number of truncated messages.
> > Do you mean this KIP can improve average throughput or reduce message
> > duplication? It will be good to understand this.
> >
> > Thanks,
> > Dong
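Dong's point 1 above can be made concrete with a quick combinatorial estimate (all numbers here are hypothetical): if a ProduceRequest covers k partitions drawn at random out of p total, and the broker still leads l of them after the transitions, the probability that the request touches at least one still-led partition is 1 - C(p - l, k) / C(p, k).

```java
// Quick combinatorial check of Dong's point 1; all numbers hypothetical.
class StillLeaderProbability {
    static double probTouchesLedPartition(int p, int l, int k) {
        // Build C(p-l, k) / C(p, k) as a running product to avoid factorials.
        double ratio = 1.0;
        for (int i = 0; i < k; i++) {
            ratio *= (double) (p - l - i) / (p - i);
        }
        return 1.0 - ratio;
    }

    public static void main(String[] args) {
        // 10,000 partitions, 1,000 still led after moving 9K leaders, k = 20:
        // roughly 0.88, i.e. most requests do still hit a led partition.
        System.out.printf("%.3f%n",
                probTouchesLedPartition(10_000, 1_000, 20));
    }
}
```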
> >
> >
> >
> >
> >
> > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lu...@gmail.com> wrote:
> >
> > > Hi Dong,
> > >
> > > Thanks for your valuable comments. Please see my reply below.
> > >
> > > 1. The Google doc showed only 1 partition. Now let's consider a more
> > common
> > > scenario
> > > where broker0 is the leader of many partitions. And let's say for some
> > > reason its IO becomes slow.
> > > The number of leader partitions on broker0 is so large, say 10K, that
> the
> > > cluster is skewed,
> > > and the operator would like to shift the leadership for a lot of
> > > partitions, say 9K, to other brokers,
> > > either manually or through some service like cruise control.
> > > With this KIP, not only will the leadership transitions finish more
> > > quickly, helping the cluster itself becoming more balanced,
> > > but all existing producers corresponding to the 9K partitions will get
> > the
> > > errors relatively quickly
> > > rather than relying on their timeout, thanks to the batched async ZK
> > > operations.
> > > To me it's a useful feature to have during such troublesome times.
> > >
> > >
> > > 2. The experiments in the Google Doc have shown that with this KIP many
> > > producers
> > > receive an explicit error NotLeaderForPartition, based on which they
> > retry
> > > immediately.
> > > Therefore the latency (~14 seconds+quick retry) for their single
> message
> > is
> > > much smaller
> > > compared with the case of timing out without the KIP (30 seconds for
> > timing
> > > out + quick retry).
> > > One might argue that reducing the timing out on the producer side can
> > > achieve the same result,
> > > yet reducing the timeout has its own drawbacks[1].
> > >
> > > Also *IF* there were a metric to show the number of truncated messages
> on
> > > brokers,
> > > with the experiments done in the Google Doc, it should be easy to see
> > that
> > > a lot fewer messages need
> > > to be truncated on broker0 since the up-to-date metadata avoids
> appending
> > > of messages
> > > in subsequent PRODUCE requests. If we talk to a system operator and ask
> > > whether
> > > they prefer fewer wasteful IOs, I bet most likely the answer is yes.
> > >
> > > 3. To answer your question, I think it might be helpful to construct
> some
> > > formulas.
> > > To simplify the modeling, I'm going back to the case where there is
> only
> > > ONE partition involved.
> > > Following the experiments in the Google Doc, let's say broker0 becomes
> > the
> > > follower at time t0,
> > > and after t0 there were still N produce requests in its request queue.
> > > With the up-to-date metadata brought by this KIP, broker0 can reply
> with
> > an
> > > NotLeaderForPartition exception,
> > > let's use M1 to denote the average processing time of replying with
> such
> > an
> > > error message.
> > > Without this KIP, the broker will need to append messages to segments,
> > > which may trigger a flush to disk,
> > > let's use M2 to denote the average processing time for such logic.
> > > Then the average extra latency incurred without this KIP is N * (M2 -
> > M1) /
> > > 2.
> > >
> > > In practice, M2 should always be larger than M1, which means as long
> as N
> > > is positive,
> > > we would see improvements on the average latency.
> > > There does not need to be significant backlog of requests in the
> request
> > > queue,
> > > or severe degradation of disk performance to have the improvement.
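To make the model above concrete, here is a rough sketch, with purely illustrative numbers, of the N * (M2 - M1) / 2 estimate:

```java
// Back-of-envelope model of the latency math above, with hypothetical
// numbers. N requests are already queued when leadership moves; answering
// each with NotLeaderForPartition costs M1 on average, while appending and
// possibly flushing costs M2. Per the email's formula, the average extra
// latency without the KIP is N * (M2 - M1) / 2.
class QueueLatencyModel {
    static double avgExtraLatencyMs(int n, double m1Ms, double m2Ms) {
        return n * (m2Ms - m1Ms) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical: 500 queued requests, 0.1 ms error reply, 2 ms append.
        double extra = avgExtraLatencyMs(500, 0.1, 2.0);
        System.out.printf("avg extra latency: %.1f ms%n", extra); // 475.0 ms
    }
}
```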
> > >
> > > Regards,
> > > Lucas
> > >
> > >
> > > [1] For instance, reducing the timeout on the producer side can trigger
> > > unnecessary duplicate requests
> > > when the corresponding leader broker is overloaded, exacerbating the
> > > situation.
> > >
> > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <li...@gmail.com> wrote:
> > >
> > > > Hey Lucas,
> > > >
> > > > Thanks much for the detailed documentation of the experiment.
> > > >
> > > > Initially I also think having a separate queue for controller
> requests
> > is
> > > > useful because, as you mentioned in the summary section of the Google
> > > doc,
> > > > controller requests are generally more important than data requests
> and
> > > we
> > > > probably want controller requests to be processed sooner. But then
> Eno
> > > has
> > > > two very good questions which I am not sure the Google doc has
> answered
> > > > explicitly. Could you help with the following questions?
> > > >
> > > > 1) It is not very clear what is the actual benefit of KIP-291 to
> users.
> > > The
> > > > experiment setup in the Google doc simulates a scenario where the broker
> > is
> > > > very slow handling ProduceRequest due to e.g. slow disk. It currently
> > > > assumes that there is only 1 partition. But in the common scenario,
> it
> > is
> > > > probably reasonable to assume that there are many other partitions
> that
> > > are
> > > > also actively produced to and ProduceRequest to these partition also
> > > takes
> > > > e.g. 2 seconds to be processed. So even if broker0 can become
> follower
> > > for
> > > > the partition 0 soon, it probably still needs to process the
> > > ProduceRequest
> > > > slowly in the queue because these ProduceRequests cover other
> > > partitions.
> > > > Thus most ProduceRequest will still timeout after 30 seconds and most
> > > > clients will still likely timeout after 30 seconds. Then it is not
> > > > obvious what the benefit to the client is, since the client will timeout
> after
> > > 30
> > > > seconds before possibly re-connecting to broker1, with or without
> > > KIP-291.
> > > > Did I miss something here?
> > > >
> > > > 2) I guess Eno is asking for the specific benefits of this KIP to
> > user
> > > or
> > > > system administrator, e.g. whether this KIP decreases average
> latency,
> > > > 999th percentile latency, probability of exceptions exposed to the client
> etc.
> > It
> > > > is probably useful to clarify this.
> > > >
> > > > 3) Does this KIP help improve user experience only when there is
> issue
> > > with
> > > > broker, e.g. significant backlog in the request queue due to slow
> disk
> > as
> > > > described in the Google doc? Or is this KIP also useful when there is
> > no
> > > > ongoing issue in the cluster? It might be helpful to clarify this to
> > > > understand the benefit of this KIP.
> > > >
> > > >
> > > > Thanks much,
> > > > Dong
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Eno,
> > > > >
> > > > > Sorry for the delay in getting the experiment results.
> > > > > Here is a link to the positive impact achieved by implementing the
> > > > proposed
> > > > > change:
> > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhW
> > > > > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > > Please take a look when you have time and let me know your
> feedback.
> > > > >
> > > > > Regards,
> > > > > Lucas
> > > > >
> > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:
> > > > >
> > > > > > Thanks for the pointer. Will take a look might suit our
> > requirements
> > > > > > better.
> > > > > >
> > > > > > Thanks,
> > > > > > Harsha
> > > > > >
> > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> > lucasatucla@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hi Harsha,
> > > > > > >
> > > > > > > If I understand correctly, the replication quota mechanism
> > proposed
> > > > in
> > > > > > > KIP-73 can be helpful in that scenario.
> > > > > > > Have you tried it out?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Lucas
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io >
> > wrote:
> > > > > > >
> > > > > > > > Hi Lucas,
> > > > > > > > One more question, any thoughts on making this configurable
> > > > > > > > and also allowing subset of data requests to be prioritized.
> > For
> > > > > > example
> > > > > > >
> > > > > > > > we notice in our cluster that when we take out a broker and bring
> > new
> > > > one
> > > > > > it
> > > > > > >
> > > > > > > > will try to become a follower and issue a lot of fetch requests to
> > > other
> > > > > > > leaders
> > > > > > > > in clusters. This will negatively affect the
> application/client
> > > > > > > requests.
> > > > > > > > We are also exploring a similar solution: de-prioritize fetch
> > > > > > > > requests when a new replica comes in. We are OK with that replica
> > > > > > > > taking time to catch up, but the leaders should prioritize the
> > > > > > > > client requests.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Harsha
> > > > > > > >
> > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Eno,
> > > > > > > > >
> > > > > > > > > Sorry for the delayed response.
> > > > > > > > > - I haven't implemented the feature yet, so no experimental
> > > > results
> > > > > > so
> > > > > > >
> > > > > > > > > far.
> > > > > > > > > And I plan to test in out in the following days.
> > > > > > > > >
> > > > > > > > > - You are absolutely right that the priority queue does not
> > > > > > completely
> > > > > > >
> > > > > > > > > prevent
> > > > > > > > > data requests being processed ahead of controller requests.
> > > > > > > > > That being said, I expect it to greatly mitigate the effect
> > of
> > > > > stale
> > > > > > > > > metadata.
> > > > > > > > > In any case, I'll try it out and post the results when I
> have
> > > it.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > > > > eno.thereska@gmail.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Lucas,
> > > > > > > > > >
> > > > > > > > > > Sorry for the delay, just had a look at this. A couple of
> > > > > > questions:
> > > > > > >
> > > > > > > > > > - did you notice any positive change after implementing
> > this
> > > > KIP?
> > > > > > > I'm
> > > > > > > > > > wondering if you have any experimental results that show
> > the
> > > > > > benefit
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > two queues.
> > > > > > > > > >
> > > > > > > > > > - priority is usually not sufficient in addressing the
> > > problem
> > > > > the
> > > > > > > KIP
> > > > > > > > > > identifies. Even with priority queues, you will sometimes
> > > > > (often?)
> > > > > > > have
> > > > > > > > > the
> > > > > > > > > > case that data plane requests will be ahead of the
> control
> > > > plane
> > > > > > > > > requests.
> > > > > > > > > > This happens because the system might have already
> started
> > > > > > > processing
> > > > > > > > > the
> > > > > > > > > > data plane requests before the control plane ones
> arrived.
> > So
> > > > it
> > > > > > > would
> > > > > > > > > be
> > > > > > > > > > good to know what % of the problem this KIP addresses.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Eno
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Jun Rao <ju...@confluent.io>.
Hi, Lucas, Dong,

If all disks on a broker are slow, one probably should just kill the
broker. In that case, this KIP may not help. If only one of the disks on a
broker is slow, one may want to fail that disk and move the leaders on that
disk to other brokers. In that case, being able to process the LeaderAndIsr
requests faster will potentially help the producers recover quicker.

Thanks,

Jun
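The separate-queue mechanism under discussion can be sketched as follows. This is an illustrative Java sketch only; the class name, queue names, and capacities are assumptions for the sketch, not Kafka's actual implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of KIP-291's idea: controller requests wait in their own bounded
// queue, and request handler threads always drain it before the data queue.
public class PrioritizedRequestHandler {
    // Capacities are illustrative; the KIP discusses a separate bound per queue.
    private final BlockingQueue<String> controllerQueue = new ArrayBlockingQueue<>(20);
    private final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);

    /** Returns false when the queue is full, mirroring a bounded request queue. */
    public boolean enqueueController(String request) { return controllerQueue.offer(request); }

    public boolean enqueueData(String request) { return dataQueue.offer(request); }

    /** Next request for a handler thread: controller requests always win. */
    public String nextRequest() {
        String request = controllerQueue.poll(); // drain controller queue first
        return (request != null) ? request : dataQueue.poll();
    }
}
```

Note that Eno's caveat still applies to this sketch: a data request already handed to a handler thread is not preempted; the ordering only helps while requests are still waiting in the queues.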

On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Lucas,
>
> Thanks for the reply. Some follow-up questions below.
>
> Regarding 1, if each ProduceRequest covers 20 partitions that are randomly
> distributed across all partitions, then each ProduceRequest will likely
> cover some partitions for which the broker is still leader after it quickly
> processes the
> LeaderAndIsrRequest. Then the broker will still be slow in processing these
> ProduceRequests, and request latency will still be very high even with this
> KIP. It seems that most ProduceRequests will still time out after 30 seconds. Is this
> understanding correct?
>
> Regarding 2, if most ProduceRequests will still time out after 30 seconds,
> then it is less clear how this KIP reduces average produce latency. Can you
> clarify what metrics can be improved by this KIP?
>
> Not sure why a system operator directly cares about the number of truncated messages.
> Do you mean this KIP can improve average throughput or reduce message
> duplication? It will be good to understand this.
>
> Thanks,
> Dong
>
>
>
>
>
> On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lu...@gmail.com> wrote:
>
> > Hi Dong,
> >
> > Thanks for your valuable comments. Please see my reply below.
> >
> > 1. The Google doc showed only 1 partition. Now let's consider a more
> common
> > scenario
> > where broker0 is the leader of many partitions. And let's say for some
> > reason its IO becomes slow.
> > The number of leader partitions on broker0 is so large, say 10K, that the
> > cluster is skewed,
> > and the operator would like to shift the leadership for a lot of
> > partitions, say 9K, to other brokers,
> > either manually or through some service like cruise control.
> > With this KIP, not only will the leadership transitions finish more
> > quickly, helping the cluster itself become more balanced,
> > but all existing producers corresponding to the 9K partitions will get
> the
> > errors relatively quickly
> > rather than relying on their timeout, thanks to the batched async ZK
> > operations.
> > To me it's a useful feature to have during such troublesome times.
> >
> >
> > 2. The experiments in the Google Doc have shown that with this KIP many
> > producers
> > receive an explicit error NotLeaderForPartition, based on which they
> retry
> > immediately.
> > Therefore the latency (~14 seconds+quick retry) for their single message
> is
> > much smaller
> > compared with the case of timing out without the KIP (30 seconds for
> timing
> > out + quick retry).
> > One might argue that reducing the timing out on the producer side can
> > achieve the same result,
> > yet reducing the timeout has its own drawbacks[1].
> >
> > Also *IF* there were a metric to show the number of truncated messages on
> > brokers,
> > with the experiments done in the Google Doc, it should be easy to see
> that
> > a lot fewer messages need
> > to be truncated on broker0 since the up-to-date metadata avoids appending
> > of messages
> > in subsequent PRODUCE requests. If we talk to a system operator and ask
> > whether
> > they prefer fewer wasteful IOs, I bet most likely the answer is yes.
> >
> > 3. To answer your question, I think it might be helpful to construct some
> > formulas.
> > To simplify the modeling, I'm going back to the case where there is only
> > ONE partition involved.
> > Following the experiments in the Google Doc, let's say broker0 becomes
> the
> > follower at time t0,
> > and after t0 there were still N produce requests in its request queue.
> > With the up-to-date metadata brought by this KIP, broker0 can reply with
> an
> > NotLeaderForPartition exception,
> > let's use M1 to denote the average processing time of replying with such
> an
> > error message.
> > Without this KIP, the broker will need to append messages to segments,
> > which may trigger a flush to disk,
> > let's use M2 to denote the average processing time for such logic.
> > Then the average extra latency incurred without this KIP is N * (M2 -
> M1) /
> > 2.
> >
> > In practice, M2 should always be larger than M1, which means as long as N
> > is positive,
> > we would see improvements on the average latency.
> > There does not need to be significant backlog of requests in the request
> > queue,
> > or severe degradation of disk performance to have the improvement.
> >
> > Regards,
> > Lucas
> >
> >
> > [1] For instance, reducing the timeout on the producer side can trigger
> > unnecessary duplicate requests
> > when the corresponding leader broker is overloaded, exacerbating the
> > situation.
> >
> > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <li...@gmail.com> wrote:
> >
> > > Hey Lucas,
> > >
> > > Thanks much for the detailed documentation of the experiment.
> > >
> > > Initially I also think having a separate queue for controller requests
> is
> > > useful because, as you mentioned in the summary section of the Google
> > doc,
> > > controller requests are generally more important than data requests and
> > we
> > > probably want controller requests to be processed sooner. But then Eno
> > has
> > > two very good questions which I am not sure the Google doc has answered
> > > explicitly. Could you help with the following questions?
> > >
> > > 1) It is not very clear what is the actual benefit of KIP-291 to users.
> > The
> > > experiment setup in the Google doc simulates the scenario that broker
> is
> > > very slow handling ProduceRequest due to e.g. slow disk. It currently
> > > assumes that there is only 1 partition. But in the common scenario, it
> is
> > > probably reasonable to assume that there are many other partitions that
> > are
> > > also actively produced to and ProduceRequest to these partition also
> > takes
> > > e.g. 2 seconds to be processed. So even if broker0 can become follower
> > for
> > > the partition 0 soon, it probably still needs to process the
> > ProduceRequest
> > > slowly in the queue because these ProduceRequests cover other
> > partitions.
> > > Thus most ProduceRequests will still time out after 30 seconds and most
> > > clients will still likely time out after 30 seconds. Then it is not
> > > obvious what the benefit to the client is, since the client will time out after
> > 30
> > > seconds before possibly re-connecting to broker1, with or without
> > KIP-291.
> > > Did I miss something here?
> > >
> > > 2) I guess Eno is asking for the specific benefits of this KIP to
> user
> > or
> > > system administrator, e.g. whether this KIP decreases average latency,
> > > 999th percentile latency, probability of exceptions exposed to the client, etc.
> It
> > > is probably useful to clarify this.
> > >
> > > 3) Does this KIP help improve user experience only when there is issue
> > with
> > > broker, e.g. significant backlog in the request queue due to slow disk
> as
> > > described in the Google doc? Or is this KIP also useful when there is
> no
> > > ongoing issue in the cluster? It might be helpful to clarify this to
> > > understand the benefit of this KIP.
> > >
> > >
> > > Thanks much,
> > > Dong
> > >
> > >
> > >
> > >
> > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> > >
> > > > Hi Eno,
> > > >
> > > > Sorry for the delay in getting the experiment results.
> > > > Here is a link to the positive impact achieved by implementing the
> > > proposed
> > > > change:
> > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > Please take a look when you have time and let me know your feedback.
> > > >
> > > > Regards,
> > > > Lucas
> > > >
> > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:
> > > >
> > > > > Thanks for the pointer. Will take a look might suit our
> requirements
> > > > > better.
> > > > >
> > > > > Thanks,
> > > > > Harsha
> > > > >
> > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
> lucasatucla@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi Harsha,
> > > > > >
> > > > > > If I understand correctly, the replication quota mechanism
> proposed
> > > in
> > > > > > KIP-73 can be helpful in that scenario.
> > > > > > Have you tried it out?
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io >
> wrote:
> > > > > >
> > > > > > > Hi Lucas,
> > > > > > > One more question, any thoughts on making this configurable
> > > > > > > and also allowing subset of data requests to be prioritized.
> For
> > > > > example
> > > > > >
> > > > > > > ,we notice in our cluster when we take out a broker and bring
> new
> > > one
> > > > > it
> > > > > >
> > > > > > > will try to become a follower and have a lot of fetch requests to
> > other
> > > > > > leaders
> > > > > > > in clusters. This will negatively affect the application/client
> > > > > > requests.
> > > > > > > We are also exploring the similar solution to de-prioritize if
> a
> > > new
> > > > > > > replica comes in for fetch requests, we are ok with the replica
> > to
> > > be
> > > > > > > taking time but the leaders should prioritize the client
> > requests.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Harsha
> > > > > > >
> > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi Eno,
> > > > > > > >
> > > > > > > > Sorry for the delayed response.
> > > > > > > > - I haven't implemented the feature yet, so no experimental
> > > results
> > > > > so
> > > > > >
> > > > > > > > far.
> > > > > > > > And I plan to test it out in the following days.
> > > > > > > >
> > > > > > > > - You are absolutely right that the priority queue does not
> > > > > completely
> > > > > >
> > > > > > > > prevent
> > > > > > > > data requests being processed ahead of controller requests.
> > > > > > > > That being said, I expect it to greatly mitigate the effect
> of
> > > > stale
> > > > > > > > metadata.
> > > > > > > > In any case, I'll try it out and post the results when I have
> > it.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > > > eno.thereska@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Lucas,
> > > > > > > > >
> > > > > > > > > Sorry for the delay, just had a look at this. A couple of
> > > > > questions:
> > > > > >
> > > > > > > > > - did you notice any positive change after implementing
> this
> > > KIP?
> > > > > > I'm
> > > > > > > > > wondering if you have any experimental results that show
> the
> > > > > benefit
> > > > > > of
> > > > > > > > the
> > > > > > > > > two queues.
> > > > > > > > >
> > > > > > > > > - priority is usually not sufficient in addressing the
> > problem
> > > > the
> > > > > > KIP
> > > > > > > > > identifies. Even with priority queues, you will sometimes
> > > > (often?)
> > > > > > have
> > > > > > > > the
> > > > > > > > > case that data plane requests will be ahead of the control
> > > plane
> > > > > > > > requests.
> > > > > > > > > This happens because the system might have already started
> > > > > > processing
> > > > > > > > the
> > > > > > > > > data plane requests before the control plane ones arrived.
> So
> > > it
> > > > > > would
> > > > > > > > be
> > > > > > > > > good to know what % of the problem this KIP addresses.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Eno
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> > yuzhihong@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Change looks good.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
> > > > > lucasatucla@gmail.com
> > > > > >
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ted,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the suggestion. I've updated the KIP. Please
> > > take
> > > > > > > another
> > > > > > > >
> > > > > > > > > > look.
> > > > > > > > > > >
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> > > > yuzhihong@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Currently in KafkaConfig.scala :
> > > > > > > > > > > >
> > > > > > > > > > > > val QueuedMaxRequests = 500
> > > > > > > > > > > >
> > > > > > > > > > > > It would be good if you can include the default value
> > for
> > > > > this
> > > > > >
> > > > > > > new
> > > > > > > >
> > > > > > > > > > config
> > > > > > > > > > > > in the KIP.
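A sketch of how the new bound could sit next to the existing default. The controller-queue config name "queued.max.controller.requests" and its default of 20 below are hypothetical illustrations, not values specified by the KIP:

```java
// Illustrative only: a separate controller-queue bound coexisting with the
// existing data-queue bound from KafkaConfig (queued.max.requests = 500).
public class QueueBounds {
    static final int QUEUED_MAX_REQUESTS = 500;            // existing default
    static final int QUEUED_MAX_CONTROLLER_REQUESTS = 20;  // hypothetical new default

    // With two bounded queues, the broker can hold up to the sum of both
    // bounds, which is the semantics change Dong pointed out for
    // "queued.max.requests".
    static int maxTotalQueuedRequests() {
        return QUEUED_MAX_REQUESTS + QUEUED_MAX_CONTROLLER_REQUESTS;
    }
}
```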
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Ted, Dong
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've updated the KIP by adding a new config,
> instead
> > of
> > > > > > reusing
> > > > > > > > the
> > > > > > > > > > > > > existing one.
> > > > > > > > > > > > > Please take another look when you have time.
> Thanks a
> > > > lot!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lucas
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
> > > > > yuzhihong@gmail.com
> > > > > >
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > bq. that's a waste of resource if control request
> > > rate
> > > > is
> > > > > > low
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't know if control request rate can get to
> > > > 100,000,
> > > > > > > > likely
> > > > > > > > > > not.
> > > > > > > > > > > > Then
> > > > > > > > > > > > > > using the same bound as that for data requests
> > seems
> > > > > high.
> > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <
> > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > > > > > Let's say today the setting of
> > > "queued.max.requests"
> > > > in
> > > > > > > > > cluster A
> > > > > > > > > > > is
> > > > > > > > > > > > > > 1000,
> > > > > > > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > > > > > > The 100 times difference might have indicated
> > that
> > > > > > machines
> > > > > > > > in
> > > > > > > > > > > > cluster
> > > > > > > > > > > > > B
> > > > > > > > > > > > > > > have larger memory.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > By reusing the "queued.max.requests", the
> > > > > > > > controlRequestQueue
> > > > > > > > > in
> > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > B
> > > > > > > > > > > > > > > automatically
> > > > > > > > > > > > > > > gets a 100x capacity without explicitly
> bothering
> > > the
> > > > > > > > > operators.
> > > > > > > > > > > > > > > I understand the counter argument can be that
> > maybe
> > > > > > that's
> > > > > > > a
> > > > > > > >
> > > > > > > > > > waste
> > > > > > > > > > > of
> > > > > > > > > > > > > > > resource if control request
> > > > > > > > > > > > > > > rate is low and operators may want to fine tune
> > the
> > > > > > > capacity
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm ok with either approach, and can change it
> if
> > > you
> > > > > or
> > > > > >
> > > > > > > > anyone
> > > > > > > > > > > else
> > > > > > > > > > > > > > feels
> > > > > > > > > > > > > > > strongly about adding the extra config.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> > > > > > > yuzhihong@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Lucas:
> > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can you
> > > elaborate
> > > > a
> > > > > > bit
> > > > > > > > more
> > > > > > > > > > on
> > > > > > > > > > > > why
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <
> > > > > > > > > lindong28@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hey Luca,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall.
> Some
> > > > > > comments
> > > > > > > > > below:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - We usually specify the full mbean for the
> > new
> > > > > > metrics
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > KIP.
> > > > > > > > > > > > > > Can
> > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > specify it in the Public Interface section
> > > > similar
> > > > > > to
> > > > > > > > > KIP-237
> > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > >
> > > > > > > > > > > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > > > > > > > > > > ?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Maybe we could follow the same pattern as
> > > > KIP-153
> > > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > > > > > > metric>,
> > > > > > > > > > > > > > > > > where we keep the existing sensor name
> > > > > > "BytesInPerSec"
> > > > > > > > and
> > > > > > > > > > add
> > > > > > > > > > > a
> > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > sensor
> > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather than
> > > replacing
> > > > > > the
> > > > > > > > > sensor
> > > > > > > > > > > > name "
> > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
> > "ClientBytesInPerSec".
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - It seems that the KIP changes the
> semantics
> > > of
> > > > > the
> > > > > >
> > > > > > > > broker
> > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > "queued.max.requests" because the number of
> > > total
> > > > > > > > requests
> > > > > > > > > > > queued
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > broker will be no longer bounded by
> > > > > > > > "queued.max.requests".
> > > > > > > > > > This
> > > > > > > > > > > > > > > probably
> > > > > > > > > > > > > > > > > needs to be specified in the Public
> > Interfaces
> > > > > > section
> > > > > > > > for
> > > > > > > > > > > > > > discussion.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas
> Wang
> > <
> > > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I created KIP-291 to add a separate queue
> > for
> > > > > > > > controller
> > > > > > > > > > > > > requests:
> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > > > > > confluence/display/KAFKA/KIP-
> > > > > > > >
> > > > > > > > > 291%
> > > > > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > > > > > control+requests+and+data+
> > > > > > > >
> > > > > > > > > > > requests
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you please take a look and let me
> know
> > > your
> > > > > > > > feedback?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Lucas,

Thanks for the reply. Some follow-up questions below.

Regarding 1, if each ProduceRequest covers 20 partitions that are randomly
distributed across all partitions, then each ProduceRequest will likely
cover some partitions for which the broker is still leader after it quickly
processes the
LeaderAndIsrRequest. Then the broker will still be slow in processing these
ProduceRequests, and request latency will still be very high even with this
KIP. It seems that most ProduceRequests will still time out after 30 seconds. Is this
understanding correct?
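This point can be sanity-checked with a quick back-of-the-envelope computation. The numbers below (10,000 partitions, 9,000 leaderships moved, 20 partitions per ProduceRequest) are illustrative assumptions, not measurements from the thread:

```java
// Back-of-the-envelope model of the "20 random partitions per
// ProduceRequest" scenario discussed above.
public class PartitionOverlap {
    // Probability that a ProduceRequest touching k uniformly random
    // partitions includes at least one of the `retained` partitions the
    // broker still leads, out of `total` partitions overall.
    static double touchesRetainedLeader(int total, int retained, int k) {
        double pAllMoved = Math.pow((total - retained) / (double) total, k);
        return 1.0 - pAllMoved;
    }

    public static void main(String[] args) {
        // 9K of 10K leaderships moved away; each request covers 20 partitions.
        double p = touchesRetainedLeader(10_000, 1_000, 20);
        System.out.printf("P(request still hits a retained leader) = %.3f%n", p);
    }
}
```

With these sample numbers, roughly 88% of ProduceRequests still touch a partition the broker leads, which supports the concern that the slow append path is not fully avoided.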

Regarding 2, if most ProduceRequests will still time out after 30 seconds,
then it is less clear how this KIP reduces average produce latency. Can you
clarify what metrics can be improved by this KIP?

Not sure why a system operator directly cares about the number of truncated messages.
Do you mean this KIP can improve average throughput or reduce message
duplication? It will be good to understand this.

Thanks,
Dong





On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lu...@gmail.com> wrote:

> Hi Dong,
>
> Thanks for your valuable comments. Please see my reply below.
>
> 1. The Google doc showed only 1 partition. Now let's consider a more common
> scenario
> where broker0 is the leader of many partitions. And let's say for some
> reason its IO becomes slow.
> The number of leader partitions on broker0 is so large, say 10K, that the
> cluster is skewed,
> and the operator would like to shift the leadership for a lot of
> partitions, say 9K, to other brokers,
> either manually or through some service like cruise control.
> With this KIP, not only will the leadership transitions finish more
> quickly, helping the cluster itself become more balanced,
> but all existing producers corresponding to the 9K partitions will get the
> errors relatively quickly
> rather than relying on their timeout, thanks to the batched async ZK
> operations.
> To me it's a useful feature to have during such troublesome times.
>
>
> 2. The experiments in the Google Doc have shown that with this KIP many
> producers
> receive an explicit error NotLeaderForPartition, based on which they retry
> immediately.
> Therefore the latency (~14 seconds+quick retry) for their single message is
> much smaller
> compared with the case of timing out without the KIP (30 seconds for timing
> out + quick retry).
> One might argue that reducing the timing out on the producer side can
> achieve the same result,
> yet reducing the timeout has its own drawbacks[1].
>
> Also *IF* there were a metric to show the number of truncated messages on
> brokers,
> with the experiments done in the Google Doc, it should be easy to see that
> a lot fewer messages need
> to be truncated on broker0 since the up-to-date metadata avoids appending
> of messages
> in subsequent PRODUCE requests. If we talk to a system operator and ask
> whether
> they prefer fewer wasteful IOs, I bet most likely the answer is yes.
>
> 3. To answer your question, I think it might be helpful to construct some
> formulas.
> To simplify the modeling, I'm going back to the case where there is only
> ONE partition involved.
> Following the experiments in the Google Doc, let's say broker0 becomes the
> follower at time t0,
> and after t0 there were still N produce requests in its request queue.
> With the up-to-date metadata brought by this KIP, broker0 can reply with an
> NotLeaderForPartition exception,
> let's use M1 to denote the average processing time of replying with such an
> error message.
> Without this KIP, the broker will need to append messages to segments,
> which may trigger a flush to disk,
> let's use M2 to denote the average processing time for such logic.
> Then the average extra latency incurred without this KIP is N * (M2 - M1) /
> 2.
>
> In practice, M2 should always be larger than M1, which means as long as N
> is positive,
> we would see improvements on the average latency.
> There does not need to be significant backlog of requests in the request
> queue,
> or severe degradation of disk performance to have the improvement.
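Plugging sample numbers into the N * (M2 - M1) / 2 estimate quoted above makes the effect concrete. The values chosen for N, M1, and M2 are illustrative assumptions only, not measurements:

```java
// Numeric illustration of the average-extra-latency estimate in this thread.
public class ExtraLatencyEstimate {
    // Average extra latency (ms) incurred without the KIP, given N queued
    // produce requests, M2 = avg append+flush time (ms), and M1 = avg time
    // (ms) to reply with a NotLeaderForPartition error instead.
    static double averageExtraLatencyMs(int n, double m2Ms, double m1Ms) {
        return n * (m2Ms - m1Ms) / 2.0;
    }

    public static void main(String[] args) {
        // Sample values: 100 queued requests, 50 ms append+flush, 1 ms error reply.
        System.out.println(averageExtraLatencyMs(100, 50.0, 1.0) + " ms"); // 2450.0 ms
    }
}
```

Even a modest backlog of 100 queued requests yields about 2.45 seconds of average extra latency under these assumptions, consistent with the claim that no severe disk degradation is needed to see an improvement.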
>
> Regards,
> Lucas
>
>
> [1] For instance, reducing the timeout on the producer side can trigger
> unnecessary duplicate requests
> when the corresponding leader broker is overloaded, exacerbating the
> situation.
>
> On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Hey Lucas,
> >
> > Thanks much for the detailed documentation of the experiment.
> >
> > Initially I also think having a separate queue for controller requests is
> > useful because, as you mentioned in the summary section of the Google
> doc,
> > controller requests are generally more important than data requests and
> we
> > probably want controller requests to be processed sooner. But then Eno
> has
> > two very good questions which I am not sure the Google doc has answered
> > explicitly. Could you help with the following questions?
> >
> > 1) It is not very clear what is the actual benefit of KIP-291 to users.
> The
> > experiment setup in the Google doc simulates the scenario that broker is
> > very slow handling ProduceRequest due to e.g. slow disk. It currently
> > assumes that there is only 1 partition. But in the common scenario, it is
> > probably reasonable to assume that there are many other partitions that
> are
> > also actively produced to and ProduceRequest to these partition also
> takes
> > e.g. 2 seconds to be processed. So even if broker0 can become follower
> for
> > the partition 0 soon, it probably still needs to process the
> ProduceRequest
> > slowly t in the queue because these ProduceRequests cover other
> partitions.
> > Thus most ProduceRequests will still time out after 30 seconds and most
> > clients will still likely time out after 30 seconds. Then it is not
> > obvious what the benefit to the client is, since the client will time out after
> 30
> > seconds before possibly re-connecting to broker1, with or without
> KIP-291.
> > Did I miss something here?
> >
> > 2) I guess Eno is asking for the specific benefits of this KIP to user
> or
> > system administrator, e.g. whether this KIP decreases average latency,
> > 999th percentile latency, probability of exceptions exposed to the client, etc. It
> > is probably useful to clarify this.
> >
> > 3) Does this KIP help improve user experience only when there is issue
> with
> > broker, e.g. significant backlog in the request queue due to slow disk as
> > described in the Google doc? Or is this KIP also useful when there is no
> > ongoing issue in the cluster? It might be helpful to clarify this to
> > understand the benefit of this KIP.
> >
> >
> > Thanks much,
> > Dong
> >
> >
> >
> >
> > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Hi Eno,
> > >
> > > Sorry for the delay in getting the experiment results.
> > > Here is a link to the positive impact achieved by implementing the
> > proposed
> > > change:
> > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > Please take a look when you have time and let me know your feedback.
> > >
> > > Regards,
> > > Lucas
> > >
> > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:
> > >
> > > > Thanks for the pointer. Will take a look might suit our requirements
> > > > better.
> > > >
> > > > Thanks,
> > > > Harsha
> > > >
> > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatucla@gmail.com
> >
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > > >
> > > > > Hi Harsha,
> > > > >
> > > > > If I understand correctly, the replication quota mechanism proposed
> > in
> > > > > KIP-73 can be helpful in that scenario.
> > > > > Have you tried it out?
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io > wrote:
> > > > >
> > > > > > Hi Lucas,
> > > > > > One more question, any thoughts on making this configurable
> > > > > > and also allowing a subset of data requests to be prioritized. For
> > > > example
> > > > >
> > > > > > ,we notice in our cluster when we take out a broker and bring new
> > one
> > > > it
> > > > >
> > > > > > will try to become a follower and have a lot of fetch requests to
> other
> > > > > leaders
> > > > > > in clusters. This will negatively affect the application/client
> > > > > requests.
> > > > > > We are also exploring the similar solution to de-prioritize if a
> > new
> > > > > > replica comes in for fetch requests, we are ok with the replica
> to
> > be
> > > > > > taking time but the leaders should prioritize the client
> requests.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Harsha
> > > > > >
> > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hi Eno,
> > > > > > >
> > > > > > > Sorry for the delayed response.
> > > > > > > - I haven't implemented the feature yet, so no experimental
> > results
> > > > so
> > > > >
> > > > > > > far.
> > > > > > > And I plan to test it out in the following days.
> > > > > > >
> > > > > > > - You are absolutely right that the priority queue does not
> > > > completely
> > > > >
> > > > > > > prevent
> > > > > > > data requests being processed ahead of controller requests.
> > > > > > > That being said, I expect it to greatly mitigate the effect of
> > > stale
> > > > > > > metadata.
> > > > > > > In any case, I'll try it out and post the results when I have
> it.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > > eno.thereska@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Lucas,
> > > > > > > >
> > > > > > > > Sorry for the delay, just had a look at this. A couple of
> > > > questions:
> > > > >
> > > > > > > > - did you notice any positive change after implementing this
> > KIP?
> > > > > I'm
> > > > > > > > wondering if you have any experimental results that show the
> > > > benefit
> > > > > of
> > > > > > > the
> > > > > > > > two queues.
> > > > > > > >
> > > > > > > > - priority is usually not sufficient in addressing the
> problem
> > > the
> > > > > KIP
> > > > > > > > identifies. Even with priority queues, you will sometimes
> > > (often?)
> > > > > have
> > > > > > > the
> > > > > > > > case that data plane requests will be ahead of the control
> > plane
> > > > > > > requests.
> > > > > > > > This happens because the system might have already started
> > > > > processing
> > > > > > > the
> > > > > > > > data plane requests before the control plane ones arrived. So
> > it
> > > > > would
> > > > > > > be
> > > > > > > > good to know what % of the problem this KIP addresses.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Eno
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Change looks good.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
> > > > lucasatucla@gmail.com
> > > > >
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Ted,
> > > > > > > > > >
> > > > > > > > > > Thanks for the suggestion. I've updated the KIP. Please
> > take
> > > > > > another
> > > > > > >
> > > > > > > > > look.
> > > > > > > > > >
> > > > > > > > > > Lucas
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> > > yuzhihong@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Currently in KafkaConfig.scala :
> > > > > > > > > > >
> > > > > > > > > > > val QueuedMaxRequests = 500
> > > > > > > > > > >
> > > > > > > > > > > It would be good if you can include the default value
> for
> > > > this
> > > > >
> > > > > > new
> > > > > > >
> > > > > > > > > config
> > > > > > > > > > > in the KIP.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Ted, Dong
> > > > > > > > > > > >
> > > > > > > > > > > > I've updated the KIP by adding a new config, instead
> of
> > > > > reusing
> > > > > > > the
> > > > > > > > > > > > existing one.
> > > > > > > > > > > > Please take another look when you have time. Thanks a
> > > lot!
> > > > > > > > > > > >
> > > > > > > > > > > > Lucas
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
> > > > yuzhihong@gmail.com
> > > > >
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > bq. that's a waste of resource if control request
> > rate
> > > is
> > > > > low
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't know if control request rate can get to
> > > 100,000,
> > > > > > > likely
> > > > > > > > > not.
> > > > > > > > > > > Then
> > > > > > > > > > > > > using the same bound as that for data requests
> seems
> > > > high.
> > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <
> > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > > > > Let's say today the setting of
> > "queued.max.requests"
> > > in
> > > > > > > > cluster A
> > > > > > > > > > is
> > > > > > > > > > > > > 1000,
> > > > > > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > > > > > The 100 times difference might have indicated
> that
> > > > > machines
> > > > > > > in
> > > > > > > > > > > cluster
> > > > > > > > > > > > B
> > > > > > > > > > > > > > have larger memory.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > By reusing the "queued.max.requests", the
> > > > > > > controlRequestQueue
> > > > > > > > in
> > > > > > > > > > > > cluster
> > > > > > > > > > > > > B
> > > > > > > > > > > > > > automatically
> > > > > > > > > > > > > > gets a 100x capacity without explicitly bothering
> > the
> > > > > > > > operators.
> > > > > > > > > > > > > > I understand the counter argument can be that
> maybe
> > > > > that's
> > > > > > a
> > > > > > >
> > > > > > > > > waste
> > > > > > > > > > of
> > > > > > > > > > > > > > resource if control request
> > > > > > > > > > > > > > rate is low and operators may want to fine tune
> the
> > > > > > capacity
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm ok with either approach, and can change it if
> > you
> > > > or
> > > > >
> > > > > > > anyone
> > > > > > > > > > else
> > > > > > > > > > > > > feels
> > > > > > > > > > > > > > strong about adding the extra config.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> > > > > > yuzhihong@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lucas:
> > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can you
> > elaborate
> > > a
> > > > > bit
> > > > > > > more
> > > > > > > > > on
> > > > > > > > > > > why
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <
> > > > > > > > lindong28@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hey Luca,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. Some
> > > > > comments
> > > > > > > > below:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - We usually specify the full mbean for the
> new
> > > > > metrics
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > KIP.
> > > > > > > > > > > > > Can
> > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > specify it in the Public Interface section
> > > similar
> > > > > to
> > > > > > > > KIP-237
> > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > > confluence/display/KAFKA/KIP-
> > > > > > >
> > > > > > > > > > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > > > > > > > > > ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Maybe we could follow the same pattern as
> > > KIP-153
> > > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > > confluence/display/KAFKA/KIP-
> > > > > > >
> > > > > > > > > > > > > > > >
> > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > > > > > metric>,
> > > > > > > > > > > > > > > > where we keep the existing sensor name
> > > > > "BytesInPerSec"
> > > > > > > and
> > > > > > > > > add
> > > > > > > > > > a
> > > > > > > > > > > > new
> > > > > > > > > > > > > > > sensor
> > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather than
> > replacing
> > > > > the
> > > > > > > > sensor
> > > > > > > > > > > name "
> > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
> "ClientBytesInPerSec".
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - It seems that the KIP changes the semantics
> > of
> > > > the
> > > > >
> > > > > > > broker
> > > > > > > > > > > config
> > > > > > > > > > > > > > > > "queued.max.requests" because the number of
> > total
> > > > > > > requests
> > > > > > > > > > queued
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > broker will be no longer bounded by
> > > > > > > "queued.max.requests".
> > > > > > > > > This
> > > > > > > > > > > > > > probably
> > > > > > > > > > > > > > > > needs to be specified in the Public
> Interfaces
> > > > > section
> > > > > > > for
> > > > > > > > > > > > > discussion.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang
> <
> > > > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I created KIP-291 to add a separate queue
> for
> > > > > > > controller
> > > > > > > > > > > > requests:
> > > > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > > > > confluence/display/KAFKA/KIP-
> > > > > > >
> > > > > > > > 291%
> > > > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > > > > control+requests+and+data+
> > > > > > >
> > > > > > > > > > requests
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you please take a look and let me know
> > your
> > > > > > > feedback?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Dong,

Thanks for your valuable comments. Please see my reply below.

1. The Google doc showed only 1 partition. Now let's consider a more common
scenario
where broker0 is the leader of many partitions. And let's say for some
reason its IO becomes slow.
The number of leader partitions on broker0 is so large, say 10K, that the
cluster is skewed,
and the operator would like to shift the leadership for a lot of
partitions, say 9K, to other brokers,
either manually or through some service like cruise control.
With this KIP, not only will the leadership transitions finish more
quickly, helping the cluster itself become more balanced,
but all existing producers corresponding to the 9K partitions will get the
errors relatively quickly
rather than relying on their timeout, thanks to the batched async ZK
operations.
To me it's a useful feature to have during such troublesome times.


2. The experiments in the Google Doc have shown that with this KIP many
producers
receive an explicit error NotLeaderForPartition, based on which they retry
immediately.
Therefore the latency (~14 seconds + quick retry) for their single message is
much smaller
compared with the case of timing out without the KIP (30 seconds for timing
out + quick retry).
One might argue that reducing the timeout on the producer side can
achieve the same result,
yet reducing the timeout has its own drawbacks[1].

Also *IF* there were a metric to show the number of truncated messages on
brokers,
with the experiments done in the Google Doc, it should be easy to see that
a lot fewer messages need
to be truncated on broker0, since the up-to-date metadata avoids appending
messages
in subsequent PRODUCE requests. If we talk to a system operator and ask
whether
they prefer fewer wasteful IOs, I bet most likely the answer is yes.

3. To answer your question, I think it might be helpful to construct some
formulas.
To simplify the modeling, I'm going back to the case where there is only
ONE partition involved.
Following the experiments in the Google Doc, let's say broker0 becomes the
follower at time t0,
and after t0 there were still N produce requests in its request queue.
With the up-to-date metadata brought by this KIP, broker0 can reply with a
NotLeaderForPartition exception,
let's use M1 to denote the average processing time of replying with such an
error message.
Without this KIP, the broker will need to append messages to segments,
which may trigger a flush to disk,
let's use M2 to denote the average processing time for such logic.
Then the average extra latency incurred without this KIP is N * (M2 - M1) /
2.

In practice, M2 should always be larger than M1, which means as long as N
is positive,
we would see improvements on the average latency.
There does not need to be a significant backlog of requests in the request
queue, or severe degradation of disk performance, to see the improvement.
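To make the model concrete, here is a small Java sketch of the formula. The inputs below are hypothetical illustration values, not numbers from the experiment:

```java
public class ExtraLatencyModel {

    // Toy model of the formula above: broker0 has n ProduceRequests queued
    // when it becomes a follower. With the KIP each request gets a quick
    // NotLeaderForPartition reply taking m1 ms; without it each is appended
    // (and possibly flushed) taking m2 ms. Request i (1-based) waits for the
    // i requests up to and including itself, so its extra latency is
    // i * (m2 - m1), and the average over the queue is ~ n * (m2 - m1) / 2.
    static double avgExtraLatencyMs(int n, double m1, double m2) {
        double total = 0.0;
        for (int i = 1; i <= n; i++) {
            total += i * (m2 - m1);
        }
        return total / n; // exactly (n + 1) * (m2 - m1) / 2
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 1000 queued requests, 0.1 ms error reply,
        // 5 ms append + flush. Prints the modeled average extra latency in ms.
        System.out.println(avgExtraLatencyMs(1000, 0.1, 5.0));
    }
}
```

As the formula suggests, any positive M2 - M1 gap yields an improvement proportional to the queue depth.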

Regards,
Lucas


[1] For instance, reducing the timeout on the producer side can trigger
unnecessary duplicate requests
when the corresponding leader broker is overloaded, exacerbating the
situation.
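To illustrate [1] concretely, here is a toy model (all numbers hypothetical) of what happens when the producer timeout is set below the time an overloaded leader actually needs to respond:

```java
public class TimeoutRetryModel {

    // Toy model of footnote [1]: if the producer's request timeout is shorter
    // than the time the overloaded leader actually needs to respond, every
    // attempt expires and is retried, so the busy broker receives duplicate
    // copies of the same request instead of fewer.
    static int requestsReceivedByBroker(long timeoutMs, long processingMs, int retries) {
        if (processingMs <= timeoutMs) {
            return 1; // the first attempt is answered in time
        }
        return 1 + retries; // each attempt times out and is resent
    }

    public static void main(String[] args) {
        // Hypothetical inputs: a 5 s timeout against a broker that needs 8 s,
        // with 3 retries, means 4 copies of the request hit the busy broker.
        System.out.println(requestsReceivedByBroker(5_000, 8_000, 3));
    }
}
```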

On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Lucas,
>
> Thanks much for the detailed documentation of the experiment.
>
> Initially I also thought having a separate queue for controller requests is
> useful because, as you mentioned in the summary section of the Google doc,
> controller requests are generally more important than data requests and we
> probably want controller requests to be processed sooner. But then Eno has
> two very good questions which I am not sure the Google doc has answered
> explicitly. Could you help with the following questions?
>
> 1) It is not very clear what is the actual benefit of KIP-291 to users. The
> experiment setup in the Google doc simulates the scenario that broker is
> very slow handling ProduceRequest due to e.g. slow disk. It currently
> assumes that there is only 1 partition. But in the common scenario, it is
> probably reasonable to assume that there are many other partitions that are
> also actively produced to, and ProduceRequests to these partitions also take
> e.g. 2 seconds to be processed. So even if broker0 can become follower for
> partition 0 soon, it probably still needs to slowly process the ProduceRequests
> in the queue, because these ProduceRequests cover other partitions.
> Thus most ProduceRequests will still time out after 30 seconds and most
> clients will still likely time out after 30 seconds. Then it is not
> obvious what the benefit to the client is, since the client will time out after 30
> seconds before possibly re-connecting to broker1, with or without KIP-291.
> Did I miss something here?
>
> 2) I guess Eno is asking for the specific benefits of this KIP to the user or
> system administrator, e.g. whether this KIP decreases average latency,
> 999th percentile latency, probability of exceptions exposed to the client, etc. It
> is probably useful to clarify this.
>
> 3) Does this KIP help improve user experience only when there is issue with
> broker, e.g. significant backlog in the request queue due to slow disk as
> described in the Google doc? Or is this KIP also useful when there is no
> ongoing issue in the cluster? It might be helpful to clarify this to
> understand the benefit of this KIP.
>
>
> Thanks much,
> Dong
>
>
>
>
> On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Hi Eno,
> >
> > Sorry for the delay in getting the experiment results.
> > Here is a link to the positive impact achieved by implementing the
> proposed
> > change:
> > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhW
> > FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > Please take a look when you have time and let me know your feedback.
> >
> > Regards,
> > Lucas
> >
> > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:
> >
> > > Thanks for the pointer. Will take a look might suit our requirements
> > > better.
> > >
> > > Thanks,
> > > Harsha
> > >
> > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > >
> > > >
> > > >
> > > >
> > > > Hi Harsha,
> > > >
> > > > If I understand correctly, the replication quota mechanism proposed
> in
> > > > KIP-73 can be helpful in that scenario.
> > > > Have you tried it out?
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > >
> > > >
> > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io > wrote:
> > > >
> > > > > Hi Lucas,
> > > > > One more question, any thoughts on making this configurable
> > > > > and also allowing subset of data requests to be prioritized. For
> > > example
> > > >
> > > > > ,we notice in our cluster when we take out a broker and bring new
> one
> > > it
> > > >
> > > > > will try to become follower and have lot of fetch requests to other
> > > > leaders
> > > > in clusters. This will negatively affect the application/client
> > > > requests.
> > > > > We are also exploring the similar solution to de-prioritize if a
> new
> > > > > replica comes in for fetch requests, we are ok with the replica to
> be
> > > > > taking time but the leaders should prioritize the client requests.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Harsha
> > > > >
> > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi Eno,
> > > > > >
> > > > > > Sorry for the delayed response.
> > > > > > - I haven't implemented the feature yet, so no experimental
> results
> > > so
> > > >
> > > > > > far.
> > > > > > And I plan to test it out in the following days.
> > > > > >
> > > > > > - You are absolutely right that the priority queue does not
> > > completely
> > > >
> > > > > > prevent
> > > > > > data requests being processed ahead of controller requests.
> > > > > > That being said, I expect it to greatly mitigate the effect of
> > stale
> > > > > > metadata.
> > > > > > In any case, I'll try it out and post the results when I have it.
> > > > > >
> > > > > > Regards,
> > > > > > Lucas
> > > > > >
> > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > > eno.thereska@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Lucas,
> > > > > > >
> > > > > > > Sorry for the delay, just had a look at this. A couple of
> > > questions:
> > > >
> > > > > > > - did you notice any positive change after implementing this
> KIP?
> > > > I'm
> > > > > > > wondering if you have any experimental results that show the
> > > benefit
> > > > of
> > > > > > the
> > > > > > > two queues.
> > > > > > >
> > > > > > > - priority is usually not sufficient in addressing the problem
> > the
> > > > KIP
> > > > > > > identifies. Even with priority queues, you will sometimes
> > (often?)
> > > > have
> > > > > > the
> > > > > > > case that data plane requests will be ahead of the control
> plane
> > > > > > requests.
> > > > > > > This happens because the system might have already started
> > > > processing
> > > > > > the
> > > > > > > data plane requests before the control plane ones arrived. So
> it
> > > > would
> > > > > > be
> > > > > > > good to know what % of the problem this KIP addresses.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Eno
> > > > > > >
> > > > > >
> > > > > >
> > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu < yuzhihong@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Change looks good.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
> > > lucasatucla@gmail.com
> > > >
> > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Ted,
> > > > > > > > >
> > > > > > > > > Thanks for the suggestion. I've updated the KIP. Please
> take
> > > > > another
> > > > > >
> > > > > > > > look.
> > > > > > > > >
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> > yuzhihong@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Currently in KafkaConfig.scala :
> > > > > > > > > >
> > > > > > > > > > val QueuedMaxRequests = 500
> > > > > > > > > >
> > > > > > > > > > It would be good if you can include the default value for
> > > this
> > > >
> > > > > new
> > > > > >
> > > > > > > > config
> > > > > > > > > > in the KIP.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ted, Dong
> > > > > > > > > > >
> > > > > > > > > > > I've updated the KIP by adding a new config, instead of
> > > > reusing
> > > > > > the
> > > > > > > > > > > existing one.
> > > > > > > > > > > Please take another look when you have time. Thanks a
> > lot!
> > > > > > > > > > >
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
> > > yuzhihong@gmail.com
> > > >
> > > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > bq. that's a waste of resource if control request
> rate
> > is
> > > > low
> > > > > > > > > > > >
> > > > > > > > > > > > I don't know if control request rate can get to
> > 100,000,
> > > > > > likely
> > > > > > > > not.
> > > > > > > > > > Then
> > > > > > > > > > > > using the same bound as that for data requests seems
> > > high.
> > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <
> > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > > > Let's say today the setting of
> "queued.max.requests"
> > in
> > > > > > > cluster A
> > > > > > > > > is
> > > > > > > > > > > > 1000,
> > > > > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > > > > The 100 times difference might have indicated that
> > > > machines
> > > > > > in
> > > > > > > > > > cluster
> > > > > > > > > > > B
> > > > > > > > > > > > > have larger memory.
> > > > > > > > > > > > >
> > > > > > > > > > > > > By reusing the "queued.max.requests", the
> > > > > > controlRequestQueue
> > > > > > > in
> > > > > > > > > > > cluster
> > > > > > > > > > > > B
> > > > > > > > > > > > > automatically
> > > > > > > > > > > > > gets a 100x capacity without explicitly bothering
> the
> > > > > > > operators.
> > > > > > > > > > > > > I understand the counter argument can be that maybe
> > > > that's
> > > > > a
> > > > > >
> > > > > > > > waste
> > > > > > > > > of
> > > > > > > > > > > > > resource if control request
> > > > > > > > > > > > > rate is low and operators may want to fine tune the
> > > > > capacity
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm ok with either approach, and can change it if
> you
> > > or
> > > >
> > > > > > anyone
> > > > > > > > > else
> > > > > > > > > > > > feels
> > > > > > > > > > > > > strong about adding the extra config.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Lucas
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> > > > > yuzhihong@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Lucas:
> > > > > > > > > > > > > > Under Rejected Alternatives, #2, can you
> elaborate
> > a
> > > > bit
> > > > > > more
> > > > > > > > on
> > > > > > > > > > why
> > > > > > > > > > > > the
> > > > > > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <
> > > > > > > lindong28@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hey Luca,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. Some
> > > > comments
> > > > > > > below:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - We usually specify the full mbean for the new
> > > > metrics
> > > > > > in
> > > > > > > > the
> > > > > > > > > > KIP.
> > > > > > > > > > > > Can
> > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > specify it in the Public Interface section
> > similar
> > > > to
> > > > > > > KIP-237
> > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > confluence/display/KAFKA/KIP-
> > > > > >
> > > > > > > > > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > > > > > > > > ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Maybe we could follow the same pattern as
> > KIP-153
> > > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > > confluence/display/KAFKA/KIP-
> > > > > >
> > > > > > > > > > > > > > >
> > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > > > > metric>,
> > > > > > > > > > > > > > > where we keep the existing sensor name
> > > > "BytesInPerSec"
> > > > > > and
> > > > > > > > add
> > > > > > > > > a
> > > > > > > > > > > new
> > > > > > > > > > > > > > sensor
> > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather than
> replacing
> > > > the
> > > > > > > sensor
> > > > > > > > > > name "
> > > > > > > > > > > > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - It seems that the KIP changes the semantics
> of
> > > the
> > > >
> > > > > > broker
> > > > > > > > > > config
> > > > > > > > > > > > > > > "queued.max.requests" because the number of
> total
> > > > > > requests
> > > > > > > > > queued
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > broker will be no longer bounded by
> > > > > > "queued.max.requests".
> > > > > > > > This
> > > > > > > > > > > > > probably
> > > > > > > > > > > > > > > needs to be specified in the Public Interfaces
> > > > section
> > > > > > for
> > > > > > > > > > > > discussion.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> > > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I created KIP-291 to add a separate queue for
> > > > > > controller
> > > > > > > > > > > requests:
> > > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > > > confluence/display/KAFKA/KIP-
> > > > > >
> > > > > > > 291%
> > > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > > > control+requests+and+data+
> > > > > >
> > > > > > > > > requests
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you please take a look and let me know
> your
> > > > > > feedback?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Lucas,

Thanks much for the detailed documentation of the experiment.

Initially I also thought having a separate queue for controller requests is
useful because, as you mentioned in the summary section of the Google doc,
controller requests are generally more important than data requests and we
probably want controller requests to be processed sooner. But then Eno has
two very good questions which I am not sure the Google doc has answered
explicitly. Could you help with the following questions?

1) It is not very clear what is the actual benefit of KIP-291 to users. The
experiment setup in the Google doc simulates the scenario that broker is
very slow handling ProduceRequest due to e.g. slow disk. It currently
assumes that there is only 1 partition. But in the common scenario, it is
probably reasonable to assume that there are many other partitions that are
also actively produced to, and ProduceRequests to these partitions also take
e.g. 2 seconds to be processed. So even if broker0 can become follower for
partition 0 soon, it probably still needs to slowly process the ProduceRequests
in the queue, because these ProduceRequests cover other partitions.
Thus most ProduceRequests will still time out after 30 seconds and most
clients will still likely time out after 30 seconds. Then it is not
obvious what the benefit to the client is, since the client will time out after 30
seconds before possibly re-connecting to broker1, with or without KIP-291.
Did I miss something here?

2) I guess Eno is asking for the specific benefits of this KIP to the user or
system administrator, e.g. whether this KIP decreases average latency,
999th percentile latency, probability of exceptions exposed to the client, etc. It
is probably useful to clarify this.

3) Does this KIP help improve user experience only when there is issue with
broker, e.g. significant backlog in the request queue due to slow disk as
described in the Google doc? Or is this KIP also useful when there is no
ongoing issue in the cluster? It might be helpful to clarify this to
understand the benefit of this KIP.


Thanks much,
Dong
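For readers following along, the separate-queue idea under discussion can be sketched roughly as below. This is a hypothetical illustration only, not the actual KIP-291 implementation; the control-queue capacity of 20 is a made-up value, while 500 is the existing queued.max.requests default mentioned earlier in the thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class TwoQueueSketch {

    // Controller requests get their own small bounded queue, while data
    // requests keep a bounded queue of their own.
    final BlockingQueue<String> controlQueue = new ArrayBlockingQueue<>(20);
    final BlockingQueue<String> dataQueue = new ArrayBlockingQueue<>(500);

    // A request handler thread always drains the control queue before the
    // data queue, so queued controller requests jump ahead of queued data
    // requests.
    String nextRequest() throws InterruptedException {
        String controlRequest = controlQueue.poll();
        if (controlRequest != null) {
            return controlRequest;
        }
        // Wait only briefly on the data queue so that a controller request
        // arriving in the meantime is picked up on the next iteration.
        return dataQueue.poll(300, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        TwoQueueSketch handler = new TwoQueueSketch();
        handler.dataQueue.put("ProduceRequest");
        handler.controlQueue.put("LeaderAndIsrRequest");
        System.out.println(handler.nextRequest()); // LeaderAndIsrRequest
        System.out.println(handler.nextRequest()); // ProduceRequest
    }
}
```

Note Eno's caveat still applies to such a sketch: prioritization only reorders queued requests, and a data request a handler thread has already started processing completes first regardless.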




On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Eno,
>
> Sorry for the delay in getting the experiment results.
> Here is a link to the positive impact achieved by implementing the proposed
> change:
> https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhW
> FWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> Please take a look when you have time and let me know your feedback.
>
> Regards,
> Lucas
>
> On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:
>
> > Thanks for the pointer. Will take a look might suit our requirements
> > better.
> >
> > Thanks,
> > Harsha
> >
> > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> >
> > >
> > >
> > >
> > > Hi Harsha,
> > >
> > > If I understand correctly, the replication quota mechanism proposed in
> > > KIP-73 can be helpful in that scenario.
> > > Have you tried it out?
> > >
> > > Thanks,
> > > Lucas
> > >
> > >
> > >
> > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io > wrote:
> > >
> > > > Hi Lucas,
> > > > One more question, any thoughts on making this configurable
> > > > and also allowing subset of data requests to be prioritized. For
> > example
> > >
> > > > ,we notice in our cluster when we take out a broker and bring new one
> > it
> > >
> > > > will try to become follower and have lot of fetch requests to other
> > > leaders
> > > > in clusters. This will negatively effect the application/client
> > > requests.
> > > > We are also exploring the similar solution to de-prioritize if a new
> > > > replica comes in for fetch requests, we are ok with the replica to be
> > > > taking time but the leaders should prioritize the client requests.
> > > >
> > > >
> > > > Thanks,
> > > > Harsha
> > > >
> > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > >
> > > > >
> > > > >
> > > > >
> > > > > Hi Eno,
> > > > >
> > > > > Sorry for the delayed response.
> > > > > - I haven't implemented the feature yet, so no experimental results
> > so
> > >
> > > > > far.
> > > > > And I plan to test it out in the following days.
> > > > >
> > > > > - You are absolutely right that the priority queue does not
> > completely
> > >
> > > > > prevent
> > > > > data requests being processed ahead of controller requests.
> > > > > That being said, I expect it to greatly mitigate the effect of
> stable
> > > > > metadata.
> > > > > In any case, I'll try it out and post the results when I have it.
> > > > >
> > > > > Regards,
> > > > > Lucas
> > > > >
> > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> > eno.thereska@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Lucas,
> > > > > >
> > > > > > Sorry for the delay, just had a look at this. A couple of
> > questions:
> > >
> > > > > > - did you notice any positive change after implementing this KIP?
> > > I'm
> > > > > > wondering if you have any experimental results that show the
> > benefit
> > > of
> > > > > the
> > > > > > two queues.
> > > > > >
> > > > > > - priority is usually not sufficient in addressing the problem
> the
> > > KIP
> > > > > > identifies. Even with priority queues, you will sometimes
> (often?)
> > > have
> > > > > the
> > > > > > case that data plane requests will be ahead of the control plane
> > > > > requests.
> > > > > > This happens because the system might have already started
> > > processing
> > > > > the
> > > > > > data plane requests before the control plane ones arrived. So it
> > > would
> > > > > be
> > > > > > good to know what % of the problem this KIP addresses.
> > > > > >
> > > > > > Thanks
> > > > > > Eno
> > > > > >
> > > > >
> > > > >
> > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu < yuzhihong@gmail.com >
> > > wrote:
> > > > > >
> > > > > > > Change looks good.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
> > lucasatucla@gmail.com
> > >
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Ted,
> > > > > > > >
> > > > > > > > Thanks for the suggestion. I've updated the KIP. Please take
> > > > another
> > > > >
> > > > > > > look.
> > > > > > > >
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Currently in KafkaConfig.scala :
> > > > > > > > >
> > > > > > > > > val QueuedMaxRequests = 500
> > > > > > > > >
> > > > > > > > > It would be good if you can include the default value for
> > this
> > >
> > > > new
> > > > >
> > > > > > > config
> > > > > > > > > in the KIP.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> > > > lucasatucla@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Ted, Dong
> > > > > > > > > >
> > > > > > > > > > I've updated the KIP by adding a new config, instead of
> > > reusing
> > > > > the
> > > > > > > > > > existing one.
> > > > > > > > > > Please take another look when you have time. Thanks a
> lot!
> > > > > > > > > >
> > > > > > > > > > Lucas
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
> > yuzhihong@gmail.com
> > >
> > > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > bq. that's a waste of resource if control request rate
> is
> > > low
> > > > > > > > > > >
> > > > > > > > > > > I don't know if control request rate can get to
> 100,000,
> > > > > likely
> > > > > > > not.
> > > > > > > > > Then
> > > > > > > > > > > using the same bound as that for data requests seems
> > high.
> > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > > Let's say today the setting of "queued.max.requests"
> in
> > > > > > cluster A
> > > > > > > > is
> > > > > > > > > > > 1000,
> > > > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > > > The 100 times difference might have indicated that
> > > machines
> > > > > in
> > > > > > > > > cluster
> > > > > > > > > > B
> > > > > > > > > > > > have larger memory.
> > > > > > > > > > > >
> > > > > > > > > > > > By reusing the "queued.max.requests", the
> > > > > controlRequestQueue
> > > > > > in
> > > > > > > > > > cluster
> > > > > > > > > > > B
> > > > > > > > > > > > automatically
> > > > > > > > > > > > gets a 100x capacity without explicitly bothering the
> > > > > > operators.
> > > > > > > > > > > > I understand the counter argument can be that maybe
> > > that's
> > > > a
> > > > >
> > > > > > > waste
> > > > > > > > of
> > > > > > > > > > > > resource if control request
> > > > > > > > > > > > rate is low and operators may want to fine tune the
> > > > capacity
> > > > > of
> > > > > > > the
> > > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm ok with either approach, and can change it if you
> > or
> > >
> > > > > anyone
> > > > > > > > else
> > > > > > > > > > > feels
> > > > > > > > > > > > strong about adding the extra config.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Lucas
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> > > > yuzhihong@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Lucas:
> > > > > > > > > > > > > Under Rejected Alternatives, #2, can you elaborate
> a
> > > bit
> > > > > more
> > > > > > > on
> > > > > > > > > why
> > > > > > > > > > > the
> > > > > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <
> > > > > > lindong28@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hey Luca,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. Some
> > > comments
> > > > > > below:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - We usually specify the full mbean for the new
> > > metrics
> > > > > in
> > > > > > > the
> > > > > > > > > KIP.
> > > > > > > > > > > Can
> > > > > > > > > > > > > you
> > > > > > > > > > > > > > specify it in the Public Interface section
> similar
> > > to
> > > > > > KIP-237
> > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > confluence/display/KAFKA/KIP-
> > > > >
> > > > > > > > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > > > > > > > ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Maybe we could follow the same pattern as
> KIP-153
> > > > > > > > > > > > > > < https://cwiki.apache.org/
> > > > confluence/display/KAFKA/KIP-
> > > > >
> > > > > > > > > > > > > >
> > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > > > metric>,
> > > > > > > > > > > > > > where we keep the existing sensor name
> > > "BytesInPerSec"
> > > > > and
> > > > > > > add
> > > > > > > > a
> > > > > > > > > > new
> > > > > > > > > > > > > sensor
> > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather than replacing
> > > the
> > > > > > sensor
> > > > > > > > > name "
> > > > > > > > > > > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - It seems that the KIP changes the semantics of
> > the
> > >
> > > > > broker
> > > > > > > > > config
> > > > > > > > > > > > > > "queued.max.requests" because the number of total
> > > > > requests
> > > > > > > > queued
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > broker will be no longer bounded by
> > > > > "queued.max.requests".
> > > > > > > This
> > > > > > > > > > > > probably
> > > > > > > > > > > > > > needs to be specified in the Public Interfaces
> > > section
> > > > > for
> > > > > > > > > > > discussion.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> > > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I created KIP-291 to add a separate queue for
> > > > > controller
> > > > > > > > > > requests:
> > > > > > > > > > > > > > > https://cwiki.apache.org/
> > > > confluence/display/KAFKA/KIP-
> > > > >
> > > > > > 291%
> > > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > > control+requests+and+data+
> > > > >
> > > > > > > > requests
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you please take a look and let me know your
> > > > > feedback?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Eno,

Sorry for the delay in getting the experiment results.
Here is a link to the positive impact achieved by implementing the proposed
change:
https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
Please take a look when you have time and let me know your feedback.

Regards,
Lucas

On Tue, Jun 26, 2018 at 9:52 AM, Harsha <ka...@harsha.io> wrote:

> Thanks for the pointer. Will take a look; it might suit our requirements
> better.
>
> Thanks,
> Harsha
>
> On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lu...@gmail.com>
> wrote:
>
> >
> >
> >
> > Hi Harsha,
> >
> > If I understand correctly, the replication quota mechanism proposed in
> > KIP-73 can be helpful in that scenario.
> > Have you tried it out?
> >
> > Thanks,
> > Lucas
> >
> >
> >
> > On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io > wrote:
> >
> > > Hi Lucas,
> > > One more question, any thoughts on making this configurable
> > > and also allowing subset of data requests to be prioritized. For
> example
> >
> > > ,we notice in our cluster when we take out a broker and bring new one
> it
> >
> > > will try to become follower and have lot of fetch requests to other
> > leaders
> > > in clusters. This will negatively effect the application/client
> > requests.
> > > We are also exploring the similar solution to de-prioritize if a new
> > > replica comes in for fetch requests, we are ok with the replica to be
> > > taking time but the leaders should prioritize the client requests.
> > >
> > >
> > > Thanks,
> > > Harsha
> > >
> > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > >
> > > >
> > > >
> > > >
> > > > Hi Eno,
> > > >
> > > > Sorry for the delayed response.
> > > > - I haven't implemented the feature yet, so no experimental results
> so
> >
> > > > far.
> > > > And I plan to test it out in the following days.
> > > >
> > > > - You are absolutely right that the priority queue does not
> completely
> >
> > > > prevent
> > > > data requests being processed ahead of controller requests.
> > > > That being said, I expect it to greatly mitigate the effect of stable
> > > > metadata.
> > > > In any case, I'll try it out and post the results when I have it.
> > > >
> > > > Regards,
> > > > Lucas
> > > >
> > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <
> eno.thereska@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Lucas,
> > > > >
> > > > > Sorry for the delay, just had a look at this. A couple of
> questions:
> >
> > > > > - did you notice any positive change after implementing this KIP?
> > I'm
> > > > > wondering if you have any experimental results that show the
> benefit
> > of
> > > > the
> > > > > two queues.
> > > > >
> > > > > - priority is usually not sufficient in addressing the problem the
> > KIP
> > > > > identifies. Even with priority queues, you will sometimes (often?)
> > have
> > > > the
> > > > > case that data plane requests will be ahead of the control plane
> > > > requests.
> > > > > This happens because the system might have already started
> > processing
> > > > the
> > > > > data plane requests before the control plane ones arrived. So it
> > would
> > > > be
> > > > > good to know what % of the problem this KIP addresses.
> > > > >
> > > > > Thanks
> > > > > Eno
> > > > >
> > > >
> > > >
> > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu < yuzhihong@gmail.com >
> > wrote:
> > > > >
> > > > > > Change looks good.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <
> lucasatucla@gmail.com
> >
> > > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Ted,
> > > > > > >
> > > > > > > Thanks for the suggestion. I've updated the KIP. Please take
> > > another
> > > >
> > > > > > look.
> > > > > > >
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu < yuzhihong@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Currently in KafkaConfig.scala :
> > > > > > > >
> > > > > > > > val QueuedMaxRequests = 500
> > > > > > > >
> > > > > > > > It would be good if you can include the default value for
> this
> >
> > > new
> > > >
> > > > > > config
> > > > > > > > in the KIP.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> > > lucasatucla@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Ted, Dong
> > > > > > > > >
> > > > > > > > > I've updated the KIP by adding a new config, instead of
> > reusing
> > > > the
> > > > > > > > > existing one.
> > > > > > > > > Please take another look when you have time. Thanks a lot!
> > > > > > > > >
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <
> yuzhihong@gmail.com
> >
> > > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > bq. that's a waste of resource if control request rate is
> > low
> > > > > > > > > >
> > > > > > > > > > I don't know if control request rate can get to 100,000,
> > > > likely
> > > > > > not.
> > > > > > > > Then
> > > > > > > > > > using the same bound as that for data requests seems
> high.
> >
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <
> > > > > > lucasatucla@gmail.com >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ted,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > Let's say today the setting of "queued.max.requests" in
> > > > > cluster A
> > > > > > > is
> > > > > > > > > > 1000,
> > > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > > The 100 times difference might have indicated that
> > machines
> > > > in
> > > > > > > > cluster
> > > > > > > > > B
> > > > > > > > > > > have larger memory.
> > > > > > > > > > >
> > > > > > > > > > > By reusing the "queued.max.requests", the
> > > > controlRequestQueue
> > > > > in
> > > > > > > > > cluster
> > > > > > > > > > B
> > > > > > > > > > > automatically
> > > > > > > > > > > gets a 100x capacity without explicitly bothering the
> > > > > operators.
> > > > > > > > > > > I understand the counter argument can be that maybe
> > that's
> > > a
> > > >
> > > > > > waste
> > > > > > > of
> > > > > > > > > > > resource if control request
> > > > > > > > > > > rate is low and operators may want to fine tune the
> > > capacity
> > > > of
> > > > > > the
> > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > >
> > > > > > > > > > > I'm ok with either approach, and can change it if you
> or
> >
> > > > anyone
> > > > > > > else
> > > > > > > > > > feels
> > > > > > > > > > > strong about adding the extra config.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> > > yuzhihong@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Lucas:
> > > > > > > > > > > > Under Rejected Alternatives, #2, can you elaborate a
> > bit
> > > > more
> > > > > > on
> > > > > > > > why
> > > > > > > > > > the
> > > > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <
> > > > > lindong28@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey Luca,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the KIP. Looks good overall. Some
> > comments
> > > > > below:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - We usually specify the full mbean for the new
> > metrics
> > > > in
> > > > > > the
> > > > > > > > KIP.
> > > > > > > > > > Can
> > > > > > > > > > > > you
> > > > > > > > > > > > > specify it in the Public Interface section similar
> > to
> > > > > KIP-237
> > > > > > > > > > > > > < https://cwiki.apache.org/
> > > confluence/display/KAFKA/KIP-
> > > >
> > > > > > > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > > > > > > ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > > > > > > > > < https://cwiki.apache.org/
> > > confluence/display/KAFKA/KIP-
> > > >
> > > > > > > > > > > > >
> > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > > metric>,
> > > > > > > > > > > > > where we keep the existing sensor name
> > "BytesInPerSec"
> > > > and
> > > > > > add
> > > > > > > a
> > > > > > > > > new
> > > > > > > > > > > > sensor
> > > > > > > > > > > > > "ReplicationBytesInPerSec", rather than replacing
> > the
> > > > > sensor
> > > > > > > > name "
> > > > > > > > > > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > > > > > > > >
> > > > > > > > > > > > > - It seems that the KIP changes the semantics of
> the
> >
> > > > broker
> > > > > > > > config
> > > > > > > > > > > > > "queued.max.requests" because the number of total
> > > > requests
> > > > > > > queued
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > broker will be no longer bounded by
> > > > "queued.max.requests".
> > > > > > This
> > > > > > > > > > > probably
> > > > > > > > > > > > > needs to be specified in the Public Interfaces
> > section
> > > > for
> > > > > > > > > > discussion.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Dong
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> > > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I created KIP-291 to add a separate queue for
> > > > controller
> > > > > > > > > requests:
> > > > > > > > > > > > > > https://cwiki.apache.org/
> > > confluence/display/KAFKA/KIP-
> > > >
> > > > > 291%
> > > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > > control+requests+and+data+
> > > >
> > > > > > > requests
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please take a look and let me know your
> > > > feedback?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
> >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Harsha <ka...@harsha.io>.
Thanks for the pointer. Will take a look; it might suit our requirements better.

Thanks,
Harsha

On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lu...@gmail.com> wrote:

> 
> 
> 
> Hi Harsha,
> 
> If I understand correctly, the replication quota mechanism proposed in
> KIP-73 can be helpful in that scenario.
> Have you tried it out?
> 
> Thanks,
> Lucas
> 
> 
> 
> On Sun, Jun 24, 2018 at 8:28 AM, Harsha < kafka@harsha.io > wrote:
> 
> > Hi Lucas,
> > One more question, any thoughts on making this configurable
> > and also allowing subset of data requests to be prioritized. For example
> 
> > ,we notice in our cluster when we take out a broker and bring new one it
> 
> > will try to become follower and have lot of fetch requests to other
> leaders
> > in clusters. This will negatively effect the application/client
> requests.
> > We are also exploring the similar solution to de-prioritize if a new
> > replica comes in for fetch requests, we are ok with the replica to be
> > taking time but the leaders should prioritize the client requests.
> >
> >
> > Thanks,
> > Harsha
> >
> > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> >
> > >
> > >
> > >
> > > Hi Eno,
> > >
> > > Sorry for the delayed response.
> > > - I haven't implemented the feature yet, so no experimental results so
> 
> > > far.
> > > And I plan to test it out in the following days.
> > >
> > > - You are absolutely right that the priority queue does not completely
> 
> > > prevent
> > > data requests being processed ahead of controller requests.
> > > That being said, I expect it to greatly mitigate the effect of stable
> > > metadata.
> > > In any case, I'll try it out and post the results when I have it.
> > >
> > > Regards,
> > > Lucas
> > >
> > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska < eno.thereska@gmail.com
> >
> > > wrote:
> > >
> > > > Hi Lucas,
> > > >
> > > > Sorry for the delay, just had a look at this. A couple of questions:
> 
> > > > - did you notice any positive change after implementing this KIP?
> I'm
> > > > wondering if you have any experimental results that show the benefit
> of
> > > the
> > > > two queues.
> > > >
> > > > - priority is usually not sufficient in addressing the problem the
> KIP
> > > > identifies. Even with priority queues, you will sometimes (often?)
> have
> > > the
> > > > case that data plane requests will be ahead of the control plane
> > > requests.
> > > > This happens because the system might have already started
> processing
> > > the
> > > > data plane requests before the control plane ones arrived. So it
> would
> > > be
> > > > good to know what % of the problem this KIP addresses.
> > > >
> > > > Thanks
> > > > Eno
> > > >
> > >
> > >
> > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu < yuzhihong@gmail.com >
> wrote:
> > > >
> > > > > Change looks good.
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang < lucasatucla@gmail.com
> 
> > >
> > > > wrote:
> > > > >
> > > > > > Hi Ted,
> > > > > >
> > > > > > Thanks for the suggestion. I've updated the KIP. Please take
> > another
> > >
> > > > > look.
> > > > > >
> > > > > > Lucas
> > > > > >
> > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu < yuzhihong@gmail.com >
> > > wrote:
> > > > > >
> > > > > > > Currently in KafkaConfig.scala :
> > > > > > >
> > > > > > > val QueuedMaxRequests = 500
> > > > > > >
> > > > > > > It would be good if you can include the default value for this
> 
> > new
> > >
> > > > > config
> > > > > > > in the KIP.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> > lucasatucla@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Ted, Dong
> > > > > > > >
> > > > > > > > I've updated the KIP by adding a new config, instead of
> reusing
> > > the
> > > > > > > > existing one.
> > > > > > > > Please take another look when you have time. Thanks a lot!
> > > > > > > >
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu < yuzhihong@gmail.com
> 
> > >
> > > > wrote:
> > > > > > > >
> > > > > > > > > bq. that's a waste of resource if control request rate is
> low
> > > > > > > > >
> > > > > > > > > I don't know if control request rate can get to 100,000,
> > > likely
> > > > > not.
> > > > > > > Then
> > > > > > > > > using the same bound as that for data requests seems high.
> 
> > > > > > > > >
> > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <
> > > > > lucasatucla@gmail.com >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Ted,
> > > > > > > > > >
> > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > Let's say today the setting of "queued.max.requests" in
> > > > cluster A
> > > > > > is
> > > > > > > > > 1000,
> > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > The 100 times difference might have indicated that
> machines
> > > in
> > > > > > > cluster
> > > > > > > > B
> > > > > > > > > > have larger memory.
> > > > > > > > > >
> > > > > > > > > > By reusing the "queued.max.requests", the
> > > controlRequestQueue
> > > > in
> > > > > > > > cluster
> > > > > > > > > B
> > > > > > > > > > automatically
> > > > > > > > > > gets a 100x capacity without explicitly bothering the
> > > > operators.
> > > > > > > > > > I understand the counter argument can be that maybe
> that's
> > a
> > >
> > > > > waste
> > > > > > of
> > > > > > > > > > resource if control request
> > > > > > > > > > rate is low and operators may want to fine tune the
> > capacity
> > > of
> > > > > the
> > > > > > > > > > controlRequestQueue.
> > > > > > > > > >
> > > > > > > > > > I'm ok with either approach, and can change it if you or
> 
> > > anyone
> > > > > > else
> > > > > > > > > feels
> > > > > > > > > > strong about adding the extra config.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Lucas
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> > yuzhihong@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Lucas:
> > > > > > > > > > > Under Rejected Alternatives, #2, can you elaborate a
> bit
> > > more
> > > > > on
> > > > > > > why
> > > > > > > > > the
> > > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <
> > > > lindong28@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey Luca,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the KIP. Looks good overall. Some
> comments
> > > > below:
> > > > > > > > > > > >
> > > > > > > > > > > > - We usually specify the full mbean for the new
> metrics
> > > in
> > > > > the
> > > > > > > KIP.
> > > > > > > > > Can
> > > > > > > > > > > you
> > > > > > > > > > > > specify it in the Public Interface section similar
> to
> > > > KIP-237
> > > > > > > > > > > > < https://cwiki.apache.org/
> > confluence/display/KAFKA/KIP-
> > >
> > > > > > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > > > > > ?
> > > > > > > > > > > >
> > > > > > > > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > > > > > > > < https://cwiki.apache.org/
> > confluence/display/KAFKA/KIP-
> > >
> > > > > > > > > > > >
> 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > > metric>,
> > > > > > > > > > > > where we keep the existing sensor name
> "BytesInPerSec"
> > > and
> > > > > add
> > > > > > a
> > > > > > > > new
> > > > > > > > > > > sensor
> > > > > > > > > > > > "ReplicationBytesInPerSec", rather than replacing
> the
> > > > sensor
> > > > > > > name "
> > > > > > > > > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > > > > > > >
> > > > > > > > > > > > - It seems that the KIP changes the semantics of the
> 
> > > broker
> > > > > > > config
> > > > > > > > > > > > "queued.max.requests" because the number of total
> > > requests
> > > > > > queued
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > broker will be no longer bounded by
> > > "queued.max.requests".
> > > > > This
> > > > > > > > > > probably
> > > > > > > > > > > > needs to be specified in the Public Interfaces
> section
> > > for
> > > > > > > > > discussion.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Dong
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> > > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I created KIP-291 to add a separate queue for
> > > controller
> > > > > > > > requests:
> > > > > > > > > > > > > https://cwiki.apache.org/
> > confluence/display/KAFKA/KIP-
> > >
> > > > 291%
> > > > > > > > > > > > > 3A+Have+separate+queues+for+
> > control+requests+and+data+
> > >
> > > > > > requests
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you please take a look and let me know your
> > > feedback?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Lucas
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> 
> 
> 
> 
> 
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Harsha,

If I understand correctly, the replication quota mechanism proposed in
KIP-73 can be helpful in that scenario.
Have you tried it out?

Thanks,
Lucas
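[Editor's note: for Harsha's scenario, KIP-73 lets operators cap the byte rate
used by replication via configs such as leader.replication.throttled.rate.
The core idea is a per-window rate check on the leader, sketched very roughly
below; this is an illustration of the concept only, not Kafka's actual
implementation, and all names are made up.]

```java
// Rough sketch of the rate-limiting idea behind KIP-73 replication quotas:
// track bytes replicated to throttled replicas within a one-second window
// and signal when the configured byte rate has been exceeded, so the next
// fetch response to that follower can be delayed.
class ReplicationThrottle {
    private final long bytesPerSecond; // analogous to leader.replication.throttled.rate
    private long windowStartMs;
    private long bytesInWindow;

    ReplicationThrottle(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
    }

    // Record bytes sent to a throttled replica; returns true if the quota
    // is exceeded and the follower's fetch should be throttled.
    boolean recordAndCheck(long bytes, long nowMs) {
        if (nowMs - windowStartMs >= 1000) { // start a new 1-second window
            windowStartMs = nowMs;
            bytesInWindow = 0;
        }
        bytesInWindow += bytes;
        return bytesInWindow > bytesPerSecond;
    }
}
```

Because the quota applies only to replication traffic, client produce/fetch
requests keep their full share of bandwidth while a rebuilding replica
catches up, which is the behavior Harsha describes wanting.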

On Sun, Jun 24, 2018 at 8:28 AM, Harsha <ka...@harsha.io> wrote:

> Hi Lucas,
> One more question: any thoughts on making this configurable
> and also allowing a subset of data requests to be prioritized? For example,
> we notice in our cluster that when we take out a broker and bring in a new
> one, it will try to become a follower and issue a lot of fetch requests to
> other leaders in the cluster. This will negatively affect application/client
> requests. We are also exploring a similar solution to de-prioritize fetch
> requests from a new replica; we are ok with the replica taking time, but the
> leaders should prioritize the client requests.
>
>
> Thanks,
> Harsha
>
> On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
>
> >
> >
> >
> > Hi Eno,
> >
> > Sorry for the delayed response.
> > - I haven't implemented the feature yet, so no experimental results so
> > far.
> > And I plan to test it out in the following days.
> >
> > - You are absolutely right that the priority queue does not completely
> > prevent
> > data requests being processed ahead of controller requests.
> > That being said, I expect it to greatly mitigate the effect of stable
> > metadata.
> > In any case, I'll try it out and post the results when I have it.
> >
> > Regards,
> > Lucas
> >
> > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska < eno.thereska@gmail.com >
> > wrote:
> >
> > > Hi Lucas,
> > >
> > > Sorry for the delay, just had a look at this. A couple of questions:
> > > - did you notice any positive change after implementing this KIP? I'm
> > > wondering if you have any experimental results that show the benefit of
> > the
> > > two queues.
> > >
> > > - priority is usually not sufficient in addressing the problem the KIP
> > > identifies. Even with priority queues, you will sometimes (often?) have
> > the
> > > case that data plane requests will be ahead of the control plane
> > requests.
> > > This happens because the system might have already started processing
> > the
> > > data plane requests before the control plane ones arrived. So it would
> > be
> > > good to know what % of the problem this KIP addresses.
> > >
> > > Thanks
> > > Eno
> > >
> >
> >
> > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu < yuzhihong@gmail.com > wrote:
> > >
> > > > Change looks good.
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang < lucasatucla@gmail.com
> >
> > > wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > Thanks for the suggestion. I've updated the KIP. Please take
> another
> >
> > > > look.
> > > > >
> > > > > Lucas
> > > > >
> > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu < yuzhihong@gmail.com >
> > wrote:
> > > > >
> > > > > > Currently in KafkaConfig.scala :
> > > > > >
> > > > > > val QueuedMaxRequests = 500
> > > > > >
> > > > > > It would be good if you can include the default value for this
> new
> >
> > > > config
> > > > > > in the KIP.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <
> lucasatucla@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Ted, Dong
> > > > > > >
> > > > > > > I've updated the KIP by adding a new config, instead of reusing
> > the
> > > > > > > existing one.
> > > > > > > Please take another look when you have time. Thanks a lot!
> > > > > > >
> > > > > > > Lucas
> > > > > > >
> > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu < yuzhihong@gmail.com
> >
> > > wrote:
> > > > > > >
> > > > > > > > bq. that's a waste of resource if control request rate is low
> > > > > > > >
> > > > > > > > I don't know if control request rate can get to 100,000,
> > likely
> > > > not.
> > > > > > Then
> > > > > > > > using the same bound as that for data requests seems high.
> > > > > > > >
> > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <
> > > > lucasatucla@gmail.com >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Ted,
> > > > > > > > >
> > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > Let's say today the setting of "queued.max.requests" in
> > > cluster A
> > > > > is
> > > > > > > > 1000,
> > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > The 100 times difference might have indicated that machines
> > in
> > > > > > cluster
> > > > > > > B
> > > > > > > > > have larger memory.
> > > > > > > > >
> > > > > > > > > By reusing the "queued.max.requests", the
> > controlRequestQueue
> > > in
> > > > > > > cluster
> > > > > > > > B
> > > > > > > > > automatically
> > > > > > > > > gets a 100x capacity without explicitly bothering the
> > > operators.
> > > > > > > > > I understand the counter argument can be that maybe that's
> a
> >
> > > > waste
> > > > > of
> > > > > > > > > resource if control request
> > > > > > > > > rate is low and operators may want to fine tune the
> capacity
> > of
> > > > the
> > > > > > > > > controlRequestQueue.
> > > > > > > > >
> > > > > > > > > I'm ok with either approach, and can change it if you or
> > anyone
> > > > > else
> > > > > > > > feels
> > > > > > > > > strong about adding the extra config.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Lucas:
> > > > > > > > > > Under Rejected Alternatives, #2, can you elaborate a bit
> > more
> > > > on
> > > > > > why
> > > > > > > > the
> > > > > > > > > > separate config has bigger impact ?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <
> > > lindong28@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Luca,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP. Looks good overall. Some comments
> > > below:
> > > > > > > > > > >
> > > > > > > > > > > - We usually specify the full mbean for the new metrics
> > in
> > > > the
> > > > > > KIP.
> > > > > > > > Can
> > > > > > > > > > you
> > > > > > > > > > > specify it in the Public Interface section similar to
> > > KIP-237
> > > > > > > > > > > < https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-
> >
> > > > > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > > > > ?
> > > > > > > > > > >
> > > > > > > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > > > > > > < https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-
> >
> > > > > > > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> > > > metric>,
> > > > > > > > > > > where we keep the existing sensor name "BytesInPerSec"
> > and
> > > > add
> > > > > a
> > > > > > > new
> > > > > > > > > > sensor
> > > > > > > > > > > "ReplicationBytesInPerSec", rather than replacing the
> > > sensor
> > > > > > name "
> > > > > > > > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > > > > > >
> > > > > > > > > > > - It seems that the KIP changes the semantics of the
> > broker
> > > > > > config
> > > > > > > > > > > "queued.max.requests" because the number of total
> > requests
> > > > > queued
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > broker will be no longer bounded by
> > "queued.max.requests".
> > > > This
> > > > > > > > > probably
> > > > > > > > > > > needs to be specified in the Public Interfaces section
> > for
> > > > > > > > discussion.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Dong
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> > > > > > > lucasatucla@gmail.com >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > >
> > > > > > > > > > > > I created KIP-291 to add a separate queue for
> > controller
> > > > > > > requests:
> > > > > > > > > > > > https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-
> >
> > > 291%
> > > > > > > > > > > > 3A+Have+separate+queues+for+
> control+requests+and+data+
> >
> > > > > requests
> > > > > > > > > > > >
> > > > > > > > > > > > Can you please take a look and let me know your
> > feedback?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Lucas
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> >
> >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Harsha <ka...@harsha.io>.
Hi Lucas,
             One more question: any thoughts on making this configurable, and also on allowing a subset of data requests to be prioritized? For example, we notice in our cluster that when we take out a broker and bring in a new one, it will try to become a follower and send a lot of fetch requests to the other leaders in the cluster. This negatively affects the application/client requests. We are also exploring a similar solution: de-prioritizing fetch requests when a new replica comes in. We are OK with the replica taking longer to catch up, but the leaders should prioritize the client requests.

Thanks,
Harsha

On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:


Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Eno,

Sorry for the delayed response.
- I haven't implemented the feature yet, so no experimental results so far.
And I plan to test it out in the following days.

- You are absolutely right that the priority queue does not completely
prevent
data requests being processed ahead of controller requests.
That being said, I expect it to greatly mitigate the effect of stale
metadata.
In any case, I'll try it out and post the results when I have it.

Regards,
Lucas

On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <en...@gmail.com>
wrote:


Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Eno Thereska <en...@gmail.com>.
Hi Lucas,

Sorry for the delay, just had a look at this. A couple of questions:
- did you notice any positive change after implementing this KIP? I'm
wondering if you have any experimental results that show the benefit of the
two queues.

- priority is usually not sufficient in addressing the problem the KIP
identifies. Even with priority queues, you will sometimes (often?) have the
case that data plane requests will be ahead of the control plane requests.
This happens because the system might have already started processing the
data plane requests before the control plane ones arrived. So it would be
good to know what % of the problem this KIP addresses.
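To make that point concrete, here is a toy model of the two-queue idea
(the names ControlPlaneModel and nextRequest are illustrative, not from
the KIP or Kafka's code): each handler iteration prefers the control
queue, yet a data request that has already been dequeued still completes
first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of the two-queue proposal, not Kafka's implementation.
public class ControlPlaneModel {
    final Queue<String> controlQueue = new ArrayDeque<>();
    final Queue<String> dataQueue = new ArrayDeque<>();

    // Each handler iteration serves a control request if one is queued,
    // otherwise it falls back to the data queue.
    String nextRequest() {
        String r = controlQueue.poll();
        return (r != null) ? r : dataQueue.poll();
    }

    public static void main(String[] args) {
        ControlPlaneModel m = new ControlPlaneModel();
        m.dataQueue.add("Produce-1");
        m.dataQueue.add("Produce-2");

        List<String> order = new ArrayList<>();
        order.add(m.nextRequest());         // Produce-1 is already in flight...
        m.controlQueue.add("LeaderAndIsr"); // ...when a control request arrives
        order.add(m.nextRequest());         // LeaderAndIsr overtakes Produce-2
        order.add(m.nextRequest());
        System.out.println(order);          // [Produce-1, LeaderAndIsr, Produce-2]
    }
}
```

So prioritization bounds the extra wait for a control request by at most
one in-flight data request per handler thread; it does not eliminate it.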

Thanks
Eno

On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yu...@gmail.com> wrote:


Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Ted Yu <yu...@gmail.com>.
Change looks good.

Thanks

On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Ted,
>
> Thanks for the suggestion. I've updated the KIP. Please take another look.
>
> Lucas
>
> On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Currently in KafkaConfig.scala :
> >
> >   val QueuedMaxRequests = 500
> >
> > It would be good if you can include the default value for this new config
> > in the KIP.
> >
> > Thanks
> >
> > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lu...@gmail.com>
> wrote:
> >
> > > Hi Ted, Dong
> > >
> > > I've updated the KIP by adding a new config, instead of reusing the
> > > existing one.
> > > Please take another look when you have time. Thanks a lot!
> > >
> > > Lucas
> > >
> > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > bq.  that's a waste of resource if control request rate is low
> > > >
> > > > I don't know if control request rate can get to 100,000, likely not.
> > Then
> > > > using the same bound as that for data requests seems high.
> > > >
> > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > Thanks for taking a look at this KIP.
> > > > > Let's say today the setting of "queued.max.requests" in cluster A
> is
> > > > 1000,
> > > > > while the setting in cluster B is 100,000.
> > > > > The 100 times difference might have indicated that machines in
> > cluster
> > > B
> > > > > have larger memory.
> > > > >
> > > > > By reusing the "queued.max.requests", the controlRequestQueue in
> > > cluster
> > > > B
> > > > > automatically
> > > > > gets a 100x capacity without explicitly bothering the operators.
> > > > > I understand the counter argument can be that maybe that's a waste
> of
> > > > > resource if control request
> > > > > rate is low and operators may want to fine tune the capacity of the
> > > > > controlRequestQueue.
> > > > >
> > > > > I'm ok with either approach, and can change it if you or anyone
> else
> > > > feels
> > > > > strong about adding the extra config.
> > > > >
> > > > > Thanks,
> > > > > Lucas
> > > > >
> > > > >
> > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yu...@gmail.com>
> wrote:
> > > > >
> > > > > > Lucas:
> > > > > > Under Rejected Alternatives, #2, can you elaborate a bit more on
> > why
> > > > the
> > > > > > separate config has bigger impact ?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <li...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hey Luca,
> > > > > > >
> > > > > > > Thanks for the KIP. Looks good overall. Some comments below:
> > > > > > >
> > > > > > > - We usually specify the full mbean for the new metrics in the
> > KIP.
> > > > Can
> > > > > > you
> > > > > > > specify it in the Public Interface section similar to KIP-237
> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > ?
> > > > > > >
> > > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > > > where we keep the existing sensor name "BytesInPerSec" and add
> a
> > > new
> > > > > > sensor
> > > > > > > "ReplicationBytesInPerSec", rather than replacing the sensor
> > name "
> > > > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > >
> > > > > > > - It seems that the KIP changes the semantics of the broker
> > config
> > > > > > > "queued.max.requests" because the number of total requests
> queued
> > > in
> > > > > the
> > > > > > > broker will be no longer bounded by "queued.max.requests". This
> > > > > probably
> > > > > > > needs to be specified in the Public Interfaces section for
> > > > discussion.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dong
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> > > lucasatucla@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Kafka experts,
> > > > > > > >
> > > > > > > > I created KIP-291 to add a separate queue for controller
> > > requests:
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%
> > > > > > > > 3A+Have+separate+queues+for+control+requests+and+data+
> requests
> > > > > > > >
> > > > > > > > Can you please take a look and let me know your feedback?
> > > > > > > >
> > > > > > > > Thanks a lot for your time!
> > > > > > > > Regards,
> > > > > > > > Lucas
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Ted,

Thanks for the suggestion. I've updated the KIP. Please take another look.

Lucas

On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yu...@gmail.com> wrote:

> Currently in KafkaConfig.scala :
>
>   val QueuedMaxRequests = 500
>
> It would be good if you can include the default value for this new config
> in the KIP.
>
> Thanks
>
> On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lu...@gmail.com> wrote:
>
> > Hi Ted, Dong
> >
> > I've updated the KIP by adding a new config, instead of reusing the
> > existing one.
> > Please take another look when you have time. Thanks a lot!
> >
> > Lucas
> >
> > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > bq.  that's a waste of resource if control request rate is low
> > >
> > > I don't know if control request rate can get to 100,000, likely not.
> Then
> > > using the same bound as that for data requests seems high.
> > >
> > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > Thanks for taking a look at this KIP.
> > > > Let's say today the setting of "queued.max.requests" in cluster A is
> > > 1000,
> > > > while the setting in cluster B is 100,000.
> > > > The 100 times difference might have indicated that machines in
> cluster
> > B
> > > > have larger memory.
> > > >
> > > > By reusing the "queued.max.requests", the controlRequestQueue in
> > cluster
> > > B
> > > > automatically
> > > > gets a 100x capacity without explicitly bothering the operators.
> > > > I understand the counter argument can be that maybe that's a waste of
> > > > resource if control request
> > > > rate is low and operators may want to fine tune the capacity of the
> > > > controlRequestQueue.
> > > >
> > > > I'm ok with either approach, and can change it if you or anyone else
> > > feels
> > > > strong about adding the extra config.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > >
> > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > Lucas:
> > > > > Under Rejected Alternatives, #2, can you elaborate a bit more on
> why
> > > the
> > > > > separate config has bigger impact ?
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <li...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hey Luca,
> > > > > >
> > > > > > Thanks for the KIP. Looks good overall. Some comments below:
> > > > > >
> > > > > > - We usually specify the full mbean for the new metrics in the
> KIP.
> > > Can
> > > > > you
> > > > > > specify it in the Public Interface section similar to KIP-237
> > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > ?
> > > > > >
> > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > > where we keep the existing sensor name "BytesInPerSec" and add a
> > new
> > > > > sensor
> > > > > > "ReplicationBytesInPerSec", rather than replacing the sensor
> name "
> > > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > >
> > > > > > - It seems that the KIP changes the semantics of the broker
> config
> > > > > > "queued.max.requests" because the number of total requests queued
> > in
> > > > the
> > > > > > broker will be no longer bounded by "queued.max.requests". This
> > > > probably
> > > > > > needs to be specified in the Public Interfaces section for
> > > discussion.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Dong
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> > lucasatucla@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Kafka experts,
> > > > > > >
> > > > > > > I created KIP-291 to add a separate queue for controller
> > requests:
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%
> > > > > > > 3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > > >
> > > > > > > Can you please take a look and let me know your feedback?
> > > > > > >
> > > > > > > Thanks a lot for your time!
> > > > > > > Regards,
> > > > > > > Lucas
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Ted Yu <yu...@gmail.com>.
Currently in KafkaConfig.scala :

  val QueuedMaxRequests = 500

It would be good if you can include the default value for this new config
in the KIP.

Thanks

On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Ted, Dong
>
> I've updated the KIP by adding a new config, instead of reusing the
> existing one.
> Please take another look when you have time. Thanks a lot!
>
> Lucas
>
> On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > bq.  that's a waste of resource if control request rate is low
> >
> > I don't know if control request rate can get to 100,000, likely not. Then
> > using the same bound as that for data requests seems high.
> >
> > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> >
> > > Hi Ted,
> > >
> > > Thanks for taking a look at this KIP.
> > > Let's say today the setting of "queued.max.requests" in cluster A is
> > 1000,
> > > while the setting in cluster B is 100,000.
> > > The 100 times difference might have indicated that machines in cluster
> B
> > > have larger memory.
> > >
> > > By reusing the "queued.max.requests", the controlRequestQueue in
> cluster
> > B
> > > automatically
> > > gets a 100x capacity without explicitly bothering the operators.
> > > I understand the counter argument can be that maybe that's a waste of
> > > resource if control request
> > > rate is low and operators may want to fine tune the capacity of the
> > > controlRequestQueue.
> > >
> > > I'm ok with either approach, and can change it if you or anyone else
> > feels
> > > strong about adding the extra config.
> > >
> > > Thanks,
> > > Lucas
> > >
> > >
> > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > Lucas:
> > > > Under Rejected Alternatives, #2, can you elaborate a bit more on why
> > the
> > > > separate config has bigger impact ?
> > > >
> > > > Thanks
> > > >
> > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <li...@gmail.com>
> wrote:
> > > >
> > > > > Hey Luca,
> > > > >
> > > > > Thanks for the KIP. Looks good overall. Some comments below:
> > > > >
> > > > > - We usually specify the full mbean for the new metrics in the KIP.
> > Can
> > > > you
> > > > > specify it in the Public Interface section similar to KIP-237
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > ?
> > > > >
> > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > where we keep the existing sensor name "BytesInPerSec" and add a
> new
> > > > sensor
> > > > > "ReplicationBytesInPerSec", rather than replacing the sensor name "
> > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > >
> > > > > - It seems that the KIP changes the semantics of the broker config
> > > > > "queued.max.requests" because the number of total requests queued
> in
> > > the
> > > > > broker will be no longer bounded by "queued.max.requests". This
> > > probably
> > > > > needs to be specified in the Public Interfaces section for
> > discussion.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Dong
> > > > >
> > > > >
> > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> lucasatucla@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Kafka experts,
> > > > > >
> > > > > > I created KIP-291 to add a separate queue for controller
> requests:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%
> > > > > > 3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > >
> > > > > > Can you please take a look and let me know your feedback?
> > > > > >
> > > > > > Thanks a lot for your time!
> > > > > > Regards,
> > > > > > Lucas
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Ted, Dong

I've updated the KIP by adding a new config, instead of reusing the
existing one.
Please take another look when you have time. Thanks a lot!

Lucas

On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yu...@gmail.com> wrote:

> bq.  that's a waste of resource if control request rate is low
>
> I don't know if control request rate can get to 100,000, likely not. Then
> using the same bound as that for data requests seems high.
>
> On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lu...@gmail.com>
> wrote:
>
> > Hi Ted,
> >
> > Thanks for taking a look at this KIP.
> > Let's say today the setting of "queued.max.requests" in cluster A is
> 1000,
> > while the setting in cluster B is 100,000.
> > The 100 times difference might have indicated that machines in cluster B
> > have larger memory.
> >
> > By reusing the "queued.max.requests", the controlRequestQueue in cluster
> B
> > automatically
> > gets a 100x capacity without explicitly bothering the operators.
> > I understand the counter argument can be that maybe that's a waste of
> > resource if control request
> > rate is low and operators may want to fine tune the capacity of the
> > controlRequestQueue.
> >
> > I'm ok with either approach, and can change it if you or anyone else
> feels
> > strong about adding the extra config.
> >
> > Thanks,
> > Lucas
> >
> >
> > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Lucas:
> > > Under Rejected Alternatives, #2, can you elaborate a bit more on why
> the
> > > separate config has bigger impact ?
> > >
> > > Thanks
> > >
> > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <li...@gmail.com> wrote:
> > >
> > > > Hey Luca,
> > > >
> > > > Thanks for the KIP. Looks good overall. Some comments below:
> > > >
> > > > - We usually specify the full mbean for the new metrics in the KIP.
> Can
> > > you
> > > > specify it in the Public Interface section similar to KIP-237
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 237%3A+More+Controller+Health+Metrics>
> > > > ?
> > > >
> > > > - Maybe we could follow the same pattern as KIP-153
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > where we keep the existing sensor name "BytesInPerSec" and add a new
> > > sensor
> > > > "ReplicationBytesInPerSec", rather than replacing the sensor name "
> > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > >
> > > > - It seems that the KIP changes the semantics of the broker config
> > > > "queued.max.requests" because the number of total requests queued in
> > the
> > > > broker will be no longer bounded by "queued.max.requests". This
> > probably
> > > > needs to be specified in the Public Interfaces section for
> discussion.
> > > >
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > > >
> > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lu...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Kafka experts,
> > > > >
> > > > > I created KIP-291 to add a separate queue for controller requests:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%
> > > > > 3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > >
> > > > > Can you please take a look and let me know your feedback?
> > > > >
> > > > > Thanks a lot for your time!
> > > > > Regards,
> > > > > Lucas
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Ted Yu <yu...@gmail.com>.
bq.  that's a waste of resource if control request rate is low

I doubt the control request rate can get anywhere near 100,000. If it
cannot, using the same bound as the one for data requests seems too high.

On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Ted,
>
> Thanks for taking a look at this KIP.
> Let's say today the setting of "queued.max.requests" in cluster A is 1000,
> while the setting in cluster B is 100,000.
> The 100 times difference might have indicated that machines in cluster B
> have larger memory.
>
> By reusing the "queued.max.requests", the controlRequestQueue in cluster B
> automatically
> gets a 100x capacity without explicitly bothering the operators.
> I understand the counter argument can be that maybe that's a waste of
> resource if control request
> rate is low and operators may want to fine tune the capacity of the
> controlRequestQueue.
>
> I'm ok with either approach, and can change it if you or anyone else feels
> strong about adding the extra config.
>
> Thanks,
> Lucas
>
>
> On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Lucas:
> > Under Rejected Alternatives, #2, can you elaborate a bit more on why the
> > separate config has bigger impact ?
> >
> > Thanks
> >
> > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <li...@gmail.com> wrote:
> >
> > > Hey Luca,
> > >
> > > Thanks for the KIP. Looks good overall. Some comments below:
> > >
> > > - We usually specify the full mbean for the new metrics in the KIP. Can
> > you
> > > specify it in the Public Interface section similar to KIP-237
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 237%3A+More+Controller+Health+Metrics>
> > > ?
> > >
> > > - Maybe we could follow the same pattern as KIP-153
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > where we keep the existing sensor name "BytesInPerSec" and add a new
> > sensor
> > > "ReplicationBytesInPerSec", rather than replacing the sensor name "
> > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > >
> > > - It seems that the KIP changes the semantics of the broker config
> > > "queued.max.requests" because the number of total requests queued in
> the
> > > broker will be no longer bounded by "queued.max.requests". This
> probably
> > > needs to be specified in the Public Interfaces section for discussion.
> > >
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lu...@gmail.com>
> > > wrote:
> > >
> > > > Hi Kafka experts,
> > > >
> > > > I created KIP-291 to add a separate queue for controller requests:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%
> > > > 3A+Have+separate+queues+for+control+requests+and+data+requests
> > > >
> > > > Can you please take a look and let me know your feedback?
> > > >
> > > > Thanks a lot for your time!
> > > > Regards,
> > > > Lucas
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Lucas Wang <lu...@gmail.com>.
Hi Ted,

Thanks for taking a look at this KIP.
Let's say today "queued.max.requests" is set to 1,000 in cluster A and to
100,000 in cluster B.
The 100x difference likely indicates that machines in cluster B have more
memory.

By reusing the "queued.max.requests", the controlRequestQueue in cluster B
automatically
gets a 100x capacity without explicitly bothering the operators.
I understand the counterargument: that may waste resources if the control
request rate is low, and operators may want to fine-tune the capacity of
the controlRequestQueue.

I'm OK with either approach, and can change it if you or anyone else feels
strongly about adding the extra config.

Thanks,
Lucas


On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yu...@gmail.com> wrote:

> Lucas:
> Under Rejected Alternatives, #2, can you elaborate a bit more on why the
> separate config has bigger impact ?
>
> Thanks
>
> On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <li...@gmail.com> wrote:
>
> > Hey Luca,
> >
> > Thanks for the KIP. Looks good overall. Some comments below:
> >
> > - We usually specify the full mbean for the new metrics in the KIP. Can
> you
> > specify it in the Public Interface section similar to KIP-237
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 237%3A+More+Controller+Health+Metrics>
> > ?
> >
> > - Maybe we could follow the same pattern as KIP-153
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > where we keep the existing sensor name "BytesInPerSec" and add a new
> sensor
> > "ReplicationBytesInPerSec", rather than replacing the sensor name "
> > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> >
> > - It seems that the KIP changes the semantics of the broker config
> > "queued.max.requests" because the number of total requests queued in the
> > broker will be no longer bounded by "queued.max.requests". This probably
> > needs to be specified in the Public Interfaces section for discussion.
> >
> >
> > Thanks,
> > Dong
> >
> >
> > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lu...@gmail.com>
> > wrote:
> >
> > > Hi Kafka experts,
> > >
> > > I created KIP-291 to add a separate queue for controller requests:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%
> > > 3A+Have+separate+queues+for+control+requests+and+data+requests
> > >
> > > Can you please take a look and let me know your feedback?
> > >
> > > Thanks a lot for your time!
> > > Regards,
> > > Lucas
> > >
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Ted Yu <yu...@gmail.com>.
Lucas:
Under Rejected Alternatives, #2, can you elaborate a bit more on why the
separate config has a bigger impact?

Thanks

On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <li...@gmail.com> wrote:

> Hey Luca,
>
> Thanks for the KIP. Looks good overall. Some comments below:
>
> - We usually specify the full mbean for the new metrics in the KIP. Can you
> specify it in the Public Interface section similar to KIP-237
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 237%3A+More+Controller+Health+Metrics>
> ?
>
> - Maybe we could follow the same pattern as KIP-153
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> where we keep the existing sensor name "BytesInPerSec" and add a new sensor
> "ReplicationBytesInPerSec", rather than replacing the sensor name "
> BytesInPerSec" with e.g. "ClientBytesInPerSec".
>
> - It seems that the KIP changes the semantics of the broker config
> "queued.max.requests" because the number of total requests queued in the
> broker will be no longer bounded by "queued.max.requests". This probably
> needs to be specified in the Public Interfaces section for discussion.
>
>
> Thanks,
> Dong
>
>
> On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lu...@gmail.com>
> wrote:
>
> > Hi Kafka experts,
> >
> > I created KIP-291 to add a separate queue for controller requests:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%
> > 3A+Have+separate+queues+for+control+requests+and+data+requests
> >
> > Can you please take a look and let me know your feedback?
> >
> > Thanks a lot for your time!
> > Regards,
> > Lucas
> >
>

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

Posted by Dong Lin <li...@gmail.com>.
Hey Lucas,

Thanks for the KIP. Looks good overall. Some comments below:

- We usually specify the full mbean for the new metrics in the KIP. Can you
specify it in the Public Interface section similar to KIP-237
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>
?

- Maybe we could follow the same pattern as KIP-153
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
where we keep the existing sensor name "BytesInPerSec" and add a new sensor
"ReplicationBytesInPerSec", rather than replacing the sensor name "
BytesInPerSec" with e.g. "ClientBytesInPerSec".

- It seems that the KIP changes the semantics of the broker config
"queued.max.requests" because the total number of requests queued in the
broker will no longer be bounded by "queued.max.requests". This probably
needs to be specified in the Public Interfaces section for discussion.


Thanks,
Dong


On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lu...@gmail.com> wrote:

> Hi Kafka experts,
>
> I created KIP-291 to add a separate queue for controller requests:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%
> 3A+Have+separate+queues+for+control+requests+and+data+requests
>
> Can you please take a look and let me know your feedback?
>
> Thanks a lot for your time!
> Regards,
> Lucas
>
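
Dong's last point can be made concrete with a little arithmetic: once the
control queue is separate, the worst-case number of queued requests becomes
the sum of the two capacities. The sketch below assumes the existing default
of 500 for "queued.max.requests" and a hypothetical, illustrative name and
default for the new control-queue config (the KIP defines the real ones).

```java
// Sketch: with a dedicated control queue, "queued.max.requests" bounds only
// the data queue, so the broker-wide bound is the sum of both capacities.
public class QueueBounds {
    static final int QUEUED_MAX_REQUESTS = 500;         // existing default
    static final int QUEUED_MAX_CONTROL_REQUESTS = 20;  // hypothetical new config

    // Worst-case total number of requests queued in the broker.
    static int totalBound() {
        return QUEUED_MAX_REQUESTS + QUEUED_MAX_CONTROL_REQUESTS;
    }

    public static void main(String[] args) {
        System.out.println("max queued requests: " + totalBound()); // 520
    }
}
```

This is why the change to the bound belongs in the Public Interfaces section:
operators who sized memory around "queued.max.requests" alone would otherwise
be surprised by the extra headroom the control queue adds.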