You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Jun Rao <ju...@confluent.io> on 2018/01/09 02:30:52 UTC

Re: [VOTE] KIP-219 - Improve Quota Communication

Hi, Jiangjie,

Sorry for the late response. The proposal sounds good overall. A couple of
minor comments below.

1. For throttling a fetch request, we could potentially just send an empty
response. We can return a throttle time calculated from a full response,
but only mute the channel on the server based on a throttle time calculated
based on the empty response. This has the benefit that the server will mute
the channel much shorter, which will prevent the consumer from rebalancing
when throttled.

2. The wiki says "connections.max.idle.ms should be ignored during the
throttle time X." This has the potential issue that a server may not detect
that a client connection is already gone until after an arbitrary amount of
time. Perhaps we could still just close a connection if the server has
muted it for longer than connections.max.idle.ms. This will at least bound
the time for a server to detect closed client connections.

Thanks,

Jun


On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <be...@gmail.com> wrote:

> Hi,
>
> We would like to start the voting thread for KIP-219. The KIP proposes to
> improve the quota communication between the brokers and clients, especially
> for cases of long throttling time.
>
> The KIP wiki is following:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-219+-+Improve+quota+
> communication
>
> The discussion thread is here:
> http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
>
> Thanks,
>
> Jiangjie (Becket) Qin
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Becket Qin <be...@gmail.com>.
Actually returning an empty fetch request may still be useful to reduce the
throttle time due to request quota violation because the FetchResponse send
time will be less. I just updated the KIP.

Rajini, does that address your concern?

On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <be...@gmail.com> wrote:

> Thanks for the reply, Jun.
>
> Currently the byte rate quota does not apply to HeartbeatRequest,
> JoinGroupRequest/SyncGroupRequest. So the only case those requests are
> throttled is because the request quota is violated. In that case, the
> throttle time does not really matter whether we return a full FetchResponse
> or an empty one. Would it be more consistent if we throttle based on the
> actual throttle time / channel mute time?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Jiangjie,
>>
>> You are right that the heartbeat is done in a channel different from the
>> fetch request. I think it's still useful to return an empty fetch response
>> when the quota is violated. This way, the throttle time for the heartbeat
>> request won't be large. I agree that we can just mute the channel for the
>> fetch request for the throttle time computed based on a full fetch
>> response. This probably also partially addresses Rajini's #1 concern.
>>
>> Thanks,
>>
>> Jun
>>
>> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <be...@gmail.com> wrote:
>>
>> > Hi Rajini,
>> >
>> > Thanks for the comments. Pleas see the reply inline.
>> >
>> > Hi Jun,
>> >
>> > Thinking about the consumer rebalance case a bit more, I am not sure if
>> we
>> > need to worry about the delayed rebalance due to quota violation or not.
>> > The rebalance actually uses a separate channel to coordinator. Therefore
>> > unless the request quota is hit, the rebalance won't be throttled. Even
>> if
>> > request quota is hit, it seems unlikely to be delayed long. So it looks
>> > that we don't need to unmute the channel earlier than needed. Does this
>> > address your concern?
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
>> rajinisivaram@gmail.com>
>> > wrote:
>> >
>> > > Hi Becket,
>> > >
>> > > A few questions:
>> > >
>> > > 1. KIP says: *Although older client implementations (prior to
>> knowledge
>> > of
>> > > this KIP) will immediately send the next request after the broker
>> > responds
>> > > without paying attention to the throttle time field, the broker is
>> > > protected by virtue of muting the channel for time X. i.e., the next
>> > > request will not be processed until the channel is unmuted. *
>> > > For fetch requests, the response is sent immediately and the mute
>> time on
>> > > the broker based on empty fetch response will often be zero (unlike
>> the
>> > > throttle time returned to the client based on non-empty response).
>> Won't
>> > > that lead to a tight loop of fetch requests from old clients
>> > (particularly
>> > > expensive with SSL)? Wouldn't it be better to retain current behaviour
>> > for
>> > > old clients? Also, if we change the behaviour for old clients, client
>> > > metrics that track throttle time will report a lot of throttle
>> unrelated
>> > to
>> > >  actual throttle time.
>> > >
>> > For consumers, if quota is violated, the throttle time on the broker
>> will
>> > not be 0. It is just that the throttle time will not be increasing
>> because
>> > the consumer will return an empty response in this case. So there should
>> > not be a tight loop.
>> >
>> >
>> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
>> > > <http://connections.max.idle.ms> will still be honored during the
>> > throttle
>> > > time X. This makes sure that the brokers will detect client connection
>> > > closure in a bounded time.*
>> > > Wouldn't it be better to bound maximum throttle time to
>> > > *connections.max.idle.ms
>> > > <http://connections.max.idle.ms>*? If we mute for a time greater than
>> > > *connections.max.idle.ms
>> > > <http://connections.max.idle.ms>* and then close a client connection
>> > > simply
>> > > because we had muted it on the broker for a longer throttle time, we
>> > force
>> > > a reconnection and read the next request from the new connection
>> straight
>> > > away. This feels the same as a bound on the quota of *
>> > > connections.max.idle.ms
>> > > <http://connections.max.idle.ms>*, but with added load on the broker
>> for
>> > > authenticating another connection (expensive with SSL).
>> > >
>> > I think we need to think about the consumer prior to and after this KIP
>> > separately.
>> >
>> > For consumer prior to this KIP, even if we unmute the channel after
>> > connection.max.idle.ms, it is likely that the consumers have already
>> > reached request.timeout.ms before that and has reconnected to the
>> broker.
>> > So there is no real difference whether we close the throttled channel or
>> > not.
>> >
>> > For consumers after the KIP, because they will honor the throttle time,
>> > they will back off until throttle time is reached. If that throttle
>> time is
>> > longer than connection.max.idle.ms, it seems not a big overhead because
>> > there will only be one connection re-establishment in quite a few
>> minutes.
>> > Compared with such overhead, it seems more important to honor the quota
>> so
>> > the broker can survive.
>> >
>> >
>> > > 3. Are we changing the behaviour of network bandwidth quota for
>> > > Produce/Fetch and retaining current behaviour for request quotas?
>> > >
>> > This is going to be applied to all the quota. Including request quotas.
>> >
>> >
>> > >
>> > > Thanks,
>> > >
>> > > Rajini
>> > >
>> > >
>> > >
>> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io> wrote:
>> > >
>> > > > Hi, Jiangjie,
>> > > >
>> > > > Thanks for the updated KIP. +1
>> > > >
>> > > > Jun
>> > > >
>> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <be...@gmail.com>
>> > wrote:
>> > > >
>> > > > > Thanks for the comments, Jun.
>> > > > >
>> > > > > 1. Good point.
>> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is high
>> > enough
>> > > > so
>> > > > > the throttling is impacted.
>> > > > >
>> > > > > I have updated the KIP to reflect the changes.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > >
>> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io> wrote:
>> > > > >
>> > > > > > Hi, Jiangjie,
>> > > > > >
>> > > > > > Sorry for the late response. The proposal sounds good overall. A
>> > > couple
>> > > > > of
>> > > > > > minor comments below.
>> > > > > >
>> > > > > > 1. For throttling a fetch request, we could potentially just
>> send
>> > an
>> > > > > empty
>> > > > > > response. We can return a throttle time calculated from a full
>> > > > response,
>> > > > > > but only mute the channel on the server based on a throttle time
>> > > > > calculated
>> > > > > > based on the empty response. This has the benefit that the
>> server
>> > > will
>> > > > > mute
>> > > > > > the channel much shorter, which will prevent the consumer from
>> > > > > rebalancing
>> > > > > > when throttled.
>> > > > > >
>> > > > > > 2. The wiki says "connections.max.idle.ms should be ignored
>> during
>> > > the
>> > > > > > throttle time X." This has the potential issue that a server may
>> > not
>> > > > > detect
>> > > > > > that a client connection is already gone until after an
>> arbitrary
>> > > > amount
>> > > > > of
>> > > > > > time. Perhaps we could still just close a connection if the
>> server
>> > > has
>> > > > > > muted it for longer than connections.max.idle.ms. This will at
>> > least
>> > > > > bound
>> > > > > > the time for a server to detect closed client connections.
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Jun
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
>> becket.qin@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > We would like to start the voting thread for KIP-219. The KIP
>> > > > proposes
>> > > > > to
>> > > > > > > improve the quota communication between the brokers and
>> clients,
>> > > > > > especially
>> > > > > > > for cases of long throttling time.
>> > > > > > >
>> > > > > > > The KIP wiki is following:
>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > > > > 219+-+Improve+quota+
>> > > > > > > communication
>> > > > > > >
>> > > > > > > The discussion thread is here:
>> > > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
>> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Jiangjie (Becket) Qin
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Dong Lin <li...@gmail.com>.
+Jon

On Mon, 22 Jan 2018 at 10:38 AM Becket Qin <be...@gmail.com> wrote:

> Thanks for the discussion and voting.
>
> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
>
> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <ra...@gmail.com>
> wrote:
>
> > Hi Becket,
> >
> > Thanks for the update. Yes, it does address my concern.
> >
> > +1 (binding)
> >
> > Regards,
> >
> > Rajini
> >
> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <be...@gmail.com>
> wrote:
> >
> > > Actually returning an empty fetch request may still be useful to reduce
> > the
> > > throttle time due to request quota violation because the FetchResponse
> > send
> > > time will be less. I just updated the KIP.
> > >
> > > Rajini, does that address your concern?
> > >
> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <be...@gmail.com>
> > wrote:
> > >
> > > > Thanks for the reply, Jun.
> > > >
> > > > Currently the byte rate quota does not apply to HeartbeatRequest,
> > > > JoinGroupRequest/SyncGroupRequest. So the only case those requests
> are
> > > > throttled is because the request quota is violated. In that case, the
> > > > throttle time does not really matter whether we return a full
> > > FetchResponse
> > > > or an empty one. Would it be more consistent if we throttle based on
> > the
> > > > actual throttle time / channel mute time?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > >> Hi, Jiangjie,
> > > >>
> > > >> You are right that the heartbeat is done in a channel different from
> > the
> > > >> fetch request. I think it's still useful to return an empty fetch
> > > response
> > > >> when the quota is violated. This way, the throttle time for the
> > > heartbeat
> > > >> request won't be large. I agree that we can just mute the channel
> for
> > > the
> > > >> fetch request for the throttle time computed based on a full fetch
> > > >> response. This probably also partially addresses Rajini's #1
> concern.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <be...@gmail.com>
> > > wrote:
> > > >>
> > > >> > Hi Rajini,
> > > >> >
> > > >> > Thanks for the comments. Pleas see the reply inline.
> > > >> >
> > > >> > Hi Jun,
> > > >> >
> > > >> > Thinking about the consumer rebalance case a bit more, I am not
> sure
> > > if
> > > >> we
> > > >> > need to worry about the delayed rebalance due to quota violation
> or
> > > not.
> > > >> > The rebalance actually uses a separate channel to coordinator.
> > > Therefore
> > > >> > unless the request quota is hit, the rebalance won't be throttled.
> > > Even
> > > >> if
> > > >> > request quota is hit, it seems unlikely to be delayed long. So it
> > > looks
> > > >> > that we don't need to unmute the channel earlier than needed. Does
> > > this
> > > >> > address your concern?
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> > > >> rajinisivaram@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Becket,
> > > >> > >
> > > >> > > A few questions:
> > > >> > >
> > > >> > > 1. KIP says: *Although older client implementations (prior to
> > > >> knowledge
> > > >> > of
> > > >> > > this KIP) will immediately send the next request after the
> broker
> > > >> > responds
> > > >> > > without paying attention to the throttle time field, the broker
> is
> > > >> > > protected by virtue of muting the channel for time X. i.e., the
> > next
> > > >> > > request will not be processed until the channel is unmuted. *
> > > >> > > For fetch requests, the response is sent immediately and the
> mute
> > > >> time on
> > > >> > > the broker based on empty fetch response will often be zero
> > (unlike
> > > >> the
> > > >> > > throttle time returned to the client based on non-empty
> response).
> > > >> Won't
> > > >> > > that lead to a tight loop of fetch requests from old clients
> > > >> > (particularly
> > > >> > > expensive with SSL)? Wouldn't it be better to retain current
> > > behaviour
> > > >> > for
> > > >> > > old clients? Also, if we change the behaviour for old clients,
> > > client
> > > >> > > metrics that track throttle time will report a lot of throttle
> > > >> unrelated
> > > >> > to
> > > >> > >  actual throttle time.
> > > >> > >
> > > >> > For consumers, if quota is violated, the throttle time on the
> broker
> > > >> will
> > > >> > not be 0. It is just that the throttle time will not be increasing
> > > >> because
> > > >> > the consumer will return an empty response in this case. So there
> > > should
> > > >> > not be a tight loop.
> > > >> >
> > > >> >
> > > >> > > 2. KIP says: *The usual idle timeout i.e.,
> > connections.max.idle.ms
> > > >> > > <http://connections.max.idle.ms> will still be honored during
> the
> > > >> > throttle
> > > >> > > time X. This makes sure that the brokers will detect client
> > > connection
> > > >> > > closure in a bounded time.*
> > > >> > > Wouldn't it be better to bound maximum throttle time to
> > > >> > > *connections.max.idle.ms
> > > >> > > <http://connections.max.idle.ms>*? If we mute for a time
> greater
> > > than
> > > >> > > *connections.max.idle.ms
> > > >> > > <http://connections.max.idle.ms>* and then close a client
> > > connection
> > > >> > > simply
> > > >> > > because we had muted it on the broker for a longer throttle
> time,
> > we
> > > >> > force
> > > >> > > a reconnection and read the next request from the new connection
> > > >> straight
> > > >> > > away. This feels the same as a bound on the quota of *
> > > >> > > connections.max.idle.ms
> > > >> > > <http://connections.max.idle.ms>*, but with added load on the
> > > broker
> > > >> for
> > > >> > > authenticating another connection (expensive with SSL).
> > > >> > >
> > > >> > I think we need to think about the consumer prior to and after
> this
> > > KIP
> > > >> > separately.
> > > >> >
> > > >> > For consumer prior to this KIP, even if we unmute the channel
> after
> > > >> > connection.max.idle.ms, it is likely that the consumers have
> > already
> > > >> > reached request.timeout.ms before that and has reconnected to the
> > > >> broker.
> > > >> > So there is no real difference whether we close the throttled
> > channel
> > > or
> > > >> > not.
> > > >> >
> > > >> > For consumers after the KIP, because they will honor the throttle
> > > time,
> > > >> > they will back off until throttle time is reached. If that
> throttle
> > > >> time is
> > > >> > longer than connection.max.idle.ms, it seems not a big overhead
> > > because
> > > >> > there will only be one connection re-establishment in quite a few
> > > >> minutes.
> > > >> > Compared with such overhead, it seems more important to honor the
> > > quota
> > > >> so
> > > >> > the broker can survive.
> > > >> >
> > > >> >
> > > >> > > 3. Are we changing the behaviour of network bandwidth quota for
> > > >> > > Produce/Fetch and retaining current behaviour for request
> quotas?
> > > >> > >
> > > >> > This is going to be applied to all the quota. Including request
> > > quotas.
> > > >> >
> > > >> >
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Rajini
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io>
> > wrote:
> > > >> > >
> > > >> > > > Hi, Jiangjie,
> > > >> > > >
> > > >> > > > Thanks for the updated KIP. +1
> > > >> > > >
> > > >> > > > Jun
> > > >> > > >
> > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <
> > becket.qin@gmail.com>
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > Thanks for the comments, Jun.
> > > >> > > > >
> > > >> > > > > 1. Good point.
> > > >> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is
> > high
> > > >> > enough
> > > >> > > > so
> > > >> > > > > the throttling is impacted.
> > > >> > > > >
> > > >> > > > > I have updated the KIP to reflect the changes.
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > >
> > > >> > > > > Jiangjie (Becket) Qin
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io>
> > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi, Jiangjie,
> > > >> > > > > >
> > > >> > > > > > Sorry for the late response. The proposal sounds good
> > > overall. A
> > > >> > > couple
> > > >> > > > > of
> > > >> > > > > > minor comments below.
> > > >> > > > > >
> > > >> > > > > > 1. For throttling a fetch request, we could potentially
> just
> > > >> send
> > > >> > an
> > > >> > > > > empty
> > > >> > > > > > response. We can return a throttle time calculated from a
> > full
> > > >> > > > response,
> > > >> > > > > > but only mute the channel on the server based on a
> throttle
> > > time
> > > >> > > > > calculated
> > > >> > > > > > based on the empty response. This has the benefit that the
> > > >> server
> > > >> > > will
> > > >> > > > > mute
> > > >> > > > > > the channel much shorter, which will prevent the consumer
> > from
> > > >> > > > > rebalancing
> > > >> > > > > > when throttled.
> > > >> > > > > >
> > > >> > > > > > 2. The wiki says "connections.max.idle.ms should be
> ignored
> > > >> during
> > > >> > > the
> > > >> > > > > > throttle time X." This has the potential issue that a
> server
> > > may
> > > >> > not
> > > >> > > > > detect
> > > >> > > > > > that a client connection is already gone until after an
> > > >> arbitrary
> > > >> > > > amount
> > > >> > > > > of
> > > >> > > > > > time. Perhaps we could still just close a connection if
> the
> > > >> server
> > > >> > > has
> > > >> > > > > > muted it for longer than connections.max.idle.ms. This
> will
> > > at
> > > >> > least
> > > >> > > > > bound
> > > >> > > > > > the time for a server to detect closed client connections.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > >
> > > >> > > > > > Jun
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
> > > >> becket.qin@gmail.com>
> > > >> > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi,
> > > >> > > > > > >
> > > >> > > > > > > We would like to start the voting thread for KIP-219.
> The
> > > KIP
> > > >> > > > proposes
> > > >> > > > > to
> > > >> > > > > > > improve the quota communication between the brokers and
> > > >> clients,
> > > >> > > > > > especially
> > > >> > > > > > > for cases of long throttling time.
> > > >> > > > > > >
> > > >> > > > > > > The KIP wiki is following:
> > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >> > > > > > 219+-+Improve+quota+
> > > >> > > > > > > communication
> > > >> > > > > > >
> > > >> > > > > > > The discussion thread is here:
> > > >> > > > > > >
> http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > > >> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > >
> > > >> > > > > > > Jiangjie (Becket) Qin
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Becket Qin <be...@gmail.com>.
Thanks for the discussion and voting.

KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).

On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <ra...@gmail.com>
wrote:

> Hi Becket,
>
> Thanks for the update. Yes, it does address my concern.
>
> +1 (binding)
>
> Regards,
>
> Rajini
>
> On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Actually returning an empty fetch request may still be useful to reduce
> the
> > throttle time due to request quota violation because the FetchResponse
> send
> > time will be less. I just updated the KIP.
> >
> > Rajini, does that address your concern?
> >
> > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <be...@gmail.com>
> wrote:
> >
> > > Thanks for the reply, Jun.
> > >
> > > Currently the byte rate quota does not apply to HeartbeatRequest,
> > > JoinGroupRequest/SyncGroupRequest. So the only case those requests are
> > > throttled is because the request quota is violated. In that case, the
> > > throttle time does not really matter whether we return a full
> > FetchResponse
> > > or an empty one. Would it be more consistent if we throttle based on
> the
> > > actual throttle time / channel mute time?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <ju...@confluent.io> wrote:
> > >
> > >> Hi, Jiangjie,
> > >>
> > >> You are right that the heartbeat is done in a channel different from
> the
> > >> fetch request. I think it's still useful to return an empty fetch
> > response
> > >> when the quota is violated. This way, the throttle time for the
> > heartbeat
> > >> request won't be large. I agree that we can just mute the channel for
> > the
> > >> fetch request for the throttle time computed based on a full fetch
> > >> response. This probably also partially addresses Rajini's #1 concern.
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <be...@gmail.com>
> > wrote:
> > >>
> > >> > Hi Rajini,
> > >> >
> > >> > Thanks for the comments. Pleas see the reply inline.
> > >> >
> > >> > Hi Jun,
> > >> >
> > >> > Thinking about the consumer rebalance case a bit more, I am not sure
> > if
> > >> we
> > >> > need to worry about the delayed rebalance due to quota violation or
> > not.
> > >> > The rebalance actually uses a separate channel to coordinator.
> > Therefore
> > >> > unless the request quota is hit, the rebalance won't be throttled.
> > Even
> > >> if
> > >> > request quota is hit, it seems unlikely to be delayed long. So it
> > looks
> > >> > that we don't need to unmute the channel earlier than needed. Does
> > this
> > >> > address your concern?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> > >> rajinisivaram@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Becket,
> > >> > >
> > >> > > A few questions:
> > >> > >
> > >> > > 1. KIP says: *Although older client implementations (prior to
> > >> knowledge
> > >> > of
> > >> > > this KIP) will immediately send the next request after the broker
> > >> > responds
> > >> > > without paying attention to the throttle time field, the broker is
> > >> > > protected by virtue of muting the channel for time X. i.e., the
> next
> > >> > > request will not be processed until the channel is unmuted. *
> > >> > > For fetch requests, the response is sent immediately and the mute
> > >> time on
> > >> > > the broker based on empty fetch response will often be zero
> (unlike
> > >> the
> > >> > > throttle time returned to the client based on non-empty response).
> > >> Won't
> > >> > > that lead to a tight loop of fetch requests from old clients
> > >> > (particularly
> > >> > > expensive with SSL)? Wouldn't it be better to retain current
> > behaviour
> > >> > for
> > >> > > old clients? Also, if we change the behaviour for old clients,
> > client
> > >> > > metrics that track throttle time will report a lot of throttle
> > >> unrelated
> > >> > to
> > >> > >  actual throttle time.
> > >> > >
> > >> > For consumers, if quota is violated, the throttle time on the broker
> > >> will
> > >> > not be 0. It is just that the throttle time will not be increasing
> > >> because
> > >> > the consumer will return an empty response in this case. So there
> > should
> > >> > not be a tight loop.
> > >> >
> > >> >
> > >> > > 2. KIP says: *The usual idle timeout i.e.,
> connections.max.idle.ms
> > >> > > <http://connections.max.idle.ms> will still be honored during the
> > >> > throttle
> > >> > > time X. This makes sure that the brokers will detect client
> > connection
> > >> > > closure in a bounded time.*
> > >> > > Wouldn't it be better to bound maximum throttle time to
> > >> > > *connections.max.idle.ms
> > >> > > <http://connections.max.idle.ms>*? If we mute for a time greater
> > than
> > >> > > *connections.max.idle.ms
> > >> > > <http://connections.max.idle.ms>* and then close a client
> > connection
> > >> > > simply
> > >> > > because we had muted it on the broker for a longer throttle time,
> we
> > >> > force
> > >> > > a reconnection and read the next request from the new connection
> > >> straight
> > >> > > away. This feels the same as a bound on the quota of *
> > >> > > connections.max.idle.ms
> > >> > > <http://connections.max.idle.ms>*, but with added load on the
> > broker
> > >> for
> > >> > > authenticating another connection (expensive with SSL).
> > >> > >
> > >> > I think we need to think about the consumer prior to and after this
> > KIP
> > >> > separately.
> > >> >
> > >> > For consumer prior to this KIP, even if we unmute the channel after
> > >> > connection.max.idle.ms, it is likely that the consumers have
> already
> > >> > reached request.timeout.ms before that and has reconnected to the
> > >> broker.
> > >> > So there is no real difference whether we close the throttled
> channel
> > or
> > >> > not.
> > >> >
> > >> > For consumers after the KIP, because they will honor the throttle
> > time,
> > >> > they will back off until throttle time is reached. If that throttle
> > >> time is
> > >> > longer than connection.max.idle.ms, it seems not a big overhead
> > because
> > >> > there will only be one connection re-establishment in quite a few
> > >> minutes.
> > >> > Compared with such overhead, it seems more important to honor the
> > quota
> > >> so
> > >> > the broker can survive.
> > >> >
> > >> >
> > >> > > 3. Are we changing the behaviour of network bandwidth quota for
> > >> > > Produce/Fetch and retaining current behaviour for request quotas?
> > >> > >
> > >> > This is going to be applied to all the quota. Including request
> > quotas.
> > >> >
> > >> >
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Rajini
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io>
> wrote:
> > >> > >
> > >> > > > Hi, Jiangjie,
> > >> > > >
> > >> > > > Thanks for the updated KIP. +1
> > >> > > >
> > >> > > > Jun
> > >> > > >
> > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <
> becket.qin@gmail.com>
> > >> > wrote:
> > >> > > >
> > >> > > > > Thanks for the comments, Jun.
> > >> > > > >
> > >> > > > > 1. Good point.
> > >> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is
> high
> > >> > enough
> > >> > > > so
> > >> > > > > the throttling is impacted.
> > >> > > > >
> > >> > > > > I have updated the KIP to reflect the changes.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > >
> > >> > > > > Jiangjie (Becket) Qin
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io>
> > wrote:
> > >> > > > >
> > >> > > > > > Hi, Jiangjie,
> > >> > > > > >
> > >> > > > > > Sorry for the late response. The proposal sounds good
> > overall. A
> > >> > > couple
> > >> > > > > of
> > >> > > > > > minor comments below.
> > >> > > > > >
> > >> > > > > > 1. For throttling a fetch request, we could potentially just
> > >> send
> > >> > an
> > >> > > > > empty
> > >> > > > > > response. We can return a throttle time calculated from a
> full
> > >> > > > response,
> > >> > > > > > but only mute the channel on the server based on a throttle
> > time
> > >> > > > > calculated
> > >> > > > > > based on the empty response. This has the benefit that the
> > >> server
> > >> > > will
> > >> > > > > mute
> > >> > > > > > the channel much shorter, which will prevent the consumer
> from
> > >> > > > > rebalancing
> > >> > > > > > when throttled.
> > >> > > > > >
> > >> > > > > > 2. The wiki says "connections.max.idle.ms should be ignored
> > >> during
> > >> > > the
> > >> > > > > > throttle time X." This has the potential issue that a server
> > may
> > >> > not
> > >> > > > > detect
> > >> > > > > > that a client connection is already gone until after an
> > >> arbitrary
> > >> > > > amount
> > >> > > > > of
> > >> > > > > > time. Perhaps we could still just close a connection if the
> > >> server
> > >> > > has
> > >> > > > > > muted it for longer than connections.max.idle.ms. This will
> > at
> > >> > least
> > >> > > > > bound
> > >> > > > > > the time for a server to detect closed client connections.
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > >
> > >> > > > > > Jun
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
> > >> becket.qin@gmail.com>
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hi,
> > >> > > > > > >
> > >> > > > > > > We would like to start the voting thread for KIP-219. The
> > KIP
> > >> > > > proposes
> > >> > > > > to
> > >> > > > > > > improve the quota communication between the brokers and
> > >> clients,
> > >> > > > > > especially
> > >> > > > > > > for cases of long throttling time.
> > >> > > > > > >
> > >> > > > > > > The KIP wiki is following:
> > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > > > > > 219+-+Improve+quota+
> > >> > > > > > > communication
> > >> > > > > > >
> > >> > > > > > > The discussion thread is here:
> > >> > > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > >> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > >
> > >> > > > > > > Jiangjie (Becket) Qin
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Rajini Sivaram <ra...@gmail.com>.
Hi Becket,

Thanks for the update. Yes, it does address my concern.

+1 (binding)

Regards,

Rajini

On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <be...@gmail.com> wrote:

> Actually returning an empty fetch request may still be useful to reduce the
> throttle time due to request quota violation because the FetchResponse send
> time will be less. I just updated the KIP.
>
> Rajini, does that address your concern?
>
> On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Thanks for the reply, Jun.
> >
> > Currently the byte rate quota does not apply to HeartbeatRequest,
> > JoinGroupRequest/SyncGroupRequest. So the only case those requests are
> > throttled is because the request quota is violated. In that case, the
> > throttle time does not really matter whether we return a full
> FetchResponse
> > or an empty one. Would it be more consistent if we throttle based on the
> > actual throttle time / channel mute time?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <ju...@confluent.io> wrote:
> >
> >> Hi, Jiangjie,
> >>
> >> You are right that the heartbeat is done in a channel different from the
> >> fetch request. I think it's still useful to return an empty fetch
> response
> >> when the quota is violated. This way, the throttle time for the
> heartbeat
> >> request won't be large. I agree that we can just mute the channel for
> the
> >> fetch request for the throttle time computed based on a full fetch
> >> response. This probably also partially addresses Rajini's #1 concern.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <be...@gmail.com>
> wrote:
> >>
> >> > Hi Rajini,
> >> >
> >> > Thanks for the comments. Pleas see the reply inline.
> >> >
> >> > Hi Jun,
> >> >
> >> > Thinking about the consumer rebalance case a bit more, I am not sure
> if
> >> we
> >> > need to worry about the delayed rebalance due to quota violation or
> not.
> >> > The rebalance actually uses a separate channel to coordinator.
> Therefore
> >> > unless the request quota is hit, the rebalance won't be throttled.
> Even
> >> if
> >> > request quota is hit, it seems unlikely to be delayed long. So it
> looks
> >> > that we don't need to unmute the channel earlier than needed. Does
> this
> >> > address your concern?
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> >> rajinisivaram@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Becket,
> >> > >
> >> > > A few questions:
> >> > >
> >> > > 1. KIP says: *Although older client implementations (prior to
> >> knowledge
> >> > of
> >> > > this KIP) will immediately send the next request after the broker
> >> > responds
> >> > > without paying attention to the throttle time field, the broker is
> >> > > protected by virtue of muting the channel for time X. i.e., the next
> >> > > request will not be processed until the channel is unmuted. *
> >> > > For fetch requests, the response is sent immediately and the mute
> >> time on
> >> > > the broker based on empty fetch response will often be zero (unlike
> >> the
> >> > > throttle time returned to the client based on non-empty response).
> >> Won't
> >> > > that lead to a tight loop of fetch requests from old clients
> >> > (particularly
> >> > > expensive with SSL)? Wouldn't it be better to retain current
> behaviour
> >> > for
> >> > > old clients? Also, if we change the behaviour for old clients,
> client
> >> > > metrics that track throttle time will report a lot of throttle
> >> unrelated
> >> > to
> >> > >  actual throttle time.
> >> > >
> >> > For consumers, if quota is violated, the throttle time on the broker
> >> will
> >> > not be 0. It is just that the throttle time will not be increasing
> >> because
> >> > the consumer will return an empty response in this case. So there
> should
> >> > not be a tight loop.
> >> >
> >> >
> >> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
> >> > > <http://connections.max.idle.ms> will still be honored during the
> >> > throttle
> >> > > time X. This makes sure that the brokers will detect client
> connection
> >> > > closure in a bounded time.*
> >> > > Wouldn't it be better to bound maximum throttle time to
> >> > > *connections.max.idle.ms
> >> > > <http://connections.max.idle.ms>*? If we mute for a time greater
> than
> >> > > *connections.max.idle.ms
> >> > > <http://connections.max.idle.ms>* and then close a client
> connection
> >> > > simply
> >> > > because we had muted it on the broker for a longer throttle time, we
> >> > force
> >> > > a reconnection and read the next request from the new connection
> >> straight
> >> > > away. This feels the same as a bound on the quota of *
> >> > > connections.max.idle.ms
> >> > > <http://connections.max.idle.ms>*, but with added load on the
> broker
> >> for
> >> > > authenticating another connection (expensive with SSL).
> >> > >
> >> > I think we need to think about the consumer prior to and after this
> KIP
> >> > separately.
> >> >
> >> > For consumer prior to this KIP, even if we unmute the channel after
> >> > connection.max.idle.ms, it is likely that the consumers have already
> >> > reached request.timeout.ms before that and has reconnected to the
> >> broker.
> >> > So there is no real difference whether we close the throttled channel
> or
> >> > not.
> >> >
> >> > For consumers after the KIP, because they will honor the throttle
> time,
> >> > they will back off until throttle time is reached. If that throttle
> >> time is
> >> > longer than connection.max.idle.ms, it seems not a big overhead
> because
> >> > there will only be one connection re-establishment in quite a few
> >> minutes.
> >> > Compared with such overhead, it seems more important to honor the
> quota
> >> so
> >> > the broker can survive.
> >> >
> >> >
> >> > > 3. Are we changing the behaviour of network bandwidth quota for
> >> > > Produce/Fetch and retaining current behaviour for request quotas?
> >> > >
> >> > This is going to be applied to all the quota. Including request
> quotas.
> >> >
> >> >
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Rajini
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io> wrote:
> >> > >
> >> > > > Hi, Jiangjie,
> >> > > >
> >> > > > Thanks for the updated KIP. +1
> >> > > >
> >> > > > Jun
> >> > > >
> >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <be...@gmail.com>
> >> > wrote:
> >> > > >
> >> > > > > Thanks for the comments, Jun.
> >> > > > >
> >> > > > > 1. Good point.
> >> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is high
> >> > enough
> >> > > > so
> >> > > > > the throttling is impacted.
> >> > > > >
> >> > > > > I have updated the KIP to reflect the changes.
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > Jiangjie (Becket) Qin
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io>
> wrote:
> >> > > > >
> >> > > > > > Hi, Jiangjie,
> >> > > > > >
> >> > > > > > Sorry for the late response. The proposal sounds good
> overall. A
> >> > > couple
> >> > > > > of
> >> > > > > > minor comments below.
> >> > > > > >
> >> > > > > > 1. For throttling a fetch request, we could potentially just
> >> send
> >> > an
> >> > > > > empty
> >> > > > > > response. We can return a throttle time calculated from a full
> >> > > > response,
> >> > > > > > but only mute the channel on the server based on a throttle
> time
> >> > > > > calculated
> >> > > > > > based on the empty response. This has the benefit that the
> >> server
> >> > > will
> >> > > > > mute
> >> > > > > > the channel much shorter, which will prevent the consumer from
> >> > > > > rebalancing
> >> > > > > > when throttled.
> >> > > > > >
> >> > > > > > 2. The wiki says "connections.max.idle.ms should be ignored
> >> during
> >> > > the
> >> > > > > > throttle time X." This has the potential issue that a server
> may
> >> > not
> >> > > > > detect
> >> > > > > > that a client connection is already gone until after an
> >> arbitrary
> >> > > > amount
> >> > > > > of
> >> > > > > > time. Perhaps we could still just close a connection if the
> >> server
> >> > > has
> >> > > > > > muted it for longer than connections.max.idle.ms. This will
> at
> >> > least
> >> > > > > bound
> >> > > > > > the time for a server to detect closed client connections.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > >
> >> > > > > > Jun
> >> > > > > >
> >> > > > > >
> >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
> >> becket.qin@gmail.com>
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi,
> >> > > > > > >
> >> > > > > > > We would like to start the voting thread for KIP-219. The
> KIP
> >> > > > proposes
> >> > > > > to
> >> > > > > > > improve the quota communication between the brokers and
> >> clients,
> >> > > > > > especially
> >> > > > > > > for cases of long throttling time.
> >> > > > > > >
> >> > > > > > > The KIP wiki is following:
> >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > > > > > 219+-+Improve+quota+
> >> > > > > > > communication
> >> > > > > > >
> >> > > > > > > The discussion thread is here:
> >> > > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> >> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > >
> >> > > > > > > Jiangjie (Becket) Qin
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Becket Qin <be...@gmail.com>.
Actually returning an empty fetch request may still be useful to reduce the
throttle time due to request quota violation because the FetchResponse send
time will be less. I just updated the KIP.

Rajini, does that address your concern?

On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <be...@gmail.com> wrote:

> Thanks for the reply, Jun.
>
> Currently the byte rate quota does not apply to HeartbeatRequest,
> JoinGroupRequest/SyncGroupRequest. So the only case those requests are
> throttled is because the request quota is violated. In that case, the
> throttle time does not really matter whether we return a full FetchResponse
> or an empty one. Would it be more consistent if we throttle based on the
> actual throttle time / channel mute time?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Jiangjie,
>>
>> You are right that the heartbeat is done in a channel different from the
>> fetch request. I think it's still useful to return an empty fetch response
>> when the quota is violated. This way, the throttle time for the heartbeat
>> request won't be large. I agree that we can just mute the channel for the
>> fetch request for the throttle time computed based on a full fetch
>> response. This probably also partially addresses Rajini's #1 concern.
>>
>> Thanks,
>>
>> Jun
>>
>> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <be...@gmail.com> wrote:
>>
>> > Hi Rajini,
>> >
>> > Thanks for the comments. Pleas see the reply inline.
>> >
>> > Hi Jun,
>> >
>> > Thinking about the consumer rebalance case a bit more, I am not sure if
>> we
>> > need to worry about the delayed rebalance due to quota violation or not.
>> > The rebalance actually uses a separate channel to coordinator. Therefore
>> > unless the request quota is hit, the rebalance won't be throttled. Even
>> if
>> > request quota is hit, it seems unlikely to be delayed long. So it looks
>> > that we don't need to unmute the channel earlier than needed. Does this
>> > address your concern?
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
>> rajinisivaram@gmail.com>
>> > wrote:
>> >
>> > > Hi Becket,
>> > >
>> > > A few questions:
>> > >
>> > > 1. KIP says: *Although older client implementations (prior to
>> knowledge
>> > of
>> > > this KIP) will immediately send the next request after the broker
>> > responds
>> > > without paying attention to the throttle time field, the broker is
>> > > protected by virtue of muting the channel for time X. i.e., the next
>> > > request will not be processed until the channel is unmuted. *
>> > > For fetch requests, the response is sent immediately and the mute
>> time on
>> > > the broker based on empty fetch response will often be zero (unlike
>> the
>> > > throttle time returned to the client based on non-empty response).
>> Won't
>> > > that lead to a tight loop of fetch requests from old clients
>> > (particularly
>> > > expensive with SSL)? Wouldn't it be better to retain current behaviour
>> > for
>> > > old clients? Also, if we change the behaviour for old clients, client
>> > > metrics that track throttle time will report a lot of throttle
>> unrelated
>> > to
>> > >  actual throttle time.
>> > >
>> > For consumers, if quota is violated, the throttle time on the broker
>> will
>> > not be 0. It is just that the throttle time will not be increasing
>> because
>> > the consumer will return an empty response in this case. So there should
>> > not be a tight loop.
>> >
>> >
>> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
>> > > <http://connections.max.idle.ms> will still be honored during the
>> > throttle
>> > > time X. This makes sure that the brokers will detect client connection
>> > > closure in a bounded time.*
>> > > Wouldn't it be better to bound maximum throttle time to
>> > > *connections.max.idle.ms
>> > > <http://connections.max.idle.ms>*? If we mute for a time greater than
>> > > *connections.max.idle.ms
>> > > <http://connections.max.idle.ms>* and then close a client connection
>> > > simply
>> > > because we had muted it on the broker for a longer throttle time, we
>> > force
>> > > a reconnection and read the next request from the new connection
>> straight
>> > > away. This feels the same as a bound on the quota of *
>> > > connections.max.idle.ms
>> > > <http://connections.max.idle.ms>*, but with added load on the broker
>> for
>> > > authenticating another connection (expensive with SSL).
>> > >
>> > I think we need to think about the consumer prior to and after this KIP
>> > separately.
>> >
>> > For consumer prior to this KIP, even if we unmute the channel after
>> > connection.max.idle.ms, it is likely that the consumers have already
>> > reached request.timeout.ms before that and has reconnected to the
>> broker.
>> > So there is no real difference whether we close the throttled channel or
>> > not.
>> >
>> > For consumers after the KIP, because they will honor the throttle time,
>> > they will back off until throttle time is reached. If that throttle
>> time is
>> > longer than connection.max.idle.ms, it seems not a big overhead because
>> > there will only be one connection re-establishment in quite a few
>> minutes.
>> > Compared with such overhead, it seems more important to honor the quota
>> so
>> > the broker can survive.
>> >
>> >
>> > > 3. Are we changing the behaviour of network bandwidth quota for
>> > > Produce/Fetch and retaining current behaviour for request quotas?
>> > >
>> > This is going to be applied to all the quota. Including request quotas.
>> >
>> >
>> > >
>> > > Thanks,
>> > >
>> > > Rajini
>> > >
>> > >
>> > >
>> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io> wrote:
>> > >
>> > > > Hi, Jiangjie,
>> > > >
>> > > > Thanks for the updated KIP. +1
>> > > >
>> > > > Jun
>> > > >
>> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <be...@gmail.com>
>> > wrote:
>> > > >
>> > > > > Thanks for the comments, Jun.
>> > > > >
>> > > > > 1. Good point.
>> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is high
>> > enough
>> > > > so
>> > > > > the throttling is impacted.
>> > > > >
>> > > > > I have updated the KIP to reflect the changes.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > >
>> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io> wrote:
>> > > > >
>> > > > > > Hi, Jiangjie,
>> > > > > >
>> > > > > > Sorry for the late response. The proposal sounds good overall. A
>> > > couple
>> > > > > of
>> > > > > > minor comments below.
>> > > > > >
>> > > > > > 1. For throttling a fetch request, we could potentially just
>> send
>> > an
>> > > > > empty
>> > > > > > response. We can return a throttle time calculated from a full
>> > > > response,
>> > > > > > but only mute the channel on the server based on a throttle time
>> > > > > calculated
>> > > > > > based on the empty response. This has the benefit that the
>> server
>> > > will
>> > > > > mute
>> > > > > > the channel much shorter, which will prevent the consumer from
>> > > > > rebalancing
>> > > > > > when throttled.
>> > > > > >
>> > > > > > 2. The wiki says "connections.max.idle.ms should be ignored
>> during
>> > > the
>> > > > > > throttle time X." This has the potential issue that a server may
>> > not
>> > > > > detect
>> > > > > > that a client connection is already gone until after an
>> arbitrary
>> > > > amount
>> > > > > of
>> > > > > > time. Perhaps we could still just close a connection if the
>> server
>> > > has
>> > > > > > muted it for longer than connections.max.idle.ms. This will at
>> > least
>> > > > > bound
>> > > > > > the time for a server to detect closed client connections.
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Jun
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
>> becket.qin@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > We would like to start the voting thread for KIP-219. The KIP
>> > > > proposes
>> > > > > to
>> > > > > > > improve the quota communication between the brokers and
>> clients,
>> > > > > > especially
>> > > > > > > for cases of long throttling time.
>> > > > > > >
>> > > > > > > The KIP wiki is following:
>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > > > > 219+-+Improve+quota+
>> > > > > > > communication
>> > > > > > >
>> > > > > > > The discussion thread is here:
>> > > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
>> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Jiangjie (Becket) Qin
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Becket Qin <be...@gmail.com>.
Thanks for the reply, Jun.

Currently the byte rate quota does not apply to HeartbeatRequest,
JoinGroupRequest/SyncGroupRequest. So the only case those requests are
throttled is because the request quota is violated. In that case, the
throttle time does not really matter whether we return a full FetchResponse
or an empty one. Would it be more consistent if we throttle based on the
actual throttle time / channel mute time?

Thanks,

Jiangjie (Becket) Qin

On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <ju...@confluent.io> wrote:

> Hi, Jiangjie,
>
> You are right that the heartbeat is done in a channel different from the
> fetch request. I think it's still useful to return an empty fetch response
> when the quota is violated. This way, the throttle time for the heartbeat
> request won't be large. I agree that we can just mute the channel for the
> fetch request for the throttle time computed based on a full fetch
> response. This probably also partially addresses Rajini's #1 concern.
>
> Thanks,
>
> Jun
>
> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Hi Rajini,
> >
> > Thanks for the comments. Pleas see the reply inline.
> >
> > Hi Jun,
> >
> > Thinking about the consumer rebalance case a bit more, I am not sure if
> we
> > need to worry about the delayed rebalance due to quota violation or not.
> > The rebalance actually uses a separate channel to coordinator. Therefore
> > unless the request quota is hit, the rebalance won't be throttled. Even
> if
> > request quota is hit, it seems unlikely to be delayed long. So it looks
> > that we don't need to unmute the channel earlier than needed. Does this
> > address your concern?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <rajinisivaram@gmail.com
> >
> > wrote:
> >
> > > Hi Becket,
> > >
> > > A few questions:
> > >
> > > 1. KIP says: *Although older client implementations (prior to knowledge
> > of
> > > this KIP) will immediately send the next request after the broker
> > responds
> > > without paying attention to the throttle time field, the broker is
> > > protected by virtue of muting the channel for time X. i.e., the next
> > > request will not be processed until the channel is unmuted. *
> > > For fetch requests, the response is sent immediately and the mute time
> on
> > > the broker based on empty fetch response will often be zero (unlike the
> > > throttle time returned to the client based on non-empty response).
> Won't
> > > that lead to a tight loop of fetch requests from old clients
> > (particularly
> > > expensive with SSL)? Wouldn't it be better to retain current behaviour
> > for
> > > old clients? Also, if we change the behaviour for old clients, client
> > > metrics that track throttle time will report a lot of throttle
> unrelated
> > to
> > >  actual throttle time.
> > >
> > For consumers, if quota is violated, the throttle time on the broker will
> > not be 0. It is just that the throttle time will not be increasing
> because
> > the consumer will return an empty response in this case. So there should
> > not be a tight loop.
> >
> >
> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
> > > <http://connections.max.idle.ms> will still be honored during the
> > throttle
> > > time X. This makes sure that the brokers will detect client connection
> > > closure in a bounded time.*
> > > Wouldn't it be better to bound maximum throttle time to
> > > *connections.max.idle.ms
> > > <http://connections.max.idle.ms>*? If we mute for a time greater than
> > > *connections.max.idle.ms
> > > <http://connections.max.idle.ms>* and then close a client connection
> > > simply
> > > because we had muted it on the broker for a longer throttle time, we
> > force
> > > a reconnection and read the next request from the new connection
> straight
> > > away. This feels the same as a bound on the quota of *
> > > connections.max.idle.ms
> > > <http://connections.max.idle.ms>*, but with added load on the broker
> for
> > > authenticating another connection (expensive with SSL).
> > >
> > I think we need to think about the consumer prior to and after this KIP
> > separately.
> >
> > For consumer prior to this KIP, even if we unmute the channel after
> > connection.max.idle.ms, it is likely that the consumers have already
> > reached request.timeout.ms before that and has reconnected to the
> broker.
> > So there is no real difference whether we close the throttled channel or
> > not.
> >
> > For consumers after the KIP, because they will honor the throttle time,
> > they will back off until throttle time is reached. If that throttle time
> is
> > longer than connection.max.idle.ms, it seems not a big overhead because
> > there will only be one connection re-establishment in quite a few
> minutes.
> > Compared with such overhead, it seems more important to honor the quota
> so
> > the broker can survive.
> >
> >
> > > 3. Are we changing the behaviour of network bandwidth quota for
> > > Produce/Fetch and retaining current behaviour for request quotas?
> > >
> > This is going to be applied to all the quota. Including request quotas.
> >
> >
> > >
> > > Thanks,
> > >
> > > Rajini
> > >
> > >
> > >
> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Jiangjie,
> > > >
> > > > Thanks for the updated KIP. +1
> > > >
> > > > Jun
> > > >
> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > > > Thanks for the comments, Jun.
> > > > >
> > > > > 1. Good point.
> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is high
> > enough
> > > > so
> > > > > the throttling is impacted.
> > > > >
> > > > > I have updated the KIP to reflect the changes.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > >
> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Jiangjie,
> > > > > >
> > > > > > Sorry for the late response. The proposal sounds good overall. A
> > > couple
> > > > > of
> > > > > > minor comments below.
> > > > > >
> > > > > > 1. For throttling a fetch request, we could potentially just send
> > an
> > > > > empty
> > > > > > response. We can return a throttle time calculated from a full
> > > > response,
> > > > > > but only mute the channel on the server based on a throttle time
> > > > > calculated
> > > > > > based on the empty response. This has the benefit that the server
> > > will
> > > > > mute
> > > > > > the channel much shorter, which will prevent the consumer from
> > > > > rebalancing
> > > > > > when throttled.
> > > > > >
> > > > > > 2. The wiki says "connections.max.idle.ms should be ignored
> during
> > > the
> > > > > > throttle time X." This has the potential issue that a server may
> > not
> > > > > detect
> > > > > > that a client connection is already gone until after an arbitrary
> > > > amount
> > > > > of
> > > > > > time. Perhaps we could still just close a connection if the
> server
> > > has
> > > > > > muted it for longer than connections.max.idle.ms. This will at
> > least
> > > > > bound
> > > > > > the time for a server to detect closed client connections.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
> becket.qin@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > We would like to start the voting thread for KIP-219. The KIP
> > > > proposes
> > > > > to
> > > > > > > improve the quota communication between the brokers and
> clients,
> > > > > > especially
> > > > > > > for cases of long throttling time.
> > > > > > >
> > > > > > > The KIP wiki is following:
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 219+-+Improve+quota+
> > > > > > > communication
> > > > > > >
> > > > > > > The discussion thread is here:
> > > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Jun Rao <ju...@confluent.io>.
Hi, Jiangjie,

You are right that the heartbeat is done in a channel different from the
fetch request. I think it's still useful to return an empty fetch response
when the quota is violated. This way, the throttle time for the heartbeat
request won't be large. I agree that we can just mute the channel for the
fetch request for the throttle time computed based on a full fetch
response. This probably also partially addresses Rajini's #1 concern.

Thanks,

Jun

On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <be...@gmail.com> wrote:

> Hi Rajini,
>
> Thanks for the comments. Pleas see the reply inline.
>
> Hi Jun,
>
> Thinking about the consumer rebalance case a bit more, I am not sure if we
> need to worry about the delayed rebalance due to quota violation or not.
> The rebalance actually uses a separate channel to coordinator. Therefore
> unless the request quota is hit, the rebalance won't be throttled. Even if
> request quota is hit, it seems unlikely to be delayed long. So it looks
> that we don't need to unmute the channel earlier than needed. Does this
> address your concern?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <ra...@gmail.com>
> wrote:
>
> > Hi Becket,
> >
> > A few questions:
> >
> > 1. KIP says: *Although older client implementations (prior to knowledge
> of
> > this KIP) will immediately send the next request after the broker
> responds
> > without paying attention to the throttle time field, the broker is
> > protected by virtue of muting the channel for time X. i.e., the next
> > request will not be processed until the channel is unmuted. *
> > For fetch requests, the response is sent immediately and the mute time on
> > the broker based on empty fetch response will often be zero (unlike the
> > throttle time returned to the client based on non-empty response). Won't
> > that lead to a tight loop of fetch requests from old clients
> (particularly
> > expensive with SSL)? Wouldn't it be better to retain current behaviour
> for
> > old clients? Also, if we change the behaviour for old clients, client
> > metrics that track throttle time will report a lot of throttle unrelated
> to
> >  actual throttle time.
> >
> For consumers, if quota is violated, the throttle time on the broker will
> not be 0. It is just that the throttle time will not be increasing because
> the consumer will return an empty response in this case. So there should
> not be a tight loop.
>
>
> > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
> > <http://connections.max.idle.ms> will still be honored during the
> throttle
> > time X. This makes sure that the brokers will detect client connection
> > closure in a bounded time.*
> > Wouldn't it be better to bound maximum throttle time to
> > *connections.max.idle.ms
> > <http://connections.max.idle.ms>*? If we mute for a time greater than
> > *connections.max.idle.ms
> > <http://connections.max.idle.ms>* and then close a client connection
> > simply
> > because we had muted it on the broker for a longer throttle time, we
> force
> > a reconnection and read the next request from the new connection straight
> > away. This feels the same as a bound on the quota of *
> > connections.max.idle.ms
> > <http://connections.max.idle.ms>*, but with added load on the broker for
> > authenticating another connection (expensive with SSL).
> >
> I think we need to think about the consumer prior to and after this KIP
> separately.
>
> For consumer prior to this KIP, even if we unmute the channel after
> connection.max.idle.ms, it is likely that the consumers have already
> reached request.timeout.ms before that and has reconnected to the broker.
> So there is no real difference whether we close the throttled channel or
> not.
>
> For consumers after the KIP, because they will honor the throttle time,
> they will back off until throttle time is reached. If that throttle time is
> longer than connection.max.idle.ms, it seems not a big overhead because
> there will only be one connection re-establishment in quite a few minutes.
> Compared with such overhead, it seems more important to honor the quota so
> the broker can survive.
>
>
> > 3. Are we changing the behaviour of network bandwidth quota for
> > Produce/Fetch and retaining current behaviour for request quotas?
> >
> This is going to be applied to all the quota. Including request quotas.
>
>
> >
> > Thanks,
> >
> > Rajini
> >
> >
> >
> > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Jiangjie,
> > >
> > > Thanks for the updated KIP. +1
> > >
> > > Jun
> > >
> > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <be...@gmail.com>
> wrote:
> > >
> > > > Thanks for the comments, Jun.
> > > >
> > > > 1. Good point.
> > > > 2. Also makes sense. Usually the connection.max.idle.ms is high
> enough
> > > so
> > > > the throttling is impacted.
> > > >
> > > > I have updated the KIP to reflect the changes.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Hi, Jiangjie,
> > > > >
> > > > > Sorry for the late response. The proposal sounds good overall. A
> > couple
> > > > of
> > > > > minor comments below.
> > > > >
> > > > > 1. For throttling a fetch request, we could potentially just send
> an
> > > > empty
> > > > > response. We can return a throttle time calculated from a full
> > > response,
> > > > > but only mute the channel on the server based on a throttle time
> > > > calculated
> > > > > based on the empty response. This has the benefit that the server
> > will
> > > > mute
> > > > > the channel much shorter, which will prevent the consumer from
> > > > rebalancing
> > > > > when throttled.
> > > > >
> > > > > 2. The wiki says "connections.max.idle.ms should be ignored during
> > the
> > > > > throttle time X." This has the potential issue that a server may
> not
> > > > detect
> > > > > that a client connection is already gone until after an arbitrary
> > > amount
> > > > of
> > > > > time. Perhaps we could still just close a connection if the server
> > has
> > > > > muted it for longer than connections.max.idle.ms. This will at
> least
> > > > bound
> > > > > the time for a server to detect closed client connections.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <be...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We would like to start the voting thread for KIP-219. The KIP
> > > proposes
> > > > to
> > > > > > improve the quota communication between the brokers and clients,
> > > > > especially
> > > > > > for cases of long throttling time.
> > > > > >
> > > > > > The KIP wiki is following:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 219+-+Improve+quota+
> > > > > > communication
> > > > > >
> > > > > > The discussion thread is here:
> > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Becket Qin <be...@gmail.com>.
Hi Rajini,

Thanks for the comments. Pleas see the reply inline.

Hi Jun,

Thinking about the consumer rebalance case a bit more, I am not sure if we
need to worry about the delayed rebalance due to quota violation or not.
The rebalance actually uses a separate channel to coordinator. Therefore
unless the request quota is hit, the rebalance won't be throttled. Even if
request quota is hit, it seems unlikely to be delayed long. So it looks
that we don't need to unmute the channel earlier than needed. Does this
address your concern?

Thanks,

Jiangjie (Becket) Qin

On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <ra...@gmail.com>
wrote:

> Hi Becket,
>
> A few questions:
>
> 1. KIP says: *Although older client implementations (prior to knowledge of
> this KIP) will immediately send the next request after the broker responds
> without paying attention to the throttle time field, the broker is
> protected by virtue of muting the channel for time X. i.e., the next
> request will not be processed until the channel is unmuted. *
> For fetch requests, the response is sent immediately and the mute time on
> the broker based on empty fetch response will often be zero (unlike the
> throttle time returned to the client based on non-empty response). Won't
> that lead to a tight loop of fetch requests from old clients (particularly
> expensive with SSL)? Wouldn't it be better to retain current behaviour for
> old clients? Also, if we change the behaviour for old clients, client
> metrics that track throttle time will report a lot of throttle unrelated to
>  actual throttle time.
>
For consumers, if quota is violated, the throttle time on the broker will
not be 0. It is just that the throttle time will not be increasing because
the consumer will return an empty response in this case. So there should
not be a tight loop.


> 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
> <http://connections.max.idle.ms> will still be honored during the throttle
> time X. This makes sure that the brokers will detect client connection
> closure in a bounded time.*
> Wouldn't it be better to bound maximum throttle time to
> *connections.max.idle.ms
> <http://connections.max.idle.ms>*? If we mute for a time greater than
> *connections.max.idle.ms
> <http://connections.max.idle.ms>* and then close a client connection
> simply
> because we had muted it on the broker for a longer throttle time, we force
> a reconnection and read the next request from the new connection straight
> away. This feels the same as a bound on the quota of *
> connections.max.idle.ms
> <http://connections.max.idle.ms>*, but with added load on the broker for
> authenticating another connection (expensive with SSL).
>
I think we need to think about the consumer prior to and after this KIP
separately.

For consumer prior to this KIP, even if we unmute the channel after
connection.max.idle.ms, it is likely that the consumers have already
reached request.timeout.ms before that and has reconnected to the broker.
So there is no real difference whether we close the throttled channel or
not.

For consumers after the KIP, because they will honor the throttle time,
they will back off until throttle time is reached. If that throttle time is
longer than connection.max.idle.ms, it seems not a big overhead because
there will only be one connection re-establishment in quite a few minutes.
Compared with such overhead, it seems more important to honor the quota so
the broker can survive.


> 3. Are we changing the behaviour of network bandwidth quota for
> Produce/Fetch and retaining current behaviour for request quotas?
>
This is going to be applied to all the quota. Including request quotas.


>
> Thanks,
>
> Rajini
>
>
>
> On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Jiangjie,
> >
> > Thanks for the updated KIP. +1
> >
> > Jun
> >
> > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <be...@gmail.com> wrote:
> >
> > > Thanks for the comments, Jun.
> > >
> > > 1. Good point.
> > > 2. Also makes sense. Usually the connection.max.idle.ms is high enough
> > so
> > > the throttling is impacted.
> > >
> > > I have updated the KIP to reflect the changes.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Hi, Jiangjie,
> > > >
> > > > Sorry for the late response. The proposal sounds good overall. A
> couple
> > > of
> > > > minor comments below.
> > > >
> > > > 1. For throttling a fetch request, we could potentially just send an
> > > empty
> > > > response. We can return a throttle time calculated from a full
> > response,
> > > > but only mute the channel on the server based on a throttle time
> > > calculated
> > > > based on the empty response. This has the benefit that the server
> will
> > > mute
> > > > the channel much shorter, which will prevent the consumer from
> > > rebalancing
> > > > when throttled.
> > > >
> > > > 2. The wiki says "connections.max.idle.ms should be ignored during
> the
> > > > throttle time X." This has the potential issue that a server may not
> > > detect
> > > > that a client connection is already gone until after an arbitrary
> > amount
> > > of
> > > > time. Perhaps we could still just close a connection if the server
> has
> > > > muted it for longer than connections.max.idle.ms. This will at least
> > > bound
> > > > the time for a server to detect closed client connections.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <be...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We would like to start the voting thread for KIP-219. The KIP
> > proposes
> > > to
> > > > > improve the quota communication between the brokers and clients,
> > > > especially
> > > > > for cases of long throttling time.
> > > > >
> > > > > The KIP wiki is following:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 219+-+Improve+quota+
> > > > > communication
> > > > >
> > > > > The discussion thread is here:
> > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Rajini Sivaram <ra...@gmail.com>.
Hi Becket,

A few questions:

1. KIP says: *Although older client implementations (prior to knowledge of
this KIP) will immediately send the next request after the broker responds
without paying attention to the throttle time field, the broker is
protected by virtue of muting the channel for time X. i.e., the next
request will not be processed until the channel is unmuted. *
For fetch requests, the response is sent immediately and the mute time on
the broker based on empty fetch response will often be zero (unlike the
throttle time returned to the client based on non-empty response). Won't
that lead to a tight loop of fetch requests from old clients (particularly
expensive with SSL)? Wouldn't it be better to retain current behaviour for
old clients? Also, if we change the behaviour for old clients, client
metrics that track throttle time will report a lot of throttle unrelated to
 actual throttle time.

2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
<http://connections.max.idle.ms> will still be honored during the throttle
time X. This makes sure that the brokers will detect client connection
closure in a bounded time.*
Wouldn't it be better to bound maximum throttle time to
*connections.max.idle.ms
<http://connections.max.idle.ms>*? If we mute for a time greater than
*connections.max.idle.ms
<http://connections.max.idle.ms>* and then close a client connection simply
because we had muted it on the broker for a longer throttle time, we force
a reconnection and read the next request from the new connection straight
away. This feels the same as a bound on the quota of *connections.max.idle.ms
<http://connections.max.idle.ms>*, but with added load on the broker for
authenticating another connection (expensive with SSL).

3. Are we changing the behaviour of network bandwidth quota for
Produce/Fetch and retaining current behaviour for request quotas?

Thanks,

Rajini



On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <ju...@confluent.io> wrote:

> Hi, Jiangjie,
>
> Thanks for the updated KIP. +1
>
> Jun
>
> On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Thanks for the comments, Jun.
> >
> > 1. Good point.
> > 2. Also makes sense. Usually the connection.max.idle.ms is high enough
> so
> > the throttling is impacted.
> >
> > I have updated the KIP to reflect the changes.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Jiangjie,
> > >
> > > Sorry for the late response. The proposal sounds good overall. A couple
> > of
> > > minor comments below.
> > >
> > > 1. For throttling a fetch request, we could potentially just send an
> > empty
> > > response. We can return a throttle time calculated from a full
> response,
> > > but only mute the channel on the server based on a throttle time
> > calculated
> > > based on the empty response. This has the benefit that the server will
> > mute
> > > the channel much shorter, which will prevent the consumer from
> > rebalancing
> > > when throttled.
> > >
> > > 2. The wiki says "connections.max.idle.ms should be ignored during the
> > > throttle time X." This has the potential issue that a server may not
> > detect
> > > that a client connection is already gone until after an arbitrary
> amount
> > of
> > > time. Perhaps we could still just close a connection if the server has
> > > muted it for longer than connections.max.idle.ms. This will at least
> > bound
> > > the time for a server to detect closed client connections.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <be...@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > We would like to start the voting thread for KIP-219. The KIP
> proposes
> > to
> > > > improve the quota communication between the brokers and clients,
> > > especially
> > > > for cases of long throttling time.
> > > >
> > > > The KIP wiki is following:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 219+-+Improve+quota+
> > > > communication
> > > >
> > > > The discussion thread is here:
> > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Jun Rao <ju...@confluent.io>.
Hi, Jiangjie,

Thanks for the updated KIP. +1

Jun

On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <be...@gmail.com> wrote:

> Thanks for the comments, Jun.
>
> 1. Good point.
> 2. Also makes sense. Usually the connection.max.idle.ms is high enough so
> the throttling is impacted.
>
> I have updated the KIP to reflect the changes.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Jiangjie,
> >
> > Sorry for the late response. The proposal sounds good overall. A couple
> of
> > minor comments below.
> >
> > 1. For throttling a fetch request, we could potentially just send an
> empty
> > response. We can return a throttle time calculated from a full response,
> > but only mute the channel on the server based on a throttle time
> calculated
> > based on the empty response. This has the benefit that the server will
> mute
> > the channel much shorter, which will prevent the consumer from
> rebalancing
> > when throttled.
> >
> > 2. The wiki says "connections.max.idle.ms should be ignored during the
> > throttle time X." This has the potential issue that a server may not
> detect
> > that a client connection is already gone until after an arbitrary amount
> of
> > time. Perhaps we could still just close a connection if the server has
> > muted it for longer than connections.max.idle.ms. This will at least
> bound
> > the time for a server to detect closed client connections.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <be...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > We would like to start the voting thread for KIP-219. The KIP proposes
> to
> > > improve the quota communication between the brokers and clients,
> > especially
> > > for cases of long throttling time.
> > >
> > > The KIP wiki is following:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 219+-+Improve+quota+
> > > communication
> > >
> > > The discussion thread is here:
> > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Posted by Becket Qin <be...@gmail.com>.
Thanks for the comments, Jun.

1. Good point.
2. Also makes sense. Usually the connection.max.idle.ms is high enough so
the throttling is impacted.

I have updated the KIP to reflect the changes.

Thanks,

Jiangjie (Becket) Qin


On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <ju...@confluent.io> wrote:

> Hi, Jiangjie,
>
> Sorry for the late response. The proposal sounds good overall. A couple of
> minor comments below.
>
> 1. For throttling a fetch request, we could potentially just send an empty
> response. We can return a throttle time calculated from a full response,
> but only mute the channel on the server based on a throttle time calculated
> based on the empty response. This has the benefit that the server will mute
> the channel much shorter, which will prevent the consumer from rebalancing
> when throttled.
>
> 2. The wiki says "connections.max.idle.ms should be ignored during the
> throttle time X." This has the potential issue that a server may not detect
> that a client connection is already gone until after an arbitrary amount of
> time. Perhaps we could still just close a connection if the server has
> muted it for longer than connections.max.idle.ms. This will at least bound
> the time for a server to detect closed client connections.
>
> Thanks,
>
> Jun
>
>
> On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <be...@gmail.com> wrote:
>
> > Hi,
> >
> > We would like to start the voting thread for KIP-219. The KIP proposes to
> > improve the quota communication between the brokers and clients,
> especially
> > for cases of long throttling time.
> >
> > The KIP wiki is following:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 219+-+Improve+quota+
> > communication
> >
> > The discussion thread is here:
> > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
>