Posted to dev@kafka.apache.org by 东方甲乙 <25...@qq.com> on 2017/06/16 14:45:20 UTC

Re: [DISCUSS] KIP-148: Add a connect timeout for client

Hi Colin,
    I think the exponential backoff should still apply, thanks for the explanation.
thanks,
David


------------------ Original Message ------------------
From: "Colin McCabe";<cm...@apache.org>;
Sent: Tuesday, June 13, 2017, 1:43 AM
To: "dev"<de...@kafka.apache.org>; 

Subject: Re: Re: Re: [DISCUSS] KIP-148: Add a connect timeout for client



Just a note: KIP-144 added exponential backoff for broker reconnect
attempts, configured via reconnect.backoff.max.ms.

cheers,
Colin
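
A minimal sketch of what capped exponential reconnect backoff can look like. This is not Kafka's implementation: the class `ReconnectBackoffSketch`, its method names, and the jitter handling are hypothetical and chosen only for illustration. Only the config names reconnect.backoff.ms and reconnect.backoff.max.ms come from the thread.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper, not Kafka code: capped exponential reconnect backoff. */
final class ReconnectBackoffSketch {
    private final long baseMs; // plays the role of reconnect.backoff.ms
    private final long maxMs;  // plays the role of reconnect.backoff.max.ms

    ReconnectBackoffSketch(long baseMs, long maxMs) {
        if (baseMs <= 0 || maxMs < baseMs) {
            throw new IllegalArgumentException("need 0 < baseMs <= maxMs");
        }
        this.baseMs = baseMs;
        this.maxMs = maxMs;
    }

    /** Doubles the base delay once per consecutive failure, never exceeding maxMs. */
    long backoffMs(int consecutiveFailures) {
        long backoff = baseMs;
        for (int i = 0; i < consecutiveFailures && backoff < maxMs; i++) {
            // Compare against maxMs / 2 first so the doubling cannot overflow.
            backoff = (backoff > maxMs / 2) ? maxMs : backoff * 2;
        }
        return backoff;
    }

    /** Spreads the delay by +/- jitterFraction so clients do not reconnect in lockstep. */
    long withJitter(long backoffMs, double jitterFraction) {
        if (jitterFraction <= 0.0) {
            return backoffMs;
        }
        double factor = ThreadLocalRandom.current()
                .nextDouble(1.0 - jitterFraction, 1.0 + jitterFraction);
        return Math.round(backoffMs * factor);
    }
}
```

With a base of 50 ms and a cap of 1000 ms, the delays in this sketch grow as 50, 100, 200, 400, 800 and then stay at 1000 ms until a connection succeeds.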

On Sat, Jun 10, 2017, at 08:42, 东方甲乙 wrote:
> ------------------ Original Message ------------------
> From: "东方甲乙";<25...@qq.com>;
> Sent: Sunday, June 4, 2017, 6:05 PM
> To: "dev"<de...@kafka.apache.org>; 
> 
> Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> 
> 
> 
> >I guess one obvious question is, how does this interact with retries? 
> >Does it result in a failure getting delivered to the end user more
> >quickly if connecting is impossible the first few times we try?  Does
> >exponential backoff still apply?
> 
> 
> Yes. With retries, the connect timeout lets the end user reach a working
> node more quickly. After a produce request fails because of a timeout,
> the network client closes the connection and starts connecting to the
> leastLoadedNode. If that node does not respond, we close the connection
> attempt once the specified timeout elapses and try another node.
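
The idea of giving up on a connection attempt after a connect timeout could be tracked roughly as follows. This is purely illustrative: `ConnectTimeoutTracker` and its methods are hypothetical names, not part of NetworkClient or Selector, and a real client would additionally have to close the underlying socket channel for each expired node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: remembers when each connection attempt started. */
final class ConnectTimeoutTracker {
    private final long connectTimeoutMs;
    // nodeId -> time (ms) at which the pending connection attempt began
    private final Map<String, Long> connectStartMs = new LinkedHashMap<>();

    ConnectTimeoutTracker(long connectTimeoutMs) {
        this.connectTimeoutMs = connectTimeoutMs;
    }

    /** Call when a non-blocking connect to nodeId is initiated. */
    void connecting(String nodeId, long nowMs) {
        connectStartMs.put(nodeId, nowMs);
    }

    /** Call when the connection to nodeId is established. */
    void connected(String nodeId) {
        connectStartMs.remove(nodeId);
    }

    /** Removes and returns the nodes whose attempts have exceeded the timeout. */
    List<String> expireTimedOut(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        connectStartMs.entrySet().removeIf(entry -> {
            boolean expired = nowMs - entry.getValue() >= connectTimeoutMs;
            if (expired) {
                timedOut.add(entry.getKey());
            }
            return expired;
        });
        return timedOut;
    }
}
```

A poll loop could call `expireTimedOut` on every iteration, close the channels of the returned nodes, and then pick another node for the pending request.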
> 
> 
> As for the exponential backoff, do you mean TCP's exponential backoff
> or the NetworkClient's exponential backoff?
> It seems the NetworkClient has no exponential backoff (only the
> reconnect.backoff.ms parameter).
> 
> 
> Thanks
> David
> 
> 
> 
> 
> ------------------ Original Message ------------------
> From: "Colin McCabe";<cm...@apache.org>;
> Sent: Wednesday, May 31, 2017, 2:44 AM
> To: "dev"<de...@kafka.apache.org>; 
> 
> Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> 
> 
> 
> On Mon, May 29, 2017, at 15:46, Guozhang Wang wrote:
> > On Wed, May 24, 2017 at 9:59 AM, Colin McCabe <cm...@apache.org> wrote:
> > 
> > > On Tue, May 23, 2017, at 19:07, Guozhang Wang wrote:
> > > > I think using a single config to cover end-to-end latency with connecting
> > > > and request round-trip may not be the most appropriate since 1) some requests
> > > > may need much more time than others since they are parked (fetch request
> > > > with long polling, join group request etc) or throttled,
> > >
> > > Hmm.  My proposal was to implement _both_ end-to-end timeouts and
> > > per-call timeouts.  In that case, some requests needing much more time
> > > than others should not be a concern, since we can simply set a higher
> > > per-call timeout on the requests we think will need more time.
> > >
> > > > and 2) some
> > > > requests are prerequisite of others, like group request to discover the
> > > > coordinator before the fetch offset request, and implementation wise
> > > > these
> > > > request send/receive is embedded in latter ones, hence it is not clear if
> > > > the `request.timeout.ms` should cover just a single RPC or more.
> > >
> > > As far as I know, the request timeout has always covered a single RPC.  If
> > > we want to implement a higher level timeout that spans multiple RPCs, we
> > > can set the per-call timeouts appropriately.  For example:
> > >
> > > > long deadline = System.currentTimeMillis() + 60000;
> > > > callA(callTimeout = deadline - System.currentTimeMillis())
> > > > callB(callTimeout = deadline - System.currentTimeMillis())
> > >
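
The deadline pattern in the pseudocode above can be written as a small runnable Java helper. This is a sketch under assumptions: `DeadlineSketch` is a hypothetical class, not an existing Kafka type, and the clock is passed in explicitly so the arithmetic is easy to check.

```java
/** Hypothetical helper: one overall deadline shared by several calls. */
final class DeadlineSketch {
    private final long deadlineMs;

    /** Fixes the deadline once, when the overall operation starts. */
    DeadlineSketch(long nowMs, long totalTimeoutMs) {
        this.deadlineMs = nowMs + totalTimeoutMs;
    }

    /** Time left for the next call; never negative. */
    long remainingMs(long nowMs) {
        return Math.max(0L, deadlineMs - nowMs);
    }

    /** True once the overall budget has been used up. */
    boolean isExpired(long nowMs) {
        return nowMs >= deadlineMs;
    }
}
```

A caller would create one `DeadlineSketch` per high-level operation and pass `remainingMs(System.currentTimeMillis())` as the per-call timeout of each successive RPC, so that later calls automatically get less time.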
> > >
> > I may have misunderstood your previous email. Just clarifying:
> > 
> > 1) On the client we already have some configs for controlling end-to-end
> > timeout, e.g. "max.block.ms" on the producer controls how long "send()" and
> > "partitionsFor()" will block for, and inside such API calls multiple
> > request round trips may be sent, and for the first request round trip, a
> > connecting phase may or may not be included. All of these are covered by
> > this "max.block.ms" timeout today. However, as we discussed before, not all
> > request round trips have similar latency expectations, so it is better to
> > make a per-request "request.timeout.ms", and the overall "max.block.ms"
> > would need to be at least the max of them.
> 
> That makes sense.
> 
> Just to be clear, when you say "per-request timeout" are you talking
> about a timeout that can be different for each request?  (This doesn't
> exist today, but has been proposed.)  Or are you talking about
> request.timeout.ms, the single timeout that currently applies to all
> requests in NetworkClient?
> 
> > 
> > 2) Now back to the question of whether we should make "request.timeout.ms"
> > include a potential connection phase as well: assume we are going to add the
> > per-request "request.timeout.ms" as suggested above, then we may still have
> > a tight bound on how long connecting should take. For example, let's say we
> > make "joingroup.request.timeout.ms" (or "fetch.request.timeout.ms", since we
> > want really long polling behavior) a large value, say 200 seconds. Then if
> > the client is trying to connect to the broker while sending the request, and
> > the broker has died, we may still be blocked waiting for 30 seconds, while I
> > think David's motivation is to fail fast in these cases.
> 
> Thanks for the explanation.  I think I understand better now.  David
> wants to be able to have a long timeout for waiting for the server to
> process the request, but a shorter timeout for waiting for the
> connection to be established.  In that case, implementing the additional
> timeout makes sense.
> 
> I guess one obvious question is, how does this interact with retries? 
> Does it result in a failure getting delivered to the end user more
> quickly if connecting is impossible the first few times we try?  Does
> exponential backoff still apply?
> 
> best,
> Colin
> 
> 
> > 
> > 
> > > >
> > > > So no matter whether we add a `connect.timeout.ms` in addition to `
> > > > request.timeout.ms`, we should consider adding per-request-type timeout
> > > > value, and make `request.timeout.ms` a global default; if we add the `
> > > > connect.timeout.ms` the per-request value is only for the round trip,
> > > > otherwise it is supposed to include the connecting time. Personally I'd
> > > > prefer the first option to add a universal `connect.timeout.ms`, and in
> > > > another KIP consider adding per-request-type timeout overrides.
> > >
> > > Why have a special case for time spent connecting, though?  Why would
> > > the user care where the time went, as long as the timeout was met?  It
> > > feels like this is just a hack because we couldn't raise
> > > request.timeout.ms to the value that it "should" have been at for the
> > > shorter requests.  As someone already commented, it's confusing to have
> > > all these knobs that we don't really need.
> > >
> > >
> > I think that is exactly what David cares about (please correct me if I'm
> > wrong): for some requests I would like to wait long enough for them to be
> > completed, like the join-group request; while at the same time, if the client
> > has encountered some issues while trying to connect to the broker to send the
> > join group request, I want to be notified sooner.
> > 
> > 
> > > >
> > > > BTW if the consumer issue is the only cause that we are having a high
> > > > default value, I'd suggest we separate the consumer rebalance timeout and
> > > > not piggy-back on the session timeout. Then we can set the default `
> > > > request.timeout.ms` to a smaller value, like 10 secs. This is orthogonal
> > > > to
> > > > this KIP discussion and we can continue this in a separate thread.
> > >
> > > +1
> > >
> > > cheers,
> > > Colin
> > >
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Tue, May 23, 2017 at 3:31 PM, Colin McCabe <cm...@apache.org>
> > > wrote:
> > > >
> > > > > Another note-- it would be really nice if timeouts were end-to-end,
> > > > > rather than being set for particular phases of an RPC.  From a user's point
> > > > > of view, a 30 second timeout should mean that the call either succeeds
> > > > > or fails after 30 seconds, regardless of how much time is spent looking
> > > > > for metadata, connecting to brokers, waiting for brokers, etc.  This is
> > > > > implemented in AdminClient by setting a deadline when the call is first
> > > > > created and referring to that afterwards.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Tue, May 23, 2017, at 13:18, Colin McCabe wrote:
> > > > > > In the AdminClient, we allow setting per-call timeouts.  The global
> > > > > > timeout is just a default.  It seems like that is really what we should
> > > > > > do in the producer and consumer as well, rather than having a lot of
> > > > > > special cases for timeouts in connecting vs. other call states.  Then
> > > > > > join requests could have a 5 minute timeout, but other requests could
> > > > > > have a shorter one.  Thoughts?
> > > > > >
> > > > > > Cheers,
> > > > > > Colin
> > > > > >
> > > > > > On Tue, May 23, 2017, at 04:27, Rajini Sivaram wrote:
> > > > > > > Guozhang,
> > > > > > >
> > > > > > > At the moment we don't have a connect timeout. And the behaviour
> > > > > > > suggested in the KIP is useful to address this.
> > > > > > >
> > > > > > > We do however have a request.timeout.ms. This is the amount of time it
> > > > > > > would take to detect a crashed broker if the broker crashed after a
> > > > > > > connection was established. Unfortunately in the consumer, this was
> > > > > > > increased to > 5 minutes since JoinRequest can take up to
> > > > > > > max.poll.interval.ms, which has a default of 5 minutes. Since the
> > > > > > > whole point of this timeout is to detect a crashed broker, 5 minutes is
> > > > > > > too large.
> > > > > > >
> > > > > > > My suggestion was to use request.timeout.ms to also detect connection
> > > > > > > timeouts to a crashed broker - implement the behavior suggested in the
> > > > > > > KIP without adding a new config parameter. As Ismael has said, this
> > > > > > > will require fixing request.timeout.ms in the consumer.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 22, 2017 at 1:23 PM, Simon Souter <
> > > > > simons@cakesolutions.net>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > The following tickets are probably relevant to this KIP:
> > > > > > > >
> > > > > > > > https://issues.apache.org/jira/browse/KAFKA-3457
> > > > > > > > https://issues.apache.org/jira/browse/KAFKA-1894
> > > > > > > > https://issues.apache.org/jira/browse/KAFKA-3834
> > > > > > > >
> > > > > > > > On 22 May 2017 at 16:30, Rajini Sivaram <rajinisivaram@gmail.com
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Ismael,
> > > > > > > > >
> > > > > > > > > Yes, agree. My concern was that a connection can be shut down
> > > > > > > > > uncleanly at any time. If a client is in the middle of a request,
> > > > > > > > > then it times out after min(request.timeout.ms, tcp-timeout). If we
> > > > > > > > > add another config option connect.timeout.ms, then we will sometimes
> > > > > > > > > wait for min(connect.timeout.ms, tcp-timeout) and sometimes for
> > > > > > > > > min(request.timeout.ms, tcp-timeout), depending on connection state.
> > > > > > > > > One config option feels neater to me.
> > > > > > > > >
> > > > > > > > > On Mon, May 22, 2017 at 11:21 AM, Ismael Juma <
> > > ismael@juma.me.uk>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Rajini,
> > > > > > > > > >
> > > > > > > > > > For this to have the desired effect, we'd probably need to lower the
> > > > > > > > > > default request.timeout.ms for the consumer and fix the underlying reason
> > > > > > > > > > why it is a little over 5 minutes at the moment.
> > > > > > > > > >
> > > > > > > > > > Ismael
> > > > > > > > > >
> > > > > > > > > > On Mon, May 22, 2017 at 4:15 PM, Rajini Sivaram <
> > > > > > > > rajinisivaram@gmail.com
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi David,
> > > > > > > > > > >
> > > > > > > > > > > Sorry, what I meant was: Can you reuse the existing configuration option
> > > > > > > > > > > request.timeout.ms, instead of adding a new config, and add the behaviour
> > > > > > > > > > > that you have proposed in the KIP for the connection phase using this
> > > > > > > > > > > timeout? I think the timeout for connection is useful. I am not sure we
> > > > > > > > > > > need another configuration option to implement it.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > >
> > > > > > > > > > > Rajini
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 22, 2017 at 11:06 AM, 东方甲乙 <25...@qq.com>
> > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Rajini.
> > > > > > > > > > > >
> > > > > > > > > > > > When a Kafka node's machine is shut down or its network is down, the
> > > > > > > > > > > > connecting phase cannot use request.timeout.ms, because the client has
> > > > > > > > > > > > not sent a request yet. Since NIO gets no response, the selector will
> > > > > > > > > > > > not close the connection attempt, so the client will not choose another
> > > > > > > > > > > > good node to get the metadata.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > > David
> > > > > > > > > > > >
> > > > > > > > > > > > ------------------ Original Message ------------------
> > > > > > > > > > > > *From:* "Rajini Sivaram" <ra...@gmail.com>;
> > > > > > > > > > > > *Sent:* Monday, May 22, 2017, 20:17
> > > > > > > > > > > > *To:* "dev" <de...@kafka.apache.org>;
> > > > > > > > > > > > *Subject:* Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Hi David,
> > > > > > > > > > > >
> > > > > > > > > > > > Is there a reason why you wouldn't want to use request.timeout.ms as the
> > > > > > > > > > > > timeout parameter for connections? Then you would use the same timeout for
> > > > > > > > > > > > connected and connecting phases when shutdown is unclean. You could still
> > > > > > > > > > > > use the timeout to ensure that the next metadata request is sent to another
> > > > > > > > > > > > node.
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Rajini
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, May 21, 2017 at 9:51 AM, 东方甲乙 <25...@qq.com>
> > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Guozhang,
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the clarification. Regarding point 2, I think the key thing is
> > > > > > > > > > > > > not whether users can control the maximum time to wait inside the code, but
> > > > > > > > > > > > > whether the network client can notice that a connection attempt cannot
> > > > > > > > > > > > > complete and try a good node instead. In the producer's sender, even though
> > > > > > > > > > > > > selector.poll can time out, the next iteration still does not close the
> > > > > > > > > > > > > previous connection attempt and try another good node.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > In our test environment, QA shut down one of the leader nodes. The
> > > > > > > > > > > > > producer's send request timed out, and the client closed that node's
> > > > > > > > > > > > > connection and then requested the metadata. But sometimes the node chosen
> > > > > > > > > > > > > for the metadata request is also the node that was shut down. While
> > > > > > > > > > > > > connecting to that node to get the metadata, the client is in the
> > > > > > > > > > > > > connecting phase: the network client marks the node's state as CONNECTING,
> > > > > > > > > > > > > but because the node is shut down, the socket cannot tell that the
> > > > > > > > > > > > > connection attempt is broken. Although selector.poll has a timeout
> > > > > > > > > > > > > parameter, it does not close the connection, so the next time
> > > > > > > > > > > > > "networkclient.maybeUpdate" runs it checks isAnyNodeConnecting and then
> > > > > > > > > > > > > does not connect to any good node to get the metadata. It takes several
> > > > > > > > > > > > > minutes to notice that the connection attempt has timed out and to try
> > > > > > > > > > > > > another node.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > So I want to add a connect.timeout parameter, so that the selector can
> > > > > > > > > > > > > detect that a connection attempt has timed out and close the connection.
> > > > > > > > > > > > > It seems the timeout value currently passed to `selector.poll()` cannot
> > > > > > > > > > > > > do this.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > David
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > ------------------ Original Message ------------------
> > > > > > > > > > > > > From: "Guozhang Wang";<wa...@gmail.com>;
> > > > > > > > > > > > > Sent: Tuesday, May 16, 2017, 1:51 AM
> > > > > > > > > > > > > To: "dev@kafka.apache.org"<de...@kafka.apache.org>;
> > > > > > > > > > > > >
> > > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi David,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I may have been a bit confused before, just clarifying a few things:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. As you mentioned, a client will always first try to establish the
> > > > > > > > > > > > > connection with a broker node before it tries to send any request to it.
> > > > > > > > > > > > > And after the connection is established, it will either continuously send
> > > > > > > > > > > > > many requests (e.g. produce) or just a single request (e.g. metadata) to
> > > > > > > > > > > > > the broker, so these two phases are indeed different.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. In the connected phase, connections.max.idle.ms is used to
> > > > > > > > > > > > > auto-disconnect the socket if no requests have been sent / received during
> > > > > > > > > > > > > that period of time; in the connecting phase, we always try to create the
> > > > > > > > > > > > > socket via "socketChannel.connect" in a non-blocking call, and then check
> > > > > > > > > > > > > whether the connection has been established, but all the callers of this
> > > > > > > > > > > > > function (in either producer or consumer) have a timeout parameter as in
> > > > > > > > > > > > > `selector.poll()`, and the timeout parameter is set either by calculations
> > > > > > > > > > > > > based on metadata.expiration.time and backoff for producer#sender, or by
> > > > > > > > > > > > > directly passed values from consumer#poll(timeout), so although there is
> > > > > > > > > > > > > no config directly controlling that, users can still control the maximum
> > > > > > > > > > > > > time to wait inside the code.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I originally thought your scenario was more about the connected phase, but
> > > > > > > > > > > > > now I feel you are talking about the connecting phase. For that case, I
> > > > > > > > > > > > > still feel the timeout value currently passed in `selector.poll()`, which
> > > > > > > > > > > > > is controllable from user code, should be sufficient?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, May 14, 2017 at 2:37 AM, 东方甲乙 <
> > > 254479818@qq.com>
> > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Guozhang,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry for the delay, and thanks for the question. These seem like two
> > > > > > > > > > > > > > different parameters to me:
> > > > > > > > > > > > > > connect.timeout.ms: only applies to the connecting phase; once the
> > > > > > > > > > > > > > connection is established this parameter is not used.
> > > > > > > > > > > > > > connections.max.idle.ms: currently does not apply in the connecting phase
> > > > > > > > > > > > > > (a connection is added to the expiry manager only when select returns
> > > > > > > > > > > > > > readyKeys > 0); after the connection is established, it checks whether the
> > > > > > > > > > > > > > connection has been active within the configured time.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Even if we change connections.max.idle.ms to also cover the connecting
> > > > > > > > > > > > > > phase, we cannot set this parameter to a small value, such as 5 seconds:
> > > > > > > > > > > > > > the client may be busy sending messages to other nodes, and the idle
> > > > > > > > > > > > > > connection would be disconnected after 5 seconds. That is why the default
> > > > > > > > > > > > > > value of connections.max.idle.ms is set to a larger time. We should have
> > > > > > > > > > > > > > two parameters, one to control the connecting phase behavior and one for
> > > > > > > > > > > > > > the connected phase behavior. Do you agree?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > David
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ------------------ Original Message ------------------
> > > > > > > > > > > > > > From: "Guozhang Wang";<wa...@gmail.com>;
> > > > > > > > > > > > > > Sent: Saturday, May 6, 2017, 7:52 AM
> > > > > > > > > > > > > > To: "dev@kafka.apache.org"<de...@kafka.apache.org>;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello David,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the KIP. For the described issue, I'm wondering if it can be
> > > > > > > > > > > > > > resolved by tuning the CONNECTIONS_MAX_IDLE_MS_CONFIG
> > > > > > > > > > > > > > (connections.max.idle.ms) on the client side? Default is 9 minutes.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, May 2, 2017 at 8:22 AM, 东方甲乙 <
> > > 254479818@qq.com>
> > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Currently in our test environment, we found that after one of the broker
> > > > > > > > > > > > > > > nodes crashes (reboot or OS crash), the client may still be connecting to
> > > > > > > > > > > > > > > the crashed node to send a metadata request or another request, and it
> > > > > > > > > > > > > > > needs several minutes to notice that the connection has timed out before it
> > > > > > > > > > > > > > > tries another node to connect to and send the request. As a result, the
> > > > > > > > > > > > > > > client may still not be aware of the metadata change after several minutes.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So I want to add a connect timeout on the client, please take a look at:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-148%3A+Add+a+connect+timeout+for+client
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > David
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Simon Souter
> > > > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >
> > 
> > 
> > 
> > -- 
> > -- Guozhang