Posted to dev@kafka.apache.org by 东方甲乙 <25...@qq.com> on 2017/06/04 10:05:36 UTC

Re: [DISCUSS] KIP-148: Add a connect timeout for client

>I guess one obvious question is, how does this interact with retries? 
>Does it result in a failure getting delivered to the end user more
>quickly if connecting is impossible the first few times we try?  Does
>exponential backoff still apply?


Yes, with retries it lets the end user get connected more quickly.  After a produce request
fails because of a timeout, the network client closes the connection and starts connecting to the leastLoadedNode.
If that node does not respond, we close the connection attempt once the specified timeout expires and try another node.
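
To illustrate the idea, here is a minimal sketch of the bookkeeping, assuming the client remembers when each connection attempt started. ConnectTimeoutTracker and its methods are hypothetical names for this example; they are not part of NetworkClient or the KIP text:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Kafka code: remembers when each connection
// attempt started so that stalled attempts can be abandoned.
final class ConnectTimeoutTracker {
    private final long connectTimeoutMs;
    private final Map<String, Long> connectStartMs = new HashMap<>();

    ConnectTimeoutTracker(long connectTimeoutMs) {
        this.connectTimeoutMs = connectTimeoutMs;
    }

    // Record that a non-blocking connect to nodeId was initiated at nowMs.
    void connecting(String nodeId, long nowMs) {
        connectStartMs.put(nodeId, nowMs);
    }

    // The connection completed or was closed, so stop tracking it.
    void finished(String nodeId) {
        connectStartMs.remove(nodeId);
    }

    boolean isConnecting(String nodeId) {
        return connectStartMs.containsKey(nodeId);
    }

    // Return and forget the nodes whose attempt has lasted at least the timeout.
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : connectStartMs.entrySet()) {
            if (nowMs - e.getValue() >= connectTimeoutMs) {
                expired.add(e.getKey());
            }
        }
        for (String nodeId : expired) {
            connectStartMs.remove(nodeId);
        }
        return expired;
    }
}
```

The caller would run expire(now) on each poll, close the returned connections, and then pick another node for the next metadata request.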


As for the exponential backoff, do you mean TCP's exponential backoff or the NetworkClient's exponential backoff?
It seems the NetworkClient has no exponential backoff (only the reconnect.backoff.ms parameter).
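
For comparison, a rough sketch of a fixed reconnect delay versus an exponential one. ReconnectBackoff, its methods, and the maxMs cap are made up for this example and do not describe how NetworkClient behaves:

```java
// Hypothetical helper for illustration only; not NetworkClient's actual logic.
final class ReconnectBackoff {
    private ReconnectBackoff() {}

    // Fixed backoff: the delay is the same no matter how many attempts failed.
    static long fixedDelayMs(long baseMs, int failures) {
        return baseMs;
    }

    // Exponential backoff: the delay doubles with each consecutive failure
    // and never exceeds maxMs.
    static long exponentialDelayMs(long baseMs, long maxMs, int failures) {
        long delay = Math.min(baseMs, maxMs);
        for (int i = 0; i < failures && delay < maxMs; i++) {
            delay = Math.min(delay * 2, maxMs);
        }
        return delay;
    }
}
```

With a fixed delay, a dead node is retried at the same rate forever; with the exponential variant, repeated failures space the attempts further apart up to the cap.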


Thanks
David




------------------ Original Message ------------------
From: "Colin McCabe";<cm...@apache.org>;
Sent: Wednesday, May 31, 2017, 2:44 AM
To: "dev"<de...@kafka.apache.org>; 

Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client



On Mon, May 29, 2017, at 15:46, Guozhang Wang wrote:
> On Wed, May 24, 2017 at 9:59 AM, Colin McCabe <cm...@apache.org> wrote:
> 
> > On Tue, May 23, 2017, at 19:07, Guozhang Wang wrote:
> > > I think using a single config to cover end-to-end latency with connecting
> > > and request round-trip may not be the most appropriate since 1) some requests
> > > may need much more time than others since they are parked (fetch request
> > > with long polling, join group request etc) or throttled,
> >
> > Hmm.  My proposal was to implement _both_ end-to-end timeouts and
> > per-call timeouts.  In that case, some requests needing much more time
> > than others should not be a concern, since we can simply set a higher
> > per-call timeout on the requests we think will need more time.
> >
> > > and 2) some
> > > requests are prerequisites of others, like the group request to discover the
> > > coordinator before the fetch offset request, and implementation-wise the
> > > send/receive of these requests is embedded in the latter ones, hence it is
> > > not clear if the `request.timeout.ms` should cover just a single RPC or more.
> >
> > As far as I know, the request timeout has always covered a single RPC.  If
> > we want to implement a higher level timeout that spans multiple RPCs, we
> > can set the per-call timeouts appropriately.  For example:
> >
> > > long deadline = System.currentTimeMillis() + 60000;
> > > callA(callTimeout = deadline - System.currentTimeMillis())
> > > callB(callTimeout = deadline - System.currentTimeMillis())
> >
> >
> I may have misunderstood your previous email. Just clarifying:
> 
> 1) On the client we already have some configs for controlling end-to-end
> timeout, e.g. "max.block.ms" on producer controls how long "send()" and
> "partitionsFor()" will block for, and inside such API calls multiple
> request round trips may be sent, and for the first request round trip, a
> connecting phase may or may not be included. All of these are covered by
> this "max.block.ms" timeout today. However, as we discussed before, not
> all request round trips have similar latency expectations, so it is
> better to make a per-request "request.timeout.ms", and the overall
> "max.block.ms" would need to be at least the max of them.

That makes sense.

Just to be clear, when you say "per-request timeout" are you talking
about a timeout that can be different for each request?  (This doesn't
exist today, but has been proposed.)  Or are you talking about
request.timeout.ms, the single timeout that currently applies to all
requests in NetworkClient?
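
To make the first reading concrete, here is a rough sketch of per-request timeouts layered over one default. RequestTimeouts, its methods, and the request-type strings are hypothetical; this illustrates the proposal and is not existing client code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one default timeout (the role request.timeout.ms
// plays today) plus optional overrides keyed by request type.
final class RequestTimeouts {
    private final long defaultTimeoutMs;
    private final Map<String, Long> overrides = new HashMap<>();

    RequestTimeouts(long defaultTimeoutMs) {
        this.defaultTimeoutMs = defaultTimeoutMs;
    }

    // Register a longer or shorter timeout for one request type.
    RequestTimeouts override(String requestType, long timeoutMs) {
        overrides.put(requestType, timeoutMs);
        return this;
    }

    // Timeout for a request type, falling back to the default.
    long timeoutFor(String requestType) {
        return overrides.getOrDefault(requestType, defaultTimeoutMs);
    }

    // Largest single-request timeout; an overall bound has to be at least this.
    long maxTimeoutMs() {
        long max = defaultTimeoutMs;
        for (long timeoutMs : overrides.values()) {
            max = Math.max(max, timeoutMs);
        }
        return max;
    }
}
```

maxTimeoutMs() reflects the point above that an overall bound such as max.block.ms would need to be at least the largest per-request timeout.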

> 
> 2) Now back to the question whether we should make "request.timeout.ms"
> include potential connection phase as well: assume we are going to add
> the per-request "request.timeout.ms" as suggested above, then we may
> still need a tight bound on how long connecting should take. For example,
> let's say we make "joingroup.request.timeout.ms" (or
> "fetch.request.timeout.ms", since we want really long polling behavior)
> a large value, say 200 seconds; then if the client is trying to connect
> to the broker while sending the request, and the broker has died, we may
> still be blocked waiting for 30 seconds, while I think David's motivation
> is to fail fast in these cases.

Thanks for the explanation.  I think I understand better now.  David
wants to be able to have a long timeout for waiting for the server to
process the request, but a shorter timeout for waiting for the
connection to be established.  In that case, implementing the additional
timeout makes sense.

I guess one obvious question is, how does this interact with retries? 
Does it result in a failure getting delivered to the end user more
quickly if connecting is impossible the first few times we try?  Does
exponential backoff still apply?

best,
Colin


> 
> 
> > >
> > > So no matter whether we add a `connect.timeout.ms` in addition to
> > > `request.timeout.ms`, we should consider adding per-request-type timeout
> > > values, and make `request.timeout.ms` a global default; if we add
> > > `connect.timeout.ms`, the per-request value is only for the round trip,
> > > otherwise it is supposed to include the connecting time. Personally I'd
> > > prefer the first option to add a universal `connect.timeout.ms`, and in
> > > another KIP consider adding per-request-type timeout overrides.
> >
> > Why have a special case for time spent connecting, though?  Why would
> > the user care where the time went, as long as the timeout was met?  It
> > feels like this is just a hack because we couldn't raise
> > request.timeout.ms to the value that it "should" have been at for the
> > shorter requests.  As someone already commented, it's confusing to have
> > all these knobs that we don't really need.
> >
> >
> I think that is exactly what David cares about (please correct me if I'm
> wrong): for some requests I would like to wait long enough for them to be
> completed, like the join-group request; while at the same time, if the
> client has encountered some issue while trying to connect to the broker to
> send the join group request, I want to be notified sooner.
> 
> 
> > >
> > > BTW if the consumer issue is the only cause that we are having a high
> > > default value, I'd suggest we separate the consumer rebalance timeout and
> > > not piggy-back on the session timeout. Then we can set the default
> > > `request.timeout.ms` to a smaller value, like 10 secs. This is orthogonal
> > > to this KIP discussion and we can continue this in a separate thread.
> >
> > +1
> >
> > cheers,
> > Colin
> >
> > >
> > >
> > > Guozhang
> > >
> > > On Tue, May 23, 2017 at 3:31 PM, Colin McCabe <cm...@apache.org> wrote:
> > >
> > > > Another note-- it would be really nice if timeouts were end-to-end,
> > > > > rather than being set for particular phases of an RPC.  From a user point
> > > > of view, a 30 second timeout should mean that the call either succeeds
> > > > or fails after 30 seconds, regardless of how much time is spent looking
> > > > for metadata, connecting to brokers, waiting for brokers, etc.  This is
> > > > implemented in AdminClient by setting a deadline when the call is first
> > > > created and referring to that afterwards.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Tue, May 23, 2017, at 13:18, Colin McCabe wrote:
> > > > > > In the AdminClient, we allow setting per-call timeouts.  The global
> > > > > > timeout is just a default.  It seems like that is really what we should
> > > > > > do in the producer and consumer as well, rather than having a lot of
> > > > > > special cases for timeouts in connecting vs. other call states.  Then
> > > > > > join requests could have a 5 minute timeout, but other requests could
> > > > > > have a shorter one.  Thoughts?
> > > > >
> > > > > Cheers,
> > > > > Colin
> > > > >
> > > > > > On Tue, May 23, 2017, at 04:27, Rajini Sivaram wrote:
> > > > > > Guozhang,
> > > > > >
> > > > > > > At the moment we don't have a connect timeout. And the behaviour
> > > > > > > suggested in the KIP is useful to address this.
> > > > > > >
> > > > > > > We do however have a request.timeout.ms. This is the amount of time it
> > > > > > > would take to detect a crashed broker if the broker crashed after a
> > > > > > > connection was established. Unfortunately in the consumer, this was
> > > > > > > increased to > 5 minutes since JoinRequest can take up to
> > > > > > > max.poll.interval.ms, which has a default of 5 minutes. Since the
> > > > > > > whole point of this timeout is to detect a crashed broker, 5 minutes is
> > > > > > > too large.
> > > > > > >
> > > > > > > My suggestion was to use request.timeout.ms to also detect connection
> > > > > > > timeouts to a crashed broker - implement the behavior suggested in the
> > > > > > > KIP without adding a new config parameter. As Ismael has said, we will
> > > > > > > need to fix request.timeout.ms in the consumer.
> > > > > >
> > > > > >
> > > > > > > On Mon, May 22, 2017 at 1:23 PM, Simon Souter <simons@cakesolutions.net> wrote:
> > > > > >
> > > > > > > The following tickets are probably relevant to this KIP:
> > > > > > >
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-3457
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-1894
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-3834
> > > > > > >
> > > > > > > > On 22 May 2017 at 16:30, Rajini Sivaram <rajinisivaram@gmail.com> wrote:
> > > > > > >
> > > > > > > > Ismael,
> > > > > > > >
> > > > > > > > > Yes, agree. My concern was that a connection can be shut down uncleanly
> > > > > > > > > at any time. If a client is in the middle of a request, then it times
> > > > > > > > > out after min(request.timeout.ms, tcp-timeout). If we add another
> > > > > > > > > config option connect.timeout.ms, then we will sometimes wait for
> > > > > > > > > min(connect.timeout.ms, tcp-timeout) and sometimes for
> > > > > > > > > min(request.timeout.ms, tcp-timeout), depending on connection state.
> > > > > > > > > One config option feels neater to me.
> > > > > > > >
> > > > > > > > > On Mon, May 22, 2017 at 11:21 AM, Ismael Juma <ismael@juma.me.uk> wrote:
> > > > > > > >
> > > > > > > > > Rajini,
> > > > > > > > >
> > > > > > > > > > For this to have the desired effect, we'd probably need to lower the
> > > > > > > > > > default request.timeout.ms for the consumer and fix the underlying
> > > > > > > > > > reason why it is a little over 5 minutes at the moment.
> > > > > > > > >
> > > > > > > > > Ismael
> > > > > > > > >
> > > > > > > > > > On Mon, May 22, 2017 at 4:15 PM, Rajini Sivaram <rajinisivaram@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi David,
> > > > > > > > > >
> > > > > > > > > > > Sorry, what I meant was: Can you reuse the existing configuration
> > > > > > > > > > > option request.timeout.ms, instead of adding a new config, and add
> > > > > > > > > > > the behaviour that you have proposed in the KIP for the connection
> > > > > > > > > > > phase using this timeout? I think the timeout for connection is
> > > > > > > > > > > useful. I am not sure we need another configuration option to
> > > > > > > > > > > implement it.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > Rajini
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Mon, May 22, 2017 at 11:06 AM, 东方甲乙 <25...@qq.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Rajini.
> > > > > > > > > > >
> > > > > > > > > > > > When a Kafka node's machine is shut down or its network is down, the
> > > > > > > > > > > > connecting phase cannot rely on request.timeout.ms, because the client
> > > > > > > > > > > > has not sent a request yet. With no response at the NIO level, the
> > > > > > > > > > > > selector will not close the connection attempt, so the client will not
> > > > > > > > > > > > choose another good node to get the metadata from.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > David
> > > > > > > > > > >
> > > > > > > > > > > > ------------------ Original Message ------------------
> > > > > > > > > > > > *From:* "Rajini Sivaram" <ra...@gmail.com>;
> > > > > > > > > > > > *Sent:* Monday, May 22, 2017, 20:17
> > > > > > > > > > > > *To:* "dev" <de...@kafka.apache.org>;
> > > > > > > > > > > > *Subject:* Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi David,
> > > > > > > > > > >
> > > > > > > > > > > > Is there a reason why you wouldn't want to use request.timeout.ms as
> > > > > > > > > > > > the timeout parameter for connections? Then you would use the same
> > > > > > > > > > > > timeout for connected and connecting phases when shutdown is unclean.
> > > > > > > > > > > > You could still use the timeout to ensure that next metadata request
> > > > > > > > > > > > is sent to another node.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > >
> > > > > > > > > > > Rajini
> > > > > > > > > > >
> > > > > > > > > > > > On Sun, May 21, 2017 at 9:51 AM, 东方甲乙 <25...@qq.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Guozhang,
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the clarification. On point 2, I think the key issue is not
> > > > > > > > > > > > > how users control the maximum time to wait inside the code, but whether
> > > > > > > > > > > > > the network client can notice that connecting cannot finish and then
> > > > > > > > > > > > > try a good node. In the producer's sender, even though selector.poll
> > > > > > > > > > > > > can time out, the next iteration does not close the previous
> > > > > > > > > > > > > connection attempt and try another good node.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > In our test environment, QA shut down one of the leader nodes. The
> > > > > > > > > > > > > producer's send request timed out, the client closed that node's
> > > > > > > > > > > > > connection, and then requested the metadata. But sometimes the node
> > > > > > > > > > > > > chosen for the metadata request is also the node that was shut down.
> > > > > > > > > > > > > While connecting to that node to get the metadata, the client is in
> > > > > > > > > > > > > the connecting phase and the network client marks the node's state as
> > > > > > > > > > > > > CONNECTING, but because the node is shut down, the socket cannot tell
> > > > > > > > > > > > > that the connection attempt is broken. Although selector.poll has a
> > > > > > > > > > > > > timeout parameter, it does not close the connection, so the next time
> > > > > > > > > > > > > "networkclient.maybeUpdate" runs it checks isAnyNodeConnecting and
> > > > > > > > > > > > > does not connect to any good node to get the metadata. It takes
> > > > > > > > > > > > > several minutes to notice that the connection attempt has timed out
> > > > > > > > > > > > > and to try another node.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > So I want to add a connect.timeout parameter, so that the selector can
> > > > > > > > > > > > > detect that connecting has timed out and close the connection. It seems
> > > > > > > > > > > > > the timeout value currently passed to `selector.poll()` cannot do this.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > David
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > ------------------ Original Message ------------------
> > > > > > > > > > > > > From: "Guozhang Wang";<wa...@gmail.com>;
> > > > > > > > > > > > > Sent: Tuesday, May 16, 2017, 1:51 AM
> > > > > > > > > > > > > To: "dev@kafka.apache.org"<de...@kafka.apache.org>;
> > > > > > > > > > > > >
> > > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Hi David,
> > > > > > > > > > > >
> > > > > > > > > > > > > I may have been a bit confused before, just clarifying a few things:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. As you mentioned, a client will always try to first establish the
> > > > > > > > > > > > > connection with a broker node before it tries to send any request to
> > > > > > > > > > > > > it. And after connection is established, it will either continuously
> > > > > > > > > > > > > send many requests (e.g. produce) or just a single request (e.g.
> > > > > > > > > > > > > metadata) to the broker, so these two phases are indeed different.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. In the connected phase, connections.max.idle.ms is used to
> > > > > > > > > > > > > auto-disconnect the socket if no requests have been sent / received
> > > > > > > > > > > > > during that period of time; in the connecting phase, we always try to
> > > > > > > > > > > > > create the socket via "socketChannel.connect" in a non-blocking call,
> > > > > > > > > > > > > and then check if the connection has been established, but all the
> > > > > > > > > > > > > callers of this function (in either producer or consumer) have a
> > > > > > > > > > > > > timeout parameter as in `selector.poll()`, and the timeout parameter
> > > > > > > > > > > > > is set either by calculations based on metadata.expiration.time and
> > > > > > > > > > > > > backoff for producer#sender, or by directly passed values from
> > > > > > > > > > > > > consumer#poll(timeout), so although there is no config directly
> > > > > > > > > > > > > controlling that, users can still control the maximum time to wait
> > > > > > > > > > > > > inside the code.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I originally thought your scenario was more about the connected phase,
> > > > > > > > > > > > > but now I feel you are talking about the connecting phase. For that
> > > > > > > > > > > > > case, I still feel the timeout value currently passed in
> > > > > > > > > > > > > `selector.poll()`, which is controllable from user code, should be
> > > > > > > > > > > > > sufficient?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Guozhang
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, May 14, 2017 at 2:37 AM, 东方甲乙 <254479818@qq.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Guozhang,
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry for the delay, and thanks for the question.  They seem like two
> > > > > > > > > > > > > > different parameters to me:
> > > > > > > > > > > > > > connect.timeout.ms: only applies to the connecting phase; once the
> > > > > > > > > > > > > > connection is established this parameter is not used.
> > > > > > > > > > > > > > connections.max.idle.ms: currently does not apply in the connecting
> > > > > > > > > > > > > > phase (a connection is added to the expiry manager only when select
> > > > > > > > > > > > > > returns readyKeys > 0); after connecting, it checks whether the
> > > > > > > > > > > > > > connection has still been active within some period of time.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Even if we changed connections.max.idle.ms to cover the connecting
> > > > > > > > > > > > > > phase as well, we could not set it to a small value such as 5
> > > > > > > > > > > > > > seconds. The client may be busy sending messages to other nodes, and
> > > > > > > > > > > > > > the idle connection would then be disconnected after 5 seconds; that
> > > > > > > > > > > > > > is why the default value of connections.max.idle.ms is set to a
> > > > > > > > > > > > > > larger time. We should have two parameters, one to control the
> > > > > > > > > > > > > > connecting phase and one to control the connected phase. What do you
> > > > > > > > > > > > > > think?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > David
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > ------------------ Original Message ------------------
> > > > > > > > > > > > > > From: "Guozhang Wang";<wa...@gmail.com>;
> > > > > > > > > > > > > > Sent: Saturday, May 6, 2017, 7:52 AM
> > > > > > > > > > > > > > To: "dev@kafka.apache.org"<de...@kafka.apache.org>;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hello David,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the KIP. For the described issue, I'm wondering if it can
> > > > > > > > > > > > > > be resolved by tuning the CONNECTIONS_MAX_IDLE_MS_CONFIG
> > > > > > > > > > > > > > (connections.max.idle.ms) on the client side? Default is 9 minutes.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, May 2, 2017 at 8:22 AM, 东方甲乙 <254479818@qq.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Currently in our test environment, we found that after one of the
> > > > > > > > > > > > > > > broker nodes crashes (reboot or OS crash), the client may still be
> > > > > > > > > > > > > > > trying to connect to the crashed node to send a metadata request or
> > > > > > > > > > > > > > > another request, and it needs several minutes to notice that the
> > > > > > > > > > > > > > > connection has timed out before it tries another node. As a result,
> > > > > > > > > > > > > > > the client may still be unaware of the metadata change after
> > > > > > > > > > > > > > > several minutes.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So I want to add a connect timeout on the client; please take a look at:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-148%3A+Add+a+connect+timeout+for+client
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > David
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Simon Souter
> > > > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
> 
> 
> 
> -- 
> -- Guozhang